[
What some consider to be the digital library of Alexandria is in danger of losing valuable scrolls. Major media outlets are blocking the Internet Archive’s Wayback Machine from saving web pages to prevent AI giants from training models on snapshots of old articles.
Wired reported that 23 news organizations, including USA Today and the New York Times, are among the 241 sites denying Internet Archive’s web crawler access to their articles. It’s not personal—some outlets still use the Archive in their reporting—it’s about the looming threat of AI:
- Tech companies can skirt copyright laws by using the Wayback Machine as a workaround for training language models on publishers' content (including recipes, probably).
- Mark Graham, the director of the Wayback Machine, emphasizes that the digital archive has controls to limit abuse of AI automation and prevent large-scale data extraction.
Publishers can archive their own material, but a third party maintains a harder-to-alter version of stories, which can hold outlets accountable when articles are revised after publication.
Nothing new: Last year, Reddit barred the Wayback Machine from scraping its data over similar AI concerns. The archive also lost a slew of information when federal government websites were deleted.
Still working: Graham is reportedly in talks to regain access to the material, while more than 100 media workers signed a letter supporting Wayback.—DL
This report was originally published by Morning Brew.
https://fortune.com/2026/04/15/why-is-internet-archive-wayback-machine-not-working-news-outlets-block-ai/
Dave Lozo, Morning Brew




