Part I
The Scale of the Task
3M pages released · 180K images · 2,000 videos · 2–3% reviewed so far
The Justice Department released the Epstein files on January 30, 2026 — a document dump so vast that, stacked, the pages would reach the top of the Empire State Building. About two dozen journalists are working through the material, yet have seen only 2–3% of it. At that rate, it would take years to review and verify everything.
Reporters from the Investigations, National, Metro, and Business desks were assembled alongside engineers and AI journalists, creating an unusually cross-functional team. They started with a simple method: search terms. Trump. Clinton. Gates. Duke of York. Names, places, and events connected to Epstein.
It was like we suddenly had subpoena power. Witness statements, emails, bank records — all of it.
— Kirsten Danis, Investigations Editor
Part II
AI Meets Traditional Reporting: The Hybrid Approach
The NYT's approach was never to let AI drive the reporting but to use it as an amplifier for human judgment. Here's how each layer worked:
01. Document Ingestion & Indexing
Andrew Chavez's Interactive News team spent about 10 hours uploading and indexing all 3 million pages into a proprietary search tool. Until it was ready, reporters used the DOJ's own clunky interface.
02. Semantic Search (Beyond Keywords)
Keyword search finds only exact matches. Semantic search finds conceptually related content, which is critical for navigating the typos, OCR errors, and inconsistent language throughout the files.
03. AI Tagging & Categorization
An AI tool automatically bucketed documents by type (email, legal record, photo, text message) and added newsworthiness labels, giving reporters a fast triage layer.
04. Visual Photo Search
Reporters could search images by their visual content, not just their metadata, which let them locate photos and scan imagery at a scale impossible by hand.
05. Duplicate Detection
An AI tool flagged repeated documents across the 3 million pages, preventing reporters from counting the same information multiple times or chasing false leads.
06. Crowdsourced Spreadsheet Verification
Dylan Freedman built a scraper that pulled DOJ search results into organized spreadsheets by key figure. Reporters then verified each entry collaboratively: classic newsroom teamwork, supercharged.
07. Video & Audio Transcription
AI converted the 2,000+ videos and audio files into searchable transcripts, making multimedia part of the investigative database rather than a separate, harder-to-access silo.
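The article does not say how the Times's duplicate detector worked. As an illustration only, one common way to flag near-duplicate pages is to split each page into overlapping word "shingles" and compare pages by Jaccard similarity (shared shingles divided by all distinct shingles). The function names, the 3-word shingle size, the 0.8 threshold and the sample pages below are all hypothetical assumptions, not details from the reporting.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word shingles in a page."""
    # Lowercase and keep only letters/digits so case and punctuation noise
    # do not make two copies of the same page look different.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short page: treat the whole page as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(
    pages: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return every pair of page IDs whose similarity meets the threshold.

    This compares all pairs, which is quadratic; at millions of pages a real
    system would need an index such as MinHash with locality-sensitive hashing.
    """
    shingle_sets = {page_id: shingles(text) for page_id, text in pages.items()}
    matches = []
    for (id_a, set_a), (id_b, set_b) in combinations(shingle_sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            matches.append((id_a, id_b, score))
    return matches


if __name__ == "__main__":
    # Hypothetical sample pages: doc-b is doc-a with different case and punctuation.
    pages = {
        "doc-a": "Flight log for March, passengers listed below with departure times",
        "doc-b": "FLIGHT LOG for March; passengers listed below, with departure times.",
        "doc-c": "Bank statement for account ending 4412",
    }
    print(find_near_duplicates(pages))  # prints [('doc-a', 'doc-b', 1.0)]
```

A threshold below 1.0 lets copies that differ by a few OCR glitches still count as duplicates; where to set it is a trade-off between missed duplicates and false merges that a reporter would still need to spot-check.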
AI is like a liquid — information can be molded into different formats and searched in rich, expressive ways. But it can never replace expert news judgment.
— Dylan Freedman, AI Projects Editor
Part III
What AI Can and Cannot Do
The Times team was remarkably candid about where AI helped and where it fell short — a more honest accounting than most newsrooms provide.
| AI Was Good At | AI Was Bad At |
|---|---|
| Extracting text from images & audio | News judgment |
| Captioning photos automatically | Determining importance or newsworthiness |
| Assigning structure to raw emails | Generating original ideas |
| Processing messy, unstructured data | Avoiding sycophancy & confirmation bias |
| Building tools quickly (days vs. weeks) | Reliable redaction analysis |
Chavez stressed that the team gave AI only "discrete, narrow tasks" it could handle reliably, such as identifying whether a page contained an image, rather than open-ended analysis. The AI surfaces signals; reporters follow up with human judgment and sourcing built over years.
Part IV
AI Verification in Action: Two Case Studies
Case Study 01
The "=9yo" OCR Error
A viral social media claim centered on a document showing "=9yo," implying a 9-year-old. The Times's tools cross-referenced multiple versions of the same document and confirmed that the characters were a software ingestion error: another version of the document clearly read "19yo." AI-assisted cross-referencing caught, at scale, the kind of error that human eyes might miss and that disinformation spreaders had exploited.
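The article does not describe how the Times's tools compared document versions. A minimal sketch, assuming the OCR text of two copies of the same page is already available as strings, is to align them word by word with Python's standard-library `difflib` and report every spot where they disagree. The function name `disagreements` and the sample strings are hypothetical.

```python
import difflib


def disagreements(version_a: str, version_b: str) -> list[tuple[str, str]]:
    """Align two text versions word by word and return the spans where they differ."""
    tokens_a = version_a.split()
    tokens_b = version_b.split()
    matcher = difflib.SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # "replace", "delete", or "insert": the two versions disagree here.
            diffs.append((" ".join(tokens_a[i1:i2]), " ".join(tokens_b[j1:j2])))
    return diffs


if __name__ == "__main__":
    # Hypothetical OCR output for two copies of the same page.
    first_copy = "age listed as =9yo in the summary"
    second_copy = "age listed as 19yo in the summary"
    print(disagreements(first_copy, second_copy))  # prints [('=9yo', '19yo')]
```

Each flagged pair is a lead to check against the original page images, not a verdict on which reading is correct.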
Case Study 02
The Fake "Unredacted AI" Videos
Viral videos claimed to show AI "undoing" government redactions in the Epstein files. The Times built a tool that scanned all 3 million pages for potentially reversible redactions and found none. What the videos actually showed was AI hallucinating plausible text beneath black boxes, not revealing real hidden information. The tool provided definitive, documented proof to counter the disinformation.
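The article does not explain how the Times's redaction scanner worked. One well-known way a PDF redaction fails is when a black rectangle is drawn over text that is still present in the page's text layer. A minimal sketch of a check for that one failure mode, assuming the page has already been parsed into text spans and drawn redaction rectangles with bounding boxes (a PDF-parsing library would supply these in practice; the `Box` and `TextSpan` classes here are hypothetical), flags any extractable text that sits under a box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle with x0 < x1 and y0 < y1, in page coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box") -> bool:
        # Two rectangles overlap unless one lies entirely beside, above, or below the other.
        return not (
            self.x1 <= other.x0
            or other.x1 <= self.x0
            or self.y1 <= other.y0
            or other.y1 <= self.y0
        )


@dataclass(frozen=True)
class TextSpan:
    """A run of extractable text and its bounding box on the page."""
    text: str
    box: Box


def text_under_redactions(spans: list[TextSpan], redactions: list[Box]) -> list[str]:
    """Return extractable text whose box overlaps any drawn redaction rectangle.

    A non-empty result means text was covered rather than removed, so the
    redaction may be reversible; an empty result means none was found here.
    """
    return [
        span.text
        for span in spans
        if span.text.strip() and any(span.box.overlaps(r) for r in redactions)
    ]


if __name__ == "__main__":
    # Hypothetical parsed page: one visible line and one line left under a black box.
    spans = [
        TextSpan("Meeting moved to Tuesday", Box(10, 10, 120, 20)),
        TextSpan("covered text", Box(130, 10, 200, 20)),
    ]
    redactions = [Box(125, 8, 205, 22)]
    print(text_under_redactions(spans, redactions))  # prints ['covered text']
```

Run over every page, an empty result everywhere is the kind of documented negative finding the tool produced; it covers only this specific failure mode, not every way a redaction could leak.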
Editorial Standards
A third judgment call, involving unverified accusations about Trump, showed how editorial process and AI tools work together. The team found a document summarizing more than a dozen unverified tips about Trump and Epstein but chose to describe their existence in general terms without publishing unverifiable details. The article's published language: "The emails did not include any corroborating evidence and The New York Times is not describing the details of the unverified claims."
Part V
What the Files Have — and Haven't — Revealed
With roughly 2–3% of the material reviewed, the picture is clearer in some areas and murkier in others:
✓ Rich portrait of Epstein's orbit
The files give an inside look at how Epstein operated: trading gifts and favors, enticing elites with dinners and island visits, and positioning himself as someone who "knew things" about powerful people.
✓ Extensive network documentation
The files contain more than 38,000 references to Trump, though many are news clippings and previously released congressional documents. No major new revelations about his relationship with Epstein have emerged.
✗ No proof of wide pedophilia ring
Despite thorough searching, no hard evidence of a broad ring has emerged in the material reviewed so far. The possible co-conspirators named by investigators were not new names.
✗ No clear blackmail proof
The theory that Epstein collected secrets for leverage remains unproven. The files show he "seemed to see value" in claiming to know things, but hard evidence has not yet surfaced.
It is hard to believe that after all that has been said, there is still so much to learn about Epstein and his network.
— Steve Eder, Investigative Reporter