
We Took Down 1,793 Documents Today. Here's the System We Built to Make Sure No One Else Has To.

When the DOJ released the EFTA files, they left survivor names, faces, addresses, and SSNs in the public record. A small group of researchers has been quietly cleaning up the mess. This is what we built so the next site doesn't have to start from zero.

Epstein Exposed · May 17, 2026 · 13 min read · 2,549 words

survivor-protection · redactions · doj · transparency-act · blocklist · infrastructure · open-source

On May 14, 2026 at 8:04 PM, an email arrived from a researcher named Rye Howard-Stone. He pointed to a specific document on Epstein Exposed. The document contained unredacted nude photographs of a young survivor, originally released by the Department of Justice as part of the Epstein Files Transparency Act. The survivor herself had posted about the exposure on X.

Three days later we had withdrawn 1,793 documents, purged the cached copies from our CDN, rewired the entire surface area of the site to refuse to serve them, and built a system designed to make sure this never has to be done by hand again.

This is the post-mortem.

What changed on May 17, 2026
  • 1,793 documents are now blocked across every surface of Epstein Exposed (the PDF viewer, the document detail pages, the search index, the listing API, the sitemap, the OG image route, and the underlying Cloudflare R2 cache).
  • The block list came from Rye Howard-Stone, Tommy Carstensen, Tristan Lee, Rollcall, and an additional alias mapping from a fifth source. We honored the union of all of them.
  • We then ran a sibling sweep that picked up 719 additional documents that referenced the same underlying PDFs under different document ID formats. Without it, the unredacted versions would still be reachable through alternate URLs.
  • Every block carries per-source provenance in an append-only audit log. We can show exactly who flagged each document, when, and why.
  • This is Phase 0 of 6. The remaining phases build text-layer PII scrubbing, automated DOJ canonical-version replacement, image-level redaction by us, corpus-wide perceptual scanning, a signed public federation manifest, and continuous regression protection.

The Failure That Started This

The Department of Justice's February 2026 release of approximately 3.5 million pages under the Epstein Files Transparency Act was, by the assessment of attorneys representing over 200 victims, "the single most egregious violation of victim privacy in one day in United States history."

The redaction failures were not subtle. The DOJ used black rectangles overlaid on PDFs without removing the underlying text. Victim names could be revealed by selecting the redacted region and pasting into a text editor. A Wall Street Journal review found that 43 of 47 victim names had been left exposed, with some appearing over 100 times across the release. Victims received death threats. The United Nations Office of the High Commissioner for Human Rights formally condemned the disclosures. Attorney General Pam Bondi sent a six-page letter to Congress acknowledging the failures.

The DOJ has since been quietly pulling and re-redacting individual documents, but the process is incomplete and inconsistent. In the meantime, the entire ecosystem of researchers, journalists, and archives mirroring the EFTA release has been left to figure out, on a case-by-case basis, which documents are too dangerous to publish.

We documented our own response to this in the Site Administrator's Declaration on Victim Privacy Protection. Between February and March of this year we redacted 16,924 individual instances of personally identifying information across 1,567 documents, scrubbed 29 survivor names from every searchable surface of the site, and redacted 13 individuals entirely. That work was driven by a Wall Street Journal review, two community reports flagging a specific document, and our own automated scanning.

What that work did not catch were the unredacted photographs.

The Outreach

What Rye Howard-Stone has been doing for the past several months is significantly more comprehensive than what any single site has done on its own. From his Signal message:

Hey everyone, just wanted to let everyone know, for the last few days I've been trying to locate and block each and every last unredacted image of a survivor's face, including in some cases, their nude bodies. I've received additional blocklists from Tommy Carstensen of tommycarstensen.com/epstein and Tristan Lee of epstein.photos, and shared my blocklist with them. My blocklist comes from a combination of my own work, but primarily a researcher named Katie, and Rollcall (they serve a placeholder PDF for theirs so it's easy to reverse-engineer their blocklist). I compiled these three sources, then had Claude use the find-image feature to locate anything that "looked like" any of those, and place all the new ones in a local viewer, which allowed me to find 12 additional ones.

He had also contacted Mark Graham at the Internet Archive, who confirmed their filters would go live this weekend. He pinged journalists at WIRED, Axios, the Washington Times, and X to try to reach JMail. He went through Rollcall's parent company's Chief Legal Officer via LinkedIn. And he wrote to us.

His final compiled list, after manually reviewing roughly 1,400 documents against the DOJ's current redacted versions, contained 807 documents that still required blocking (he removed from the list the ones the DOJ had since fixed). He also produced an image-level list with 877 entries identifying specific images on specific pages by their PDF object reference number, so individual photographs inside otherwise-acceptable documents could be redacted without taking down the whole document.

He sent it all to us with one ask. "If you have a line to JMail let me know. Been trying to contact them about the same issue."

What We Did Today

We did three things that go beyond honoring Rye's list.

1. Union plus sibling sweep

We unioned all five of his source files (Blocklist-total, Blocklist-new, the Rollcall quarantine list, the page-and-xref censored-images list, and a separate image-alias CSV from a fifth site) into a deduped master of 1,074 unique EFTA document IDs. Then we wrote a second query that scanned the pdfUrl and sourceUrl columns of our entire 2.1-million-row documents table for any record whose PDF URL contained one of those 1,074 EFTA IDs but whose own document ID was different.

That second query returned 719 additional rows. They were documents stored under DocumentCloud IDs (d-65481), legacy short-form EFTA IDs (efta-01600321 instead of efta-efta01600321), and DataSet-prefixed IDs (sd-10-EFTA01308147). All of them pointed to the same underlying PDF as something on the canonical block list. Without the sibling sweep, browsing to those alternate URLs would have shown the same unredacted content.
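The sweep's logic can be sketched as a pure function. This is a hedged illustration, not our production query (which runs as SQL against the `pdfUrl` and `sourceUrl` columns); the column names come from the post, and the ID formats match the examples above.

```python
import re

def sibling_sweep(blocked_ids, documents):
    """blocked_ids: canonical IDs like 'efta-efta01308147'.
    documents: rows with 'id', 'pdfUrl', 'sourceUrl' keys.
    Returns IDs of rows that point at a blocked PDF under a different ID."""
    blocked = {b.lower() for b in blocked_ids}
    # Bare EFTA numbers, e.g. '01308147', used as URL needles.
    numbers = {m.group(1) for b in blocked
               for m in [re.search(r"efta-?efta(\d+)", b)] if m}
    siblings = set()
    for row in documents:
        if row["id"].lower() in blocked:
            continue  # already on the canonical block list
        urls = f"{row.get('pdfUrl', '')} {row.get('sourceUrl', '')}".lower()
        if any(f"efta{n}" in urls for n in numbers):
            siblings.add(row["id"])
    return siblings
```

A DocumentCloud row like `d-65481` whose `pdfUrl` embeds `EFTA01308147` gets caught even though its own ID never appears on any source list.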

Final count of blocked document IDs in our database: 1,793.

2. Defense in depth across every entry surface

We built a shared blocklist lookup library (lib/blocklist.ts) that loads the block list from Neon Postgres on first access and caches it in memory for 60 seconds. Every surface of the site now calls into it before serving anything related to a document:

Where the block is enforced
  • PDF proxy (/api/pdf-proxy): returns HTTP 451 with a generic body that does not contain the document ID, before any fetch against archive.org, justice.gov, or the R2 cache.
  • Document detail page (/documents/[id]): renders a "Withheld for Survivor Protection" placeholder with no document ID, no title, no OG image, and no embedded PDF. The page itself returns noindex, nofollow, noarchive, noimageindex headers.
  • OG image route (/documents/[id]/opengraph-image): generates a generic "Document Withheld" PNG card instead of the document-specific preview, so social media shares and search engine snippets cannot become a roadmap back to mirror copies.
  • Listing API (/api/documents): filters blocked IDs out of the response before it reaches the client. No JSON, no metadata, no preview snippets.
  • Sitemap: blocked IDs are excluded from the quality-document sitemap so they will be progressively de-indexed from Google, Bing, and every other engine that respects sitemaps.
  • Cloudflare R2 cache: all 1,074 cached PDF copies of the canonical-form blocked documents were deleted from the epstein-docs bucket. We do not retain copies of unredacted survivor material on our infrastructure.
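The load-once, cache-for-60-seconds behavior of the shared lookup can be sketched as follows. The real library is `lib/blocklist.ts` in TypeScript backed by Neon Postgres; this Python version is a minimal sketch of the same caching logic, with the loader injected so the database is out of scope.

```python
import time

class BlocklistCache:
    """Sketch of the shared blocklist lookup: load from the database on
    first access, then serve from memory until the TTL expires."""
    def __init__(self, load_fn, ttl_seconds=60, clock=time.monotonic):
        self._load = load_fn      # e.g. SELECT doc_id FROM blocklist
        self._ttl = ttl_seconds
        self._clock = clock
        self._ids = None
        self._loaded_at = None

    def is_blocked(self, doc_id):
        now = self._clock()
        if self._ids is None or now - self._loaded_at >= self._ttl:
            self._ids = {i.lower() for i in self._load()}
            self._loaded_at = now
        return doc_id.lower() in self._ids
```

Every surface (proxy, detail page, OG route, listing API, sitemap generator) calls `is_blocked` before serving, so a new entry propagates everywhere within one TTL window.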

3. Provenance and audit trail

Every block carries an array of which source flagged it. The trigger case (the document Rye specifically pointed us to) has five independent sources agreeing: Rye's combined list, his image-only list, the Rollcall quarantine list, the page-and-xref censored-images list, and a separate restricted-images alias CSV.

Every action on the block list (initial ingest, sibling sweep, R2 purge, future automated detections) is recorded in an append-only block_audit_log table. We will publish quarterly transparency reports against this log.


Why "Just Take Down the PDFs" Is Not Enough

A PDF is one surface. A 2-million-row corpus has many.

For the documents we just blocked, our database also contains:

  • OCR text extracted from those PDFs (in our documents.ocrText column and elsewhere)
  • Document summaries that may quote survivor-context paragraphs
  • "Recovered text" rows: text we extracted from regions where the DOJ's flawed redactions allowed underlying content to leak through OCR
  • Email bodies for any email that referenced those documents as attachments
  • AI-generated dossiers that may have cited those documents
  • Search indices that allow keyword discovery of any of the above
  • Vector embeddings used by the chat and semantic-search features

If we only take down the PDF, every other column remains searchable. Anyone hitting /chat or the search endpoint can still surface a survivor's name. Phase 1, starting tonight, scrubs those derivative columns and rebuilds the search indices for every affected row.

The full multi-phase plan is published internally at our infrastructure repo and will be summarized in a follow-up post.


What Comes Next

Phase 0 was the first 72 hours. The next six weeks build out the durable system.

Phase 1 (starting tonight): text-layer scrub

We are scrubbing OCR text, recovered text, page text, document summaries, and email bodies for every blocked document. The scrub uses a hashed survivor-name dictionary (so the plaintext names never sit in our database), targeted regex passes for SSNs, dates of birth, phone numbers, and addresses, and a public-figure exception list so names that survivors have chosen to make public (Virginia Giuffre, Maria Farmer, Annie Farmer, Sarah Ransome, and others who have spoken openly) are not scrubbed against their stated wishes.

Phase 2: DOJ canonical replacement

A weekly automated poller will check the DOJ's current published version of each blocked document. When the DOJ releases a properly redacted version, our proxy will serve that version instead of returning 451. The DOJ's drift will be logged in an append-only table so we have evidence of what they fixed and when. This is something the DOJ should be doing for itself. They are not.
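One way to detect that drift, sketched with an injected fetcher (function names and the log shape are hypothetical): record the hash of the flawed release once, then flag any blocked document whose current DOJ bytes hash differently.

```python
import hashlib

def poll_doj(blocked, fetch, original_hashes, log):
    """For each blocked doc, fetch the DOJ's current PDF; a hash that no
    longer matches the flawed original means the DOJ re-redacted it."""
    fixed = []
    for doc_id in blocked:
        body = fetch(doc_id)          # bytes of the DOJ's current version
        digest = hashlib.sha256(body).hexdigest()
        if digest != original_hashes[doc_id]:
            log.append({"doc_id": doc_id, "new_sha256": digest})
            fixed.append(doc_id)      # proxy can now serve this version
    return fixed
```

A re-redacted document is never trusted automatically in full: the hash change only proves the file moved, so the new version still passes through review before the 451 is lifted.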

Phase 3: our own redacted PDFs

For documents the DOJ has not yet re-redacted, we will generate our own redacted versions by surgically blacking out the specific page-and-xref image references on Rye's list. Researchers will see the document body with the survivor images redacted, instead of a generic withdrawal page.

Phase 4: corpus-wide perceptual scanning

This is where we go past what any single human review can do. We will compute CLIP perceptual embeddings for every image in the corpus (approximately 2.1 million images) and use FAISS approximate-nearest-neighbor search against the known-blocked images to find every visually similar match. We will also run NudeNet locally (on our hardware, never sending images to a third party) across every figure to flag previously-undetected exposures for human review. Anything flagged enters our existing AI-leads review pipeline so a human approves every block.
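The decision rule behind that scan can be shown without the heavy machinery. The production pipeline uses CLIP embeddings and a FAISS ANN index; this brute-force cosine-similarity loop over toy vectors is a sketch of the same match-against-known-blocked logic, with the 0.9 threshold chosen for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def flag_similar(corpus, blocked_embeddings, threshold=0.9):
    """Flag corpus images whose embedding is within `threshold` cosine
    similarity of any known-blocked image."""
    flagged = []
    for image_id, emb in corpus.items():
        if any(cosine(emb, b) >= threshold for b in blocked_embeddings):
            flagged.append(image_id)  # goes to human review, not auto-block
    return flagged
```

FAISS replaces the inner `any(...)` loop with an approximate-nearest-neighbor query so the same rule scales to 2.1 million images; nothing flagged is blocked without a human approving it.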

Phase 5: the federation manifest

This is the part Rye does not currently have, and it is the part that solves his JMail problem.

We will publish a signed, versioned, public blocklist manifest at https://epsteinexposed.com/.well-known/survivor-pii-blocklist.json. It will contain the current set of blocked document IDs, severity classification, and per-source attribution. It will be signed with our Ed25519 private key so consumers can verify the manifest has not been tampered with.

Any other archive can subscribe by either pulling the manifest periodically, or by registering a webhook with us so we push diffs as the list grows. We will publish small client libraries in Python and Node (eighty lines each, no external dependencies) that handle signature verification and expose a single is_blocked(doc_id) function.
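The consumer side might look like the following sketch. The manifest field names (`version`, `documents`, `doc_id`) are assumptions ahead of the v1.0 spec, and signature verification is injected here; the real client would verify the Ed25519 signature with a library such as PyNaCl.

```python
import json

class ManifestClient:
    """Sketch of the planned federation client: verify, parse, expose
    a single is_blocked(doc_id) lookup."""
    def __init__(self, verify_signature):
        self._verify = verify_signature   # real impl: Ed25519 verify
        self._blocked = frozenset()
        self.version = None

    def load(self, manifest_bytes, signature):
        if not self._verify(manifest_bytes, signature):
            raise ValueError("manifest signature invalid; refusing update")
        manifest = json.loads(manifest_bytes)
        self.version = manifest["version"]
        self._blocked = frozenset(d["doc_id"] for d in manifest["documents"])

    def is_blocked(self, doc_id):
        return doc_id in self._blocked
```

Refusing the update on a bad signature (rather than falling back to the unsigned body) is the entire point: a tampered manifest must never silently shrink anyone's block list.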

This solves the structural problem Rye identified. Right now every archivist has to send Signal messages and email attachments to every other archivist whenever the block list changes. With the federation manifest, every site subscribes once and stays current automatically.

We are inviting Rye to co-sign the v1.0 release.

Phase 6: regression protection

Every Vercel deploy will run a probe against a fixed set of known-blocked document IDs. If any of them return anything other than the withdrawn placeholder, the deploy rolls back automatically. New documents entering the ingest pipeline run through the Phase 4 detectors before they touch the public-facing database. And the redaction-request form will have a 24-hour service level for survivor-initiated takedowns.
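The probe itself is simple enough to sketch. Assumptions: the check runs against the PDF proxy, where the expected response for a blocked ID is HTTP 451, and the fetcher is injected so the sketch stays self-contained.

```python
def probe_deploy(known_blocked, fetch_status):
    """Post-deploy check: every known-blocked ID must come back 451.
    Returns (ok, failures); any failure triggers the rollback."""
    failures = [doc_id for doc_id in known_blocked
                if fetch_status(doc_id) != 451]
    return (len(failures) == 0, failures)
```

Because the probe runs on a fixed set of IDs on every deploy, a regression (a refactor that drops the blocklist check from one route) is caught before any traffic reaches it.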


What We Are Asking of Other Archives

If you operate a site that mirrors the Epstein Files release, here is what we are asking:

  1. Take the immediate action. Block the documents on Rye's list now. He has done the manual review. Email him through his Substack or reach out to us and we will route. Do not wait for the federation manifest. The unredacted survivor content is reachable today.

  2. Subscribe to the federation manifest when Phase 5 ships in approximately three weeks. The client library is small. The integration cost is trivial. The legal benefit is substantial. Publishing survivor PII is a tort. Publishing material depicting minors in a sexualized context is a federal crime. Subscribing to a curated blocklist is reasonable diligence.

  3. Tell us what you would need it to do. If your archive has different schema, different ID conventions, or different operational constraints, tell us before we lock the v1.0 specification.

We are specifically trying to reach: JMail (Luke Igel and Riley Walz, Matt Binder, Boone Ashworth at WIRED, Herb Scribner at Axios, Mary McCue Bell at Washington Times). Rollcall (Todd Aman, Chief Legal and Administrative Officer at FiscalNote). Wayback Machine (Mark Graham). Epsteinify ([email protected]). ExposingEpstein. EpsteinScan. EpsteinSecrets. Epsteinalysis (Oseme Ochei).

If you are at one of those sites or know someone who is, the contact is direct: [email protected].


Credit Where It Is Due

This work would not exist without Rye Howard-Stone. He spent months building the underlying list by hand, contacted every site he could find, gave us the trigger document that made the urgency concrete, and shared his methodology openly in his Substack. The infrastructure we built is what we could contribute. The list itself is his.

We also credit Tommy Carstensen (tommycarstensen.com/epstein), Tristan Lee (epstein.photos), and the researcher Rye refers to as Katie, whose work fed into the combined list. Rollcall's decision to use a placeholder PDF for quarantined documents made their list reverse-engineerable, which is part of why this cross-archive collaboration is possible at all.

And we credit the survivors who came forward to identify the exposure in the first place. The trigger case for this work was a survivor who flagged her own unredacted material on a public social network. She should never have had to do that.


If you have additional documents to flag

If you have found additional Epstein-related documents on any site that contain unredacted survivor PII or imagery, submit them through our redaction request form or email [email protected]. We will cross-reference against the current blocklist, contact the source archive directly if needed, and update the published manifest.

If you are a survivor whose information was exposed in the EFTA release, our redaction request form is processed within 24 hours and is not contingent on documentary proof.
