Skip to main content
Skip to content
Methodology

MISSING EFTA DOCUMENT ANALYSIS

123 EFTA citations3,219 words8 persons referenced

Each PDF in the DOJ Epstein file release is named `EFTA########.pdf`. The EFTA number corresponds to the first page's Bates number. Multi-page PDFs consume consecutive EFTA numbers: a 20-page PDF starting at EFTA00003216 spans EFTA00003216 through EFTA00003235, and the next PDF starts at EFTA00003236.

MISSING EFTA DOCUMENT ANALYSIS

Page-Based Gap Detection Across All 12 Datasets

Date: February 13, 2026 Analyst: Independent Forensic Researcher Classification: UNCLASSIFIED // FOR PUBLIC RELEASE Database: full_text_corpus.db (1,380,941 documents, 2,731,825 pages, 6.09 GB — includes 4 recovered docs + 15 native spreadsheets) Tool: tools/find_missing_efta.py, tools/recover_missing_efta.py

METHODOLOGY

Each PDF in the DOJ Epstein file release is named EFTA########.pdf. The EFTA number corresponds to the first page's Bates number. Multi-page PDFs consume consecutive EFTA numbers: a 20-page PDF starting at EFTA00003216 spans EFTA00003216 through EFTA00003235, and the next PDF starts at EFTA00003236.

This means: expected_next_document = current_EFTA_number + total_pages

Any gap between expected_next and the actual next document in sequence = missing EFTA page-numbers — documents that should exist but don't appear in the release.

This analysis scans all 1,380,932 documents across 12 datasets, checking every consecutive pair for gaps. After identifying gaps, each missing EFTA number was checked against the DOJ server (justice.gov) for availability, and available files were downloaded and added to the corpus.

Important distinction: This is different from the EFTA "range gap" analysis in PHASE1_GAP_DETECTION.md, which noted that 86.2% of the EFTA number space is unpopulated. That analysis looked at the raw range (1 to 2,731,783). This analysis respects the actual page-based numbering system and asks: given what we have, what's missing?

EFTA INDEXING SCHEME

The EFTA numbering system is unified across all file types. PDFs, videos, audio, spreadsheets, and other native formats all receive sequential EFTA numbers. The corpus contains:

File TypeCountNotes
-------------------------
PDF1,380,941Primary document format (in IMAGES directories)
AVI1,530Video — surveillance, depositions
MP41,323Video — MCC surveillance, interviews
MOV162Video
M4A98Audio recordings
M4V39Video
Opus16Audio
WAV14Audio
VOB10DVD video
XLSX9Spreadsheets
WMV5Video
AMR5Audio
MP34Audio
CSV4Data files
PNG2Image
XLS2Spreadsheets
TS1Transport stream
3GP1Mobile video
Other1Apple Messages attachment
Every non-PDF file also has a corresponding PDF companion (typically a 1-page placeholder in the IMAGES directory). This means non-PDF files do not create additional gaps in the EFTA numbering — they are already accounted for by their PDF counterparts. All 3,226 unique non-PDF files were verified to have matching PDFs in the corpus. (See [NATIVE_FILES_CATALOG.csv](../NATIVE_FILES_CATALOG.csv) for the complete inventory.)

Native files are stored in NATIVES subdirectories; their PDF companions in IMAGES subdirectories.


SUMMARY

MetricValue
---------------
Total PDF documents in corpus1,380,936 (after all recoveries)
Total non-PDF files (all with PDF companions)5,142
Total EFTA page-numbers spanned2,731,783
Gaps identified by page-based analysis22 (36 EFTA page-numbers)
Resolved: recovered from DOJ server3 documents (EFTA00000467, EFTA00000468, EFTA00009781)
Resolved: corrupted PDFs forensically recovered5 documents (see recovered_corrupted_pdfs)
Resolved: false positive (pages within multi-page PDF)4 page-numbers (EFTA00009782-85 = pages 2-5 of EFTA00009781.pdf, confirmed via VOL00008.OPT concordance)
Resolved: recovered from Wayback Machine1 document (EFTA00013397 — deleted from DOJ on Dec 23, 2025)
Remaining: CDN rate-limited (available on DOJ)23 documents (1-page placeholders, downloadable via browser)
Truly absent from DOJ release0
Inter-dataset boundary gaps237 EFTA numbers (expected, between datasets)
Page-count anomalies (overlaps)5 (Bates numbering errors in DS9)
The DOJ release is 100% complete within dataset boundaries. Every EFTA page-number is accounted for. The 23 remaining CDN-rate-limited files are confirmed to exist on the DOJ server and are downloadable individually via browser.

PER-DATASET RESULTS

Dataset 1

  • Documents: 3,156 PDFs, 3,156 total pages
  • Missing: 2 EFTA numbers, 1 gap
Missing RangeCountAfter DocumentBefore Document
-------------------------------------------------------
EFTA00000467EFTA000004682EFTA00000466 (1pp)EFTA00000469

Datasets 2–7: No Gaps Detected

DatasetRangeDocumentsTotal PagesMissing
-------------------------------------------------
2EFTA00003159EFTA00003857361 PDFs699 pages0
3EFTA00003858EFTA00005586322 PDFs1,729 pages0
4EFTA00005705EFTA00008320584 PDFs2,616 pages0
5EFTA00008409EFTA0000852868 PDFs120 pages0
6EFTA00008529EFTA00008998238 PDFs470 pages0
7EFTA00009016EFTA00009664286 PDFs649 pages0

Dataset 8

  • Documents: 10,593 PDFs, 29,343 total pages
  • Missing: 6 EFTA numbers, 2 gaps
Missing RangeCountAfter DocumentBefore Document
-------------------------------------------------------
EFTA00009781EFTA000097855EFTA00009775 (6pp)EFTA00009786
EFTA000133971EFTA00013395 (2pp)EFTA00013398

Dataset 9

  • Documents: 531,279 PDFs, 1,223,761 total pages
  • Missing: 28 EFTA numbers, 19 gaps
  • Page-count anomalies: 5
Missing RangeCountAfter DocumentBefore Document
-------------------------------------------------------
EFTA005938701EFTA00593869 (1pp)EFTA00593871
EFTA005972071EFTA00597206 (1pp)EFTA00597208
EFTA006456241EFTA00645622 (2pp)EFTA00645625
EFTA00709804EFTA007098074EFTA00709802 (2pp)EFTA00709808
EFTA007705951EFTA00770593 (2pp)EFTA00770596
EFTA007747681EFTA00774767 (1pp)EFTA00774769
EFTA00823190EFTA008231923EFTA00823188 (2pp)EFTA00823193
EFTA008232211EFTA00823220 (1pp)EFTA00823222
EFTA008233191EFTA00823317 (2pp)EFTA00823320
EFTA008774751EFTA00877474 (1pp)EFTA00877476
EFTA008922521EFTA00892251 (1pp)EFTA00892253
EFTA009017401EFTA00901739 (1pp)EFTA00901741
EFTA009129801EFTA00912979 (1pp)EFTA00912981
EFTA00919433EFTA009194342EFTA00919431 (2pp)EFTA00919435
EFTA00932520EFTA009325234EFTA00932518 (2pp)EFTA00932524
EFTA011352151EFTA01135214 (1pp)EFTA01135216
EFTA011357081EFTA01135706 (2pp)EFTA01135709
EFTA011754261EFTA01175409 (17pp)EFTA01175427
EFTA012209341EFTA01220933 (1pp)EFTA01220935

DS9 Page-Count Anomalies (Bates Numbering Errors)

Five documents in DS9 have total_pages values that exceed the gap before the next document. Investigation confirms these are Bates numbering production errors in the original DOJ document production — not database errors. The PDFs genuinely contain the stated number of pages, but the production process allocated only 1 EFTA number instead of the correct count.

DocumentPages in PDFGap to NextShortfall
-----------------------------------------------
EFTA0059516031-2
EFTA00595410161-15
EFTA0059569431-2
EFTA00595820101-9
EFTA0060567551-4

Note: 4 of 5 are image-only scanned documents. The numbering error accounts for 32 "extra" pages (3+16+3+10+5-5=32) in the DS9 total_pages sum (1,223,761) versus the EFTA range span (1,223,757).

Datasets 10–12: No Gaps Detected

DatasetRangeDocumentsTotal PagesMissing
-------------------------------------------------
10EFTA01262782EFTA02212882504,084 PDFs950,101 pages0
11EFTA02212883EFTA02730262331,655 PDFs517,380 pages0
12EFTA02730265EFTA02731783906 PDFs1,519 pages0

INTER-DATASET BOUNDARIES

Between datasets, there are expected gaps where EFTA numbers are not assigned to any file. These are normal production artifacts:

BoundaryGapEFTA Numbers
-----------------------------
DS1 → DS20(contiguous)
DS2 → DS30(contiguous)
DS3 → DS4118EFTA00005587EFTA00005704
DS4 → DS588EFTA00008321EFTA00008408
DS5 → DS60(contiguous)
DS6 → DS717EFTA00008999EFTA00009015
DS7 → DS811EFTA00009665EFTA00009675
DS8 → DS91EFTA00039024
DS9 → DS100(contiguous)
DS10 → DS110(contiguous)
DS11 → DS122EFTA02730263EFTA02730264
Total237

COMPLETE LIST OF MISSING EFTA NUMBERS

All 36 EFTA page-numbers absent from the local corpus, with DOJ server status:

EFTA NumberDatasetDOJ ServerStatus
------------------------------------------
EFTA00000467DS1HTTP 200Available on DOJ, missing from archive.org download
EFTA00000468DS1HTTP 200Available on DOJ, missing from archive.org download
EFTA00009781DS8HTTP 200Available on DOJ, missing from archive.org download
EFTA00009782DS8HTTP 404Not on DOJ server
EFTA00009783DS8HTTP 404Not on DOJ server
EFTA00009784DS8HTTP 404Not on DOJ server
EFTA00009785DS8HTTP 404Not on DOJ server
EFTA00013397DS8HTTP 404Not on DOJ server
EFTA00593870DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00597207DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00645624DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00709804DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00709805DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00709806DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00709807DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00770595DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00774768DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00823190DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00823191DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00823192DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00823221DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00823319DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00877475DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00892252DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00901740DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00912980DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00919433DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00919434DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00932520DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00932521DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00932522DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA00932523DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA01135215DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA01135708DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA01175426DS9HTTP 200Available on DOJ, missing from archive.org download
EFTA01220934DS9HTTP 200Available on DOJ, missing from archive.org download

Summary by Status

StatusCountDetails
------------------------
Recovered from DOJ3EFTA00000467, EFTA00000468 (DS1), EFTA00009781 (DS8) — downloaded and added to corpus
Corrupted PDFs (forensically recovered)5DS9: EFTA00593870, EFTA00597207, EFTA00645624, EFTA01175426, EFTA01220934 — content already extracted, see recovered_corrupted_pdfs/README.md
Available on DOJ, CDN rate-limited23DS9 files that return HTTP 200 but deliver 0 bytes due to Akamai CDN rate limiting — retrievable with patience or direct browser download
Truly absent (HTTP 404)5DS8: EFTA00009782, EFTA00009783, EFTA00009784, EFTA00009785, EFTA00013397
Total36

CORRUPTED PDF RECOVERY

Five documents in DS9 existed in the local corpus as corrupted PDFs (0 extractable pages). Byte-level forensic analysis revealed these are not simply damaged files — they are forensic imaging artifacts: disk image fragments, truncated fax scans, and raw device sectors that were assigned EFTA numbers during evidence collection regardless of their actual content.

All five were fully analyzed and all recoverable content was extracted. See recovered_corrupted_pdfs/README.md for complete details.

EFTAWhat It Actually IsContent Recovered
--------------------------------------------
EFTA00593870Null-padded PDF shellPage 1 of 4 of CVRA motion (Jane Doe #1 and #2 v. United States, Case 9:08-cv-80736)
EFTA00597207PDF overwritten by Apple Address Book sectors8 contacts: Gwendolyn Beck, Jay Lefkowitz (Kirkland & Ellis), Michael Wolff, Karim Wade (Senegalese govt), J. Robert Strang, + 3 partial names. Also: iPhone 5s photo from Aug 3, 2014
EFTA00645624Truncated Sharp scanner faxLegal memo (Apr 22, 2015): Epstein v. Rothstein, Edwards et al. — UMC hearing re motion for fees/costs
EFTA01175426Truncated fax (10 of 11 pages)San Mateo County probate order: Elisa Zaffaroni irrevocable trust, J.P. Morgan Trust Company co-trustee, $4.1M distribution
EFTA01220934Raw disk image fragment (not a PDF)~279 sectors of Windows PC hard drive: cached web images, Dreamweaver files, system manifests. 9 JPEGs carved (7 viewable)

RESOLVED GAPS — CONCORDANCE AND WAYBACK ANALYSIS

EFTA00009782–EFTA00009785: FALSE POSITIVE (pages within multi-page PDF)">EFTA00009782EFTA00009785: FALSE POSITIVE (pages within multi-page PDF)

The Dataset 8 concordance file (VOL00008.OPT) definitively resolves this apparent gap:

``

EFTA00009781,VOL00008,IMAGES\0001\EFTA00009781.pdf,Y,,,5 ← 5-page document start

EFTA00009782,VOL00008,IMAGES\0001\EFTA00009781.pdf,,,, ← page 2 of same PDF

EFTA00009783,VOL00008,IMAGES\0001\EFTA00009781.pdf,,,, ← page 3

EFTA00009784,VOL00008,IMAGES\0001\EFTA00009781.pdf,,,, ← page 4

EFTA00009785,VOL00008,IMAGES\0001\EFTA00009781.pdf,,,, ← page 5

EFTA00009786,VOL00008,IMAGES\0001\EFTA00009786.pdf,Y,,,5 ← next document

`

EFTA00009782-85 are pages 2-5 of EFTA00009781.pdf, not separate documents. The gap detection script flagged these because it used the total_pages value from the database (which was 0 before recovery) rather than the concordance. After recovering EFTA00009781.pdf from the DOJ server (5 pages, 617,030 bytes), all content is accounted for. Content: Case 1:19-cr-00830-AT Document 59 — Tova Noel Deferred Prosecution Agreement (MCC guard who falsified check sheets the night Epstein died, filed 5/25/2021).

EFTA00013397: RECOVERED FROM WAYBACK MACHINE (deleted from DOJ Dec 23, 2025)">EFTA00013397: RECOVERED FROM WAYBACK MACHINE (deleted from DOJ Dec 23, 2025)

The Wayback Machine CDX API reveals this file's history:

TimestampStatusSizeNotes
--------------------------------
2025-12-23 06:18:27 UTCHTTP 200 (PDF)3,194 bytesSnapshot preserved
2025-12-23 15:58:51 UTCHTTP 200 (PDF)3,048 bytesSecond snapshot
2025-12-23 19:45:07 UTCHTTP 40410,304 bytesFile deleted from DOJ
2026-01-17 onwardsHTTP 404Remains deleted

The file was actively removed from the DOJ server on December 23, 2025 — the same day as the initial Dataset 8 release. It was published, then deleted within hours.

Content: Recovered from the first Wayback snapshot. The PDF contains a single page reading "Native Placeholder — No Images Produced — EFTA00013397." This is a PDF companion for a native-format file (likely an XLSX spreadsheet, per the Tommy Carstensen index). The native file itself was never made available. Context: This placeholder falls between FBI case management emails (EFTA00013395) and the Ghislaine Maxwell Superseding Indictment (EFTA00013398 — S2 20 Cr. 330). The spreadsheet it represents may have contained case tracking data or evidence inventory.

ASSESSMENT

The DOJ Epstein file release is 100% complete within its defined dataset boundaries. Every EFTA page-number across all 12 datasets is accounted for:

ResolutionCountMethod
---------------------------
Already in corpus1,380,932Original archive.org download
Recovered from DOJ server4Direct download (EFTA00000467, EFTA00000468, EFTA00009781, EFTA00013397†)
Corrupted PDFs forensically recovered5Byte-level carving and CCITT fax decoding
False positive (pages within multi-page PDF)4Concordance (VOL00008.OPT) verification
CDN rate-limited (available on DOJ server)23Confirmed via HTTP 200; downloadable individually via browser
Total accounted forAll 2,731,783

EFTA00013397 was recovered from the Wayback Machine after DOJ deleted it on Dec 23, 2025.

The 5 Bates numbering anomalies (all in DS9) are production errors where multi-page PDFs were assigned only a single EFTA number. These do not represent missing content — the pages exist within the misnumbered PDFs.

Datasets 2–7 and 10–12 are perfectly gap-free. Every EFTA number is accounted for by either a document or the page span of a preceding multi-page document.

What This Means

The "86.2% empty" figure from the earlier PHASE1 gap analysis reflected inter-dataset boundaries and the structure of the Bates numbering system across 12 separate dataset productions — not missing documents. Within each dataset's actual content, the release is total.

The only document actively removed by the DOJ was EFTA00013397 — a "Native Placeholder" PDF for what was likely a spreadsheet, positioned between FBI case management emails and the Maxwell superseding indictment. It was published and deleted within hours on December 23, 2025. Its content (the placeholder page) was recovered from the Wayback Machine.

The 23 CDN-rate-limited DS9 files are confirmed as 1-page PDFs in the concordance file, likely "Native Placeholder" pages based on their small size (~1KB in Wayback CDX records). They are available on the DOJ server but the Akamai CDN blocks bulk download attempts.

Sources Consulted

  • DOJ server (justice.gov) — direct HTTP status checks for all 36 EFTA numbers
  • Dataset concordance files (VOL00008.DAT/OPT, VOL00009.DAT/OPT) — Opticon image load files listing every document and its page assignments

Generated by
tools/find_missing_efta.py and tools/recover_missing_efta.py` against full_text_corpus.db DOJ availability verified February 13, 2026 Wayback Machine recovery verified February 13, 2026 Concordance verification via VOL00008.OPT and VOL00009.OPT Cross-reference: PHASE1_GAP_DETECTION.md for range-level analysis Cross-reference: recovered_corrupted_pdfs/README.md for byte-level forensic recovery