Methodology

MISSING EFTA DOCUMENT ANALYSIS

123 EFTA citations3,219 words8 persons referenced

Each PDF in the DOJ Epstein file release is named `EFTA########.pdf`. The EFTA number corresponds to the first page's Bates number. Multi-page PDFs consume consecutive EFTA numbers: a 20-page PDF starting at EFTA00003216 spans EFTA00003216 through EFTA00003235, and the next PDF starts at EFTA00003236.

MISSING EFTA DOCUMENT ANALYSIS

Page-Based Gap Detection Across All 12 Datasets

Date: February 13, 2026 Analyst: Independent Forensic Researcher Classification: UNCLASSIFIED // FOR PUBLIC RELEASE Database: full_text_corpus.db (1,380,941 documents, 2,731,825 pages, 6.09 GB — includes 4 recovered docs + 15 native spreadsheets) Tool: tools/find_missing_efta.py, tools/recover_missing_efta.py

METHODOLOGY

Each PDF in the DOJ Epstein file release is named EFTA########.pdf. The EFTA number corresponds to the first page's Bates number. Multi-page PDFs consume consecutive EFTA numbers: a 20-page PDF starting at EFTA00003216 spans EFTA00003216 through EFTA00003235, and the next PDF starts at EFTA00003236.

This means: expected_next_document = current_EFTA_number + total_pages

Any gap between expected_next and the actual next document in sequence = missing EFTA page-numbers — documents that should exist but don't appear in the release.

This analysis scans all 1,380,932 documents across 12 datasets, checking every consecutive pair for gaps. After identifying gaps, each missing EFTA number was checked against the DOJ server (justice.gov) for availability, and available files were downloaded and added to the corpus.

Important distinction: This is different from the EFTA "range gap" analysis in PHASE1_GAP_DETECTION.md, which noted that 86.2% of the EFTA number space is unpopulated. That analysis looked at the raw range (1 to 2,731,783). This analysis respects the actual page-based numbering system and asks: given what we have, what's missing?

EFTA INDEXING SCHEME

The EFTA numbering system is unified across all file types. PDFs, videos, audio, spreadsheets, and other native formats all receive sequential EFTA numbers. The corpus contains:

File Type

Count

Notes

-----------

-------

PDF

1,380,941

Primary document format (in IMAGES directories)

AVI

1,530

Video — surveillance, depositions

MP4

1,323

Video — MCC surveillance, interviews

MOV

162

Video

M4A

Audio recordings

M4V

Video

Opus

Audio

WAV

Audio

VOB

DVD video

XLSX

Spreadsheets

WMV

Video

AMR

Audio

MP3

Audio

CSV

Data files

PNG

Image

XLS

Spreadsheets

Transport stream

3GP

Mobile video

Other

Apple Messages attachment

Every non-PDF file also has a corresponding PDF companion (typically a 1-page placeholder in the IMAGES directory). This means non-PDF files do not create additional gaps in the EFTA numbering — they are already accounted for by their PDF counterparts. All 3,226 unique non-PDF files were verified to have matching PDFs in the corpus. (See [NATIVE_FILES_CATALOG.csv](../NATIVE_FILES_CATALOG.csv) for the complete inventory.)

Native files are stored in NATIVES subdirectories; their PDF companions in IMAGES subdirectories.

SUMMARY

Metric

Value

--------

-------

Total PDF documents in corpus

1,380,936 (after all recoveries)

Total non-PDF files (all with PDF companions)

5,142

Total EFTA page-numbers spanned

2,731,783

Gaps identified by page-based analysis

22 (36 EFTA page-numbers)

Resolved: recovered from DOJ server

3 documents (EFTA00000467, EFTA00000468, EFTA00009781)

Resolved: corrupted PDFs forensically recovered

5 documents (see recovered_corrupted_pdfs)

Resolved: false positive (pages within multi-page PDF)

4 page-numbers (EFTA00009782-85 = pages 2-5 of EFTA00009781.pdf, confirmed via VOL00008.OPT concordance)

Resolved: recovered from Wayback Machine

1 document (EFTA00013397 — deleted from DOJ on Dec 23, 2025)

Remaining: CDN rate-limited (available on DOJ)

23 documents (1-page placeholders, downloadable via browser)

Truly absent from DOJ release

Inter-dataset boundary gaps

237 EFTA numbers (expected, between datasets)

Page-count anomalies (overlaps)

5 (Bates numbering errors in DS9)

The DOJ release is 100% complete within dataset boundaries. Every EFTA page-number is accounted for. The 23 remaining CDN-rate-limited files are confirmed to exist on the DOJ server and are downloadable individually via browser.

PER-DATASET RESULTS

Dataset 1

Range: EFTA00000001 – EFTA00003158 (3,158 EFTA numbers)

Documents: 3,156 PDFs, 3,156 total pages

Missing: 2 EFTA numbers, 1 gap

Missing Range

Count

After Document

Before Document

---------------

-------

----------------

-----------------

EFTA00000467–EFTA00000468

EFTA00000466 (1pp)

EFTA00000469

Datasets 2–7: No Gaps Detected

Dataset

Range

Documents

Total Pages

Missing

---------

-------

-----------

-------------

---------

EFTA00003159–EFTA00003857

361 PDFs

699 pages

EFTA00003858–EFTA00005586

322 PDFs

1,729 pages

EFTA00005705–EFTA00008320

584 PDFs

2,616 pages

EFTA00008409–EFTA00008528

68 PDFs

120 pages

EFTA00008529–EFTA00008998

238 PDFs

470 pages

EFTA00009016–EFTA00009664

286 PDFs

649 pages

Dataset 8

Range: EFTA00009676 – EFTA00039023 (29,348 EFTA numbers)

Documents: 10,593 PDFs, 29,343 total pages

Missing: 6 EFTA numbers, 2 gaps

Missing Range

Count

After Document

Before Document

---------------

-------

----------------

-----------------

EFTA00009781–EFTA00009785

EFTA00009775 (6pp)

EFTA00009786

EFTA00013397

EFTA00013395 (2pp)

EFTA00013398

Dataset 9

Range: EFTA00039025 – EFTA01262781 (1,223,757 EFTA numbers)

Documents: 531,279 PDFs, 1,223,761 total pages

Missing: 28 EFTA numbers, 19 gaps

Page-count anomalies: 5

Missing Range

Count

After Document

Before Document

---------------

-------

----------------

-----------------

EFTA00593869 (1pp)

EFTA00597206 (1pp)

EFTA00645622 (2pp)

EFTA00709804–EFTA00709807

EFTA00709802 (2pp)

EFTA00770593 (2pp)

EFTA00774767 (1pp)

EFTA00823190–EFTA00823192

EFTA00823188 (2pp)

EFTA00823220 (1pp)

EFTA00823317 (2pp)

EFTA00877474 (1pp)

EFTA00892251 (1pp)

EFTA00901739 (1pp)

EFTA00912979 (1pp)

EFTA00919433–EFTA00919434

EFTA00919431 (2pp)

EFTA00919435

EFTA00932520–EFTA00932523

EFTA00932518 (2pp)

EFTA01135214 (1pp)

EFTA01135706 (2pp)

EFTA01175409 (17pp)

EFTA01220933 (1pp)

DS9 Page-Count Anomalies (Bates Numbering Errors)

Five documents in DS9 have total_pages values that exceed the gap before the next document. Investigation confirms these are Bates numbering production errors in the original DOJ document production — not database errors. The PDFs genuinely contain the stated number of pages, but the production process allocated only 1 EFTA number instead of the correct count.

Document

Pages in PDF

Gap to Next

Shortfall

----------

-------------

-----------

-2

-15

-2

-9

-4

Note: 4 of 5 are image-only scanned documents. The numbering error accounts for 32 "extra" pages (3+16+3+10+5-5=32) in the DS9 total_pages sum (1,223,761) versus the EFTA range span (1,223,757).

Datasets 10–12: No Gaps Detected

Dataset

Range

Documents

Total Pages

Missing

---------

-------

-----------

-------------

---------

EFTA01262782–EFTA02212882

504,084 PDFs

950,101 pages

EFTA02212883–EFTA02730262

331,655 PDFs

517,380 pages

EFTA02730265–EFTA02731783

906 PDFs

1,519 pages

INTER-DATASET BOUNDARIES

Between datasets, there are expected gaps where EFTA numbers are not assigned to any file. These are normal production artifacts:

Boundary

Gap

EFTA Numbers

----------

-----

--------------

DS1 → DS2

(contiguous)

DS2 → DS3

(contiguous)

DS3 → DS4

118

EFTA00005587–EFTA00005704

DS4 → DS5

EFTA00008321–EFTA00008408

DS5 → DS6

(contiguous)

DS6 → DS7

EFTA00008999–EFTA00009015

DS7 → DS8

EFTA00009665–EFTA00009675

DS8 → DS9

EFTA00039024

DS9 → DS10

(contiguous)

DS10 → DS11

(contiguous)

DS11 → DS12

EFTA02730263–EFTA02730264

Total

237

COMPLETE LIST OF MISSING EFTA NUMBERS

All 36 EFTA page-numbers absent from the local corpus, with DOJ server status:

EFTA Number

Dataset

DOJ Server

Status

-------------

---------

------------

--------

EFTA00000467

DS1

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00000468

DS1

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00009781

DS8

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00009782

DS8

HTTP 404

Not on DOJ server

EFTA00009783

DS8

HTTP 404

Not on DOJ server

EFTA00009784

DS8

HTTP 404

Not on DOJ server

EFTA00009785

DS8

HTTP 404

Not on DOJ server

EFTA00013397

DS8

HTTP 404

Not on DOJ server

EFTA00593870

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00597207

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00645624

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00709804

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00709805

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00709806

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00709807

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00770595

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00774768

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00823190

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00823191

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00823192

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00823221

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00823319

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00877475

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00892252

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00901740

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00912980

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00919433

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00919434

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00932520

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00932521

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00932522

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA00932523

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA01135215

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA01135708

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA01175426

DS9

HTTP 200

Available on DOJ, missing from archive.org download

EFTA01220934

DS9

HTTP 200

Available on DOJ, missing from archive.org download

Summary by Status

Status

Count

Details

--------

-------

---------

Recovered from DOJ

EFTA00000467, EFTA00000468 (DS1), EFTA00009781 (DS8) — downloaded and added to corpus

Corrupted PDFs (forensically recovered)

DS9: EFTA00593870, EFTA00597207, EFTA00645624, EFTA01175426, EFTA01220934 — content already extracted, see recovered_corrupted_pdfs/README.md

Available on DOJ, CDN rate-limited

DS9 files that return HTTP 200 but deliver 0 bytes due to Akamai CDN rate limiting — retrievable with patience or direct browser download

Truly absent (HTTP 404)

DS8: EFTA00009782, EFTA00009783, EFTA00009784, EFTA00009785, EFTA00013397

Total

CORRUPTED PDF RECOVERY

Five documents in DS9 existed in the local corpus as corrupted PDFs (0 extractable pages). Byte-level forensic analysis revealed these are not simply damaged files — they are forensic imaging artifacts: disk image fragments, truncated fax scans, and raw device sectors that were assigned EFTA numbers during evidence collection regardless of their actual content.

All five were fully analyzed and all recoverable content was extracted. See recovered_corrupted_pdfs/README.md for complete details.

EFTA

What It Actually Is

Content Recovered

------

-------------------

EFTA00593870

Null-padded PDF shell

Page 1 of 4 of CVRA motion (Jane Doe #1 and #2 v. United States, Case 9:08-cv-80736)

EFTA00597207

PDF overwritten by Apple Address Book sectors

8 contacts: Gwendolyn Beck, Jay Lefkowitz (Kirkland & Ellis), Michael Wolff, Karim Wade (Senegalese govt), J. Robert Strang, + 3 partial names. Also: iPhone 5s photo from Aug 3, 2014

EFTA00645624

Truncated Sharp scanner fax

Legal memo (Apr 22, 2015): Epstein v. Rothstein, Edwards et al. — UMC hearing re motion for fees/costs

EFTA01175426

Truncated fax (10 of 11 pages)

San Mateo County probate order: Elisa Zaffaroni irrevocable trust, J.P. Morgan Trust Company co-trustee, $4.1M distribution

EFTA01220934

Raw disk image fragment (not a PDF)

~279 sectors of Windows PC hard drive: cached web images, Dreamweaver files, system manifests. 9 JPEGs carved (7 viewable)

RESOLVED GAPS — CONCORDANCE AND WAYBACK ANALYSIS

EFTA00009782–EFTA00009785: FALSE POSITIVE (pages within multi-page PDF)">EFTA00009782–EFTA00009785: FALSE POSITIVE (pages within multi-page PDF)

The Dataset 8 concordance file (VOL00008.OPT) definitively resolves this apparent gap:


EFTA00009781,VOL00008,IMAGES\0001\EFTA00009781.pdf,Y,,,5   ← 5-page document start
EFTA00009782,VOL00008,IMAGES\0001\EFTA00009781.pdf,,,,      ← page 2 of same PDF
EFTA00009783,VOL00008,IMAGES\0001\EFTA00009781.pdf,,,,      ← page 3
EFTA00009784,VOL00008,IMAGES\0001\EFTA00009781.pdf,,,,      ← page 4
EFTA00009785,VOL00008,IMAGES\0001\EFTA00009781.pdf,,,,      ← page 5
EFTA00009786,VOL00008,IMAGES\0001\EFTA00009786.pdf,Y,,,5   ← next document



EFTA00009782-85 are pages 2-5 of EFTA00009781.pdf, not separate documents. The gap detection script flagged these because it used the

total_pages

 value from the database (which was 0 before recovery) rather than the concordance. After recovering EFTA00009781.pdf from the DOJ server (5 pages, 617,030 bytes), all content is accounted for.

Content: Case 1:19-cr-00830-AT Document 59 — Tova Noel Deferred Prosecution Agreement (MCC guard who falsified check sheets the night Epstein died, filed 5/25/2021).

EFTA00013397: RECOVERED FROM WAYBACK MACHINE (deleted from DOJ Dec 23, 2025)">EFTA00013397: RECOVERED FROM WAYBACK MACHINE (deleted from DOJ Dec 23, 2025)

The Wayback Machine CDX API reveals this file's history:

Timestamp Status Size Notes
----------- -------- ------ -------
2025-12-23 06:18:27 UTC HTTP 200 (PDF) 3,194 bytes Snapshot preserved
2025-12-23 15:58:51 UTC HTTP 200 (PDF) 3,048 bytes Second snapshot
2025-12-23 19:45:07 UTC HTTP 404 10,304 bytes File deleted from DOJ
2026-01-17 onwards HTTP 404 — Remains deleted

The file was actively removed from the DOJ server on December 23, 2025 — the same day as the initial Dataset 8 release. It was published, then deleted within hours.

Content: Recovered from the first Wayback snapshot. The PDF contains a single page reading "Native Placeholder — No Images Produced — EFTA00013397." This is a PDF companion for a native-format file (likely an XLSX spreadsheet, per the Tommy Carstensen index). The native file itself was never made available.

Context: This placeholder falls between FBI case management emails (EFTA00013395) and the Ghislaine Maxwell Superseding Indictment (EFTA00013398 — S2 20 Cr. 330). The spreadsheet it represents may have contained case tracking data or evidence inventory.



ASSESSMENT

The DOJ Epstein file release is 100% complete within its defined dataset boundaries. Every EFTA page-number across all 12 datasets is accounted for:

Resolution Count Method
------------ ------- --------
Already in corpus 1,380,932 Original archive.org download
Recovered from DOJ server 4 Direct download (EFTA00000467, EFTA00000468, EFTA00009781, EFTA00013397†)
Corrupted PDFs forensically recovered 5 Byte-level carving and CCITT fax decoding
False positive (pages within multi-page PDF) 4 Concordance (VOL00008.OPT) verification
CDN rate-limited (available on DOJ server) 23 Confirmed via HTTP 200; downloadable individually via browser
Total accounted for All 2,731,783 

† EFTA00013397 was recovered from the Wayback Machine after DOJ deleted it on Dec 23, 2025.

The 5 Bates numbering anomalies (all in DS9) are production errors where multi-page PDFs were assigned only a single EFTA number. These do not represent missing content — the pages exist within the misnumbered PDFs.

Datasets 2–7 and 10–12 are perfectly gap-free. Every EFTA number is accounted for by either a document or the page span of a preceding multi-page document.

What This Means

The "86.2% empty" figure from the earlier PHASE1 gap analysis reflected inter-dataset boundaries and the structure of the Bates numbering system across 12 separate dataset productions — not missing documents. Within each dataset's actual content, the release is total.

The only document actively removed by the DOJ was EFTA00013397 — a "Native Placeholder" PDF for what was likely a spreadsheet, positioned between FBI case management emails and the Maxwell superseding indictment. It was published and deleted within hours on December 23, 2025. Its content (the placeholder page) was recovered from the Wayback Machine.

The 23 CDN-rate-limited DS9 files are confirmed as 1-page PDFs in the concordance file, likely "Native Placeholder" pages based on their small size (~1KB in Wayback CDX records). They are available on the DOJ server but the Akamai CDN blocks bulk download attempts.

Sources Consulted

DOJ server (justice.gov) — direct HTTP status checks for all 36 EFTA numbers
Dataset concordance files (VOL00008.DAT/OPT, VOL00009.DAT/OPT) — Opticon image load files listing every document and its page assignments
Wayback Machine CDX API — historical snapshots of DOJ URLs
Tommy Carstensen EFTA Index — independent file inventory for all datasets
Tommy Carstensen Deleted Files tracker — 313 deleted files across DS9-12
archive.org DS9 bruteforce archive — JustAnotherArchivist's complete DS9 scrape (Jan 31, 2026)
yung-megafone/Epstein-Files GitHub — community checksums and torrent links
Rep. Nancy Mace press release — congressional inquiry into removed files



Generated by

tools/find_missing_efta.py and tools/recover_missing_efta.py` against full_text_corpus.db DOJ availability verified February 13, 2026 Wayback Machine recovery verified February 13, 2026 Concordance verification via VOL00008.OPT and VOL00009.OPT Cross-reference: PHASE1_GAP_DETECTION.md for range-level analysis Cross-reference: recovered_corrupted_pdfs/README.md for byte-level forensic recovery

View source on GitHub →