Methodology

Redaction Text Layer Forensic Analysis

2 EFTA citations2,364 words0 persons referenced

The "exposed text" is garbled OCR of low-resolution scanned images. There is NO hidden readable text behind black rectangle overlays in these PDFs.

Redaction Text Layer Forensic Analysis

EFTA00000476.pdf and EFTA00001932.pdf - December 2025 vs. Re-Release Versions">EFTA00000476.pdf and EFTA00001932.pdf - December 2025 vs. Re-Release Versions

Date: 2026-02-08 Analyst: Forensic PDF structure investigation Subject: Determining whether "exposed text" from poorly-redacted PDFs represents hidden readable text behind black rectangles, garbled OCR, encoding corruption, or missed text layers

EXECUTIVE SUMMARY

The "exposed text" is garbled OCR of low-resolution scanned images. There is NO hidden readable text behind black rectangle overlays in these PDFs.

Both EFTA00000476.pdf and EFTA00001932.pdf are image-based scanned documents with invisible OCR text layers. The OCR text layer uses PDF Text Rendering Mode 3 (invisible) and is positioned BEHIND the scanned image in the rendering order. The text appears garbled because OCR software attempted to read:

A photograph of a financial ledger on a manila envelope (EFTA00000476) at only 96 DPI

A handwritten letter in cursive blue ink on decorative paper (EFTA00001932) at only 96 DPI

Neither document contains text-based PDF content with black rectangle overlays hiding selectable text underneath. The viral claim of "poorly redacted" documents exposing hidden text behind copy-paste-removable black bars is not supported by the PDF structure of these specific files.

METHODOLOGY

Files Analyzed

File

Version

Path

Size

------

---------

------

EFTA00000476.pdf Original (Dec 19) local analysis file originals/december_2025/VOL00001/IMAGES/0001/EFTA00000476.pdf 365,781 bytes

EFTA00000476.pdf Current (re-release) DOJ dataset file dataset1/DataSet 1/DataSet 1/VOL00001/IMAGES/0001/EFTA00000476.pdf 362,263 bytes

EFTA00001932.pdf Original (Dec 19) local analysis file originals/december_2025/VOL00001/IMAGES/0002/EFTA00001932.pdf 573,379 bytes

EFTA00001932.pdf Current (re-release) DOJ dataset file dataset1/DataSet 1/DataSet 1/VOL00001/IMAGES/0002/EFTA00001932.pdf 572,881 bytes

Tools Used

PDF analysis tools for PDF structure analysis

pdftotext for text extraction

pdfimages for image listing

PIL/numpy/scipy for pixel-level image comparison

Direct content stream parsing for rendering order verification

FINDING 1: PDF RENDERING PIPELINE

All four PDFs (both versions of both files) share an identical 5-layer rendering structure:


Layer 1: Graphics state save (q)
Layer 2: INVISIBLE OCR TEXT LAYER (Text Rendering Mode 3)
Layer 3: SCANNED IMAGE (/Im0 Do) - rendered ON TOP of text layer
Layer 4: WHITE RECTANGLE (clip mask) + BLACK EFTA LABEL at bottom
Layer 5: End



Evidence from content streams:

EFTA00000476 Original - Content streams [23, 6, 7, 24, 25]:

Stream 23 (1 byte): q - graphics state save

Stream 6 (24,617 bytes): OCR text with 3 Tr (invisible mode), 410 unique Tz values

Stream 7 (34 bytes): q / 864 0 0 576.75 0 0 cm / /Im0 Do / Q - image rendering


Stream 24 (171 bytes): White rectangle fill + hex-encoded "EFTA00000476" label
Stream 25: End

EFTA00001932 Original - Content streams [29, 22, 4, 30, 31]:

Stream 29 (1 byte): q - graphics state save

Stream 22 (15,993 bytes): OCR text with 3 Tr (invisible mode), 279 unique Tz values


Stream 4 (34 bytes): Image rendering
Stream 30 (142 bytes): White rectangle fill + hex-encoded "EFTA00001932" label
Stream 31: End

Critical Detail: Text Rendering Mode 3

From the raw content streams:


3 Tr



PDF Text Rendering Mode 3 means the text is neither filled nor stroked - it is completely invisible. This is the standard method used by OCR software (such as ABBYY FineReader, Adobe Acrobat's OCR, or OmniPage) to create a "searchable" text layer behind a scanned image. The text exists only for search/copy functionality, not for visual display.



FINDING 2: OCR SIGNATURE PROOF

The text layers exhibit unmistakable OCR signatures:

Wildly Varying Font Sizes
Document Version Unique Font Sizes
---------- --------- -------------------
EFTA00000476 Original 197
EFTA00000476 Current Different OCR run
EFTA00001932 Original 266
EFTA00001932 Current Different OCR run

Real PDF text typically uses 2-10 font sizes. Having 197-266 unique sizes means OCR software is assigning different sizes to each word to match the spatial dimensions detected in the scan.

Wildly Varying Horizontal Scaling (Tz)
Document Unique Tz Values
---------- -----------------
EFTA00000476 Original 410
EFTA00000476 Current 272
EFTA00001932 Original 279
EFTA00001932 Current 248

The Tz operator sets horizontal text scaling. OCR software varies this per-word to fit each recognized word into the exact pixel width of the original. Real text documents have Tz=100 (or a handful of values). 272-410 unique values is absolute proof of OCR generation.



Standard OCR Font Names
All four PDFs use identical non-embedded standard fonts:

Courier (OPBaseFont0)

Helvetica (OPBaseFont1)

Helvetica-Bold (OPBaseFont2)

Times-Roman (OPBaseFont3)

ArialMT (OPExtFont0)

These are the default substitute fonts used by OCR engines when the actual font is unknown. The OPBaseFont naming convention is specific to OmniPage OCR software.





FINDING 3: IMAGE ANALYSIS

Both PDFs contain a single embedded image per page

Document Image Size Color Resolution Coverage
---------- ----------- ------- ------------ ----------
EFTA00000476 1152x769 Indexed (1-bit/8bpc) 96 DPI Full page
EFTA00001932 1152x769 Indexed (1-bit/8bpc) 96 DPI Full page

96 DPI is extremely low for OCR purposes (typical OCR requires 300+ DPI for good accuracy). This explains the garbled text output.

Images are DIFFERENT between versions

Pixel-level comparison shows the images were re-scanned or re-processed:

Document Changed Pixels Percentage
---------- --------------- ------------
EFTA00000476 716,787 / 885,888 80.91%
EFTA00001932 808,168 / 885,888 91.23%

Despite the massive pixel-level difference, the images look visually similar to the human eye - this indicates re-scanning from the same physical document (or re-processing with different settings).

EFTA00001932 has an ADDITIONAL redaction in the current version">EFTA00001932 has an ADDITIONAL redaction in the current version

Visual comparison reveals the current (re-released) version of EFTA00001932 has a new black rectangle that does not exist in the December 19 original:

Original: 1 black rectangle at image coordinates (73,105)-(165,222) - top-left area (likely covering the greeting/name)
Current: 2 black rectangles - the original one PLUS a new one at approximately (370,524)-(403,676) - middle area of the letter

This new redaction is baked into the scanned image itself, not a PDF annotation overlay. It was applied by re-scanning or re-processing the physical document with an additional physical or digital redaction.



FINDING 4: WHAT IS THE "EXPOSED TEXT"?

EFTA00000476 (Financial Ledger)">EFTA00000476 (Financial Ledger)

The document is a photograph of a financial ledger lying on a manila envelope. The image shows:
A spreadsheet/table with columns for dates, descriptions, and dollar amounts
Black marker redactions covering certain cells in the physical document
The document was photographed (not flatbed scanned) at low resolution

The "213 lines of exposed text" are OCR's attempt to read this low-resolution photograph:


04044 so 4,10y yentaYI ory a 4
Afaoutt W a Paso pew teoi 016.4
L290 /39 51 92100'0I



This is not "exposed hidden text" - it is garbled OCR of the visible printed content in the photograph, mangled by:
Low resolution (96 DPI)
Angle distortion (it's a photograph, not a scan)
Complex table layout confusing the OCR engine
Black marker redactions creating partial character occlusion

EFTA00001932 (Handwritten Victim Letter)">EFTA00001932 (Handwritten Victim Letter)

The document is a handwritten letter in blue cursive ink on decorative stationery paper with a cartoon owl design. The "47 lines of exposed text" are OCR's attempt to read cursive handwriting:


ear e.i-freA;
1- i.O,,-_)c ot( hac.i. a wonderi-iti 110ii-
dali SeaSO11.



Manual reconstruction suggests this reads approximately:


Dear [name],
I hope [you] had a wonderful holiday
season.



The letter is a victim thank-you letter to Epstein, expressing gratitude for:
Holiday/Christmas celebrations
Trips to Palm Beach, Las Vegas, Mexico, and an island
Flying her sister out to visit
Use of a Manhattan apartment
Help seeing her mother
"Pushing me to be at my best"

This is consistent with the well-documented grooming pattern where victims were conditioned to express gratitude for material benefits.



FINDING 5: NO BLACK RECTANGLE OVERLAY HIDING TEXT

Annotation Check
Document Version PDF Annotations Redaction Annotations
---------- --------- ---------------- ----------------------
EFTA00000476 Original 0 0
EFTA00000476 Current 0 0
EFTA00001932 Original 0 0
EFTA00001932 Current 0 0

Zero PDF redaction annotations exist in any version. There are no "overlay" black rectangles in the PDF structure.

Drawing Object Check
Document Version Drawings Black-Filled Description
---------- --------- ---------- ------------- -------------
EFTA00000476 Original 1 0 White page border only
EFTA00001932 Original 1 0 White page border only

The only drawing objects are white-filled page borders. No black rectangles exist as PDF drawing objects.

Where are the black rectangles?

The black rectangles visible in these documents are baked into the scanned images themselves. They are part of the pixel data of the embedded raster image. This means:

The physical documents were redacted (with black tape, marker, or digital masking) BEFORE scanning
The scanner captured the already-redacted document
No text exists "behind" the black rectangles because the text was physically obscured before the scan

For EFTA00000476 (financial ledger):
143,650 near-black pixels in the image (17.1% of image area)
Largest black region: 998x561 pixels (the table area with multiple column redactions)
These are physical marker/tape redactions on the printed document, captured in the photograph

For EFTA00001932 (victim letter):
Original: 1 black region at (73,105)-(165,222) = 92x117 pixels
Current: Same region PLUS new region at (370,524)-(403,676) = 33x152 pixels
The additional redaction in the re-release was applied to the image (re-scanned or digitally added to the raster)



FINDING 6: ORIGINAL vs. CURRENT COMPARISON

Text Layer Differences
Document Original Text Length Current Text Length Identical?
---------- --------------------- -------------------- -----------
EFTA00000476 2,853 chars 2,392 chars No
EFTA00001932 1,409 chars 1,240 chars No

The text layers differ because different OCR runs produce different results from different scans of the same physical document. The current versions were re-scanned and re-OCR'd, producing slightly different (but equally garbled) text.

Key differences for EFTA00001932:
Original OCR produces 257 words; Current OCR produces 229 words
The original has "Mexico", "Vegas", "Friends"; the current has "Arizona", "Christmas", "circus"
Both are equally garbled attempts at reading the same handwriting
The current version lost some text near the new redaction area

File Size Differences
Document Original Size Current Size Difference
---------- -------------- ------------- ------------
EFTA00000476 365,781 362,263 -3,518 bytes
EFTA00001932 573,379 572,881 -498 bytes

The slight size differences are consistent with different compression of the re-scanned images and different OCR text content.



CONCLUSION

Answer to the Key Question

Do the original December 19 PDFs have actual selectable text behind black visual rectangles (text-based PDFs with overlay redactions)?

NO. The evidence conclusively shows:

These are image-based scanned documents, not text-based PDFs
The text layer is invisible OCR (Text Rendering Mode 3), placed behind the image
The OCR is garbled because of 96 DPI resolution, cursive handwriting, and photographic distortion
The black rectangles are in the images, not PDF overlays - they represent physical redactions applied before scanning
No PDF annotations or drawing objects create the visible redactions
The "exposed text" is garbled OCR artifacts, not hidden text behind removable black bars

Assessment of the Viral "Poorly Redacted" Claim for These Specific Files

For EFTA00000476 and EFTA00001932 specifically, the claim that redacted text can be "exposed" by copying/pasting from behind black rectangles is overstated. The "exposed text" is the OCR engine's garbled attempt at reading visible (non-redacted) content from a low-resolution scan. It does not reveal information hidden by the redactions.

However, it is worth noting:
EFTA00001932 did receive an additional physical redaction in the re-release, covering what appears to be a signature or name area. This confirms that DOJ recognized at least some content needed additional redaction.
The OCR text, while garbled, does provide fragmentary readable content from the non-redacted portions (e.g., recognizable words like "Christmas," "Manhattan," "Mexico," "friends," "wonderful")
Other documents in the collection may have different redaction methods - this analysis applies only to these two specific files

Root Cause of Garbled "Exposed Text"

The text appears garbled because:
96 DPI scan resolution - far below the 300 DPI minimum recommended for OCR
Handwriting recognition failure (EFTA00001932) - cursive blue ink is extremely difficult for OCR
Photographic distortion (EFTA00000476) - the document was photographed, not flatbed scanned

OmniPage OCR limitations - the OPBaseFont naming in the fonts confirms OmniPage was used, which has limited handwriting recognition


Indexed color space - both images use 1-bit indexed color (essentially black and white), losing any gray-scale information that could help character recognition



APPENDIX: Technical Evidence

EFTA00000476 Original - OCR Layer (first 500 bytes)">Content Stream: EFTA00000476 Original - OCR Layer (first 500 bytes)


%WB0AiUxr
q
1 0.06 -0.06 1 17.43 -26.32 cm
BT
0 0 0 rg
0 0 0 RG
1 0 0 1 252.8 489 Tm
77.33 Tz
3 Tr/OPBaseFont0 8.33 Tf(\)1 )Tj
1 0 0 1 246.23 475.12 Tm
65.18 Tz/OPBaseFont0 15.62 Tf(4 )Tj



Key operators:

3 Tr = Text Rendering Mode 3 (invisible)

77.33 Tz / 65.18 Tz = per-word horizontal scaling (OCR signature)

/OPBaseFont0 = OmniPage default font substitute



Content Stream: Image Rendering Layer


q
864 0 0 576.75 0 0 cm
/Im0 Do
Q


This renders the image at full page size, ON TOP of the invisible text layer.

EFTA00000476 Original)">Font Inventory (EFTA00000476 Original)


(9,  'n/a', 'Type1', 'Courier',        'OPBaseFont0', 'WinAnsiEncoding')
(10, 'n/a', 'Type1', 'Helvetica',      'OPBaseFont1', 'WinAnsiEncoding')
(11, 'n/a', 'Type1', 'Helvetica-Bold', 'OPBaseFont2', 'WinAnsiEncoding')
(12, 'n/a', 'Type1', 'Times-Roman',    'OPBaseFont3', 'WinAnsiEncoding')
(13, 'n/a', 'Type1', 'ArialMT',        'OPExtFont0',  'WinAnsiEncoding')

All fonts are non-embedded standard Type1 fonts with OmniPage naming.

View source on GitHub →