Document Describes Language Corpora for Book Collections
Technical discussion of OCR and metadata quality in multilingual book corpora
Case File
kaggle-ho-017020
House Oversight
Technical description of Google Books corpora methodology
Date unknown, 1 page, 5 persons
The passage details only the data processing methods and corpus quality considerations for a research study. It contains no references to influential actors, financial flows, misconduct, or actionable investigative leads. Key insights: computations were performed at Google using MapReduce; the focus is on English corpora from 1800‑2000 with reliable date metadata; the passage notes limitations of the metadata for place of publication and for foreign-language corpora.
Date: Unknown
Source: House Oversight
Reference: kaggle-ho-017020
Pages: 1
Persons: 5
Integrity: No Hash Available