Case File
Date
Unknown
Source
DOJ Data Set 9
Reference
efta-efta01140307
Pages
4
Persons
0
Integrity
Extracted Text (OCR)
Text extracted via OCR from the original document. May contain errors from the scanning process.
Human Mutation
Back to the Future: From Genome to Metabolome
Joseph V. Thakuria,1,2* Alexander W. Zaranek,1 George M. Church,1 and Gerard T. Berry3
1Department of Genetics, Harvard Medical School, Boston, Massachusetts; 2Division of Genetics, Massachusetts General Hospital, Harvard
Medical School, Boston, Massachusetts; 3Division of Genetics, Department of Medicine, Children's Hospital Boston, Boston, Massachusetts
For the Deep Phenotyping Special Issue
Received 20 February 2012; accepted revised manuscript 28 February 2012.
Published online 18 March 2012 in Wiley Online Library (www.wiley.com/humanmutation). DOI: 10.1002/humu.22073
ABSTRACT: In the traditional medical genetics setting,
metabolic disorders, identified either clinically or through
biochemical screening, undergo subsequent single gene
testing to molecularly confirm diagnosis, provide further
insight on natural disease history, and inform on disease
management, treatment, familial testing, and reproduc-
tive options. For decades now, this process has been re-
sponsible for saving many lives worldwide. Only recently,
though, has it become possible to move in the opposite
direction by starting with an individual's whole genome
or exome, and, guided by this data, study more minor per-
turbations in the absolute values and substrate ratios of
clinically important biochemical analytes. Genomic indi-
viduality can also be used to guide more detailed phenotyp-
ing aimed at uncovering milder manifestations of known
metabolic diseases. Metabolomic phenotyping in the Per-
sonal Genome Project for our first 200+ participants—all
of whom are scheduled to have full genome sequence at
more than 40x coverage available by May 2012—is aimed
at uncovering potential subclinical and preclinical disease
states in carriers of known pathogenic mutations and in
lesser known rare variants that are protein predicted to
be pathogenic. Our initial focus targets 88 genes involved
in 68 metabolic disturbances with established evidence-based
nutritional and/or pharmacological therapy as part
of standard medical care.
Hum Mutat 33:809-812, 2012. © 2012 Wiley Periodicals, Inc.
KEY WORDS: genomics; metabolomics; nutritional ther-
apy; pharmacological therapy
Background
In the 1985 American film, "Back to the Future," Marty McFly is
accidentally sent back in time to the 1950s by a plutonium powered
"flux capacitor" in a modified DeLorean upon reaching 88 mph.
Throughout the film, the impact the future has on the past is ex-
plored. For decades now, mass spectrometric analysis typically uti-
lizing a cylindrical capacitor ionization source to generate singly
charged ions has been the backbone of diagnosis, management,
and/or treatment for hundreds of inherited metabolic disorders.
Additional Supporting Information may be found in the online version of this article.
*Correspondence to: Joseph V. Thakuria, Division of Genetics, Massachusetts
General Hospital, Boston, MA 02114. E-mail: jthakuria@genetics.med.harvard.edu
Because of proven clinical benefit, a subset of these disorders has
made its way into formal newborn screening recommendations
[ACMG, 2006]. Used for second-tier biochemical confirmation in
conjunction with newborn screening programs, this technology has
saved the lives of many newborns, children, and adults the world
over. Starting with phenylketonuria in 1953, nutritional therapeutics
guided by metabolic screening and serial testing has been conclu-
sively shown to have medical benefit in a wide variety of enzyme
deficiencies and other biochemical disorders.
As we enter the genomics era, our most diagnostically challenging
cases in a medical genetics clinic are rapidly moving from a state
of having no causal molecular candidates to having many candi-
dates that need further evaluation and vetting. Nongenomic axes
supporting causality from imaging, biochemical assay, functional
cellular work, and other lines of evidence are increasingly impor-
tant to help verify pathogenicity. Of these, biochemical assays have
historically been the axis most frequently correlated with genetic
data in a medical genetics practice.
Additionally, although much progress has been made in the
screening, prevention, and treatment of inherited and primarily
autosomal-recessive biochemical disorders, limited resources have
been devoted to studying potential subclinical and preclinical dis-
ease states in carriers of known pathogenic mutations as well as in
those harboring one or more less well-defined variants in known
disease-causing genes. In large part, this is due to the reliance of newborn
screening and other testing modalities on biochemical analytes for
screening and diagnosis. In clinical practice, the higher sensitivity,
specificity, and cost-effectiveness of screening biochemically are well
justified.
Large-scale genomic research studies utilizing next-generation
sequencing, however, provide an opportunity for researchers to start
with comprehensive genomic sequence data and, secondarily, study
the resulting phenotype and biochemical profile. If consistent ab-
normal trends (even trends within the normal range) are found as-
sociated with carrier states and/or lesser known mutations in genes
causing metabolic disorders, it is intriguing to think of what effect
a modified diet specific to the defect will have on the health and
well-being of such individuals. In order to explore this possibility,
an important first step is identifying whether such trends exist and
identifying in which disorders subclinical or preclinical biochemical
phenotypes are prevalent. In some disorders, such as galactosemia,
the biochemical and phenotypic effects of carrier status and of the rarer
Duarte allele 1 (GALT N314D + L218L) gain of function mutations
have been studied and characterized [Scriver et al., 2012]. In
many other metabolic disorders, however, phenotypically, little may
be known beyond the scope of classically affected patients on the
extreme end of a disease severity spectrum.
In 1908, Archibald Garrod introduced the idea of bio-
chemical individuality and described four of the first known
autosomal-recessive disorders: alkaptonuria, cystinuria, albinism,
and pentosuria. Since then, over 300 metabolic disorders with
known diagnostic metabolic and genetic alterations have been
discovered. And although the Norwegian physician Ivar Asbjørn Følling
discovered phenylketonuria in 1934, it was not until approximately
20 years later that dramatically effective, evidence-based nutritional
therapy was recognized through the collective work of Lionel Penrose,
George Jervis, and Horst Bickel [Berry, 2010]. Although the
number of severe metabolic disorders with effective dietary and/or
drug therapy continues to increase, identification of more subtle
subclinical and preclinical disease states utilizing whole genome or
exome data has not yet been explored.
Research findings will eventually move into clinical practice as
insight from next-generation sequencing technology is applied to
metabolic lessons from the past, and greater correlation between
genomic individuality and biochemical individuality is delineated
in an expanded number of individuals. Subsequently, identification
of subclinical and preclinical phenotypes should lead to effective
dietary and drug therapy in individuals exhibiting milder or non-
classic phenotypes of known metabolic diseases. As this will have
the effect of broadening both genetic and biochemical screening, a
resulting cycle of medical discovery, screening, and treatment rec-
ommendations in this area can be expected to accelerate in the
coming years.
The Personal Genome Project (PGP) is a Harvard Medical School
study with institutional review board approval for the enrollment
of 100,000 individuals for complete genomic and phenotypic study
(http://www.personalgenomes.org/). Study participants must be at
least 21 years of age. Enrollment is entirely online and requires
passing an exam testing comprehension of human subject research,
PGP protocols, and basic genetics. Study guides and consent forms
are available online at http://www.personalgenomes.org/consent/
and http://www.pgpstudy.org/ (Church, 2005; Lunshof et al., 2010).
Integrated datasets of linked genomic and phenotypic data on
each individual are made available publicly as a free resource for
the research community and to the study participants themselves.
To allow for sequence confirmation and functional studies, par-
ticipant cell lines are also made available and distributed through
the Coriell Institute (http://ccr.coriell.org/). These include fibroblast
and Epstein-Barr virus-transformed lymphoblastoid cell lines. Pri-
vate quarterly questionnaires are used to track safety and prospective
clinical outcomes.
More than 1,000 participants have provided phenotype data
via personal health records and standardized questionnaires. The
project is also actively pursuing the development and administra-
tion of new phenotyping tools with help from both the research
community and commercial organizations. Immediate phenotyp-
ing plans include providing microbiome measurements from several
body sites, telomere lengths, and methylation profiles. Participants
may then elect to participate in these additional activities as they
become available. More than 97% of participants have expressed
interest in doing so. More than 85% of participants have also ex-
pressed interest in providing discarded surgical samples for analyses
and more than 90% of participants have volunteered to provide
samples postmortem.
To date, over 1,500 individuals have fully completed enrollment
with twice as many at some stage of the enrollment process. From
these, 200+ are being selected to have whole-genome sequence at
more than 40x coverage from blood- and saliva-derived DNA.
Clinical prioritization of participants is aided by a questionnaire
designed to enrich for strong genetic etiology (Table 1).
In this communication, we describe initial plans for metabolic
phenotyping in our first 200+ individuals with phenotypically inte-
grated whole-genome sequence datasets. Initial analysis is focused
Table 1. PGP Screening Questions Enriching for Genetic Etiologies
Question type(s) | Purpose
1. Age | Find both early-onset disease and advanced age controls with retrospective data.
2. Presence of severe or rare disease phenotype (self-reported). | Prioritize by condition or suspected genetic etiology (free text permitted for detailed responses).
3. If yes to Q2, disease onset, rarity, severity, and presence of family history are assessed. | Prioritize further within the disease category of interest.
4. Is objective disease evidence from physician diagnosis and/or medical testing available? | Prioritize diseases with evidence beyond self-reporting and/or with supporting laboratory, imaging, or genetic data.
5. Will data from Q4 be uploaded into participant PGP profiles? | Prioritize by accessible medical phenotype data.
6. Demographics: geographic (from local to continent level), as well as ethnic (i.e., "ethnicity" will not always be concordant with "geography") and gender. Geographic and ethnic data (both voluntary to answer) can be provided for all four grandparents. | Provide flexibility in rapid hypothesis-driven prioritization of already enrolled cohorts. Enable ancestry, epigenetic, environmental studies. Apply appropriate population frequency thresholds when interpreting "-omic" variants and other datasets.
7. Co-enrollment with affected or unaffected family members? State disease(s), affected status, and familial relationship. | Prioritize on feasibility of familial-based genomic or other analyses.
8. What type of biological samples will be provided (e.g., blood, saliva, "normal" flora (for microbiomes), skin, or other tissues)? | Prioritize based on available tissue/cell types or feasibility of somatic versus germline comparative studies.
on 88 genes involved in 68 well-established biochemical genetic dis-
orders with known dietary and/or pharmacologic treatment. The
vast majority of primary and secondary newborn screening targets
recommended by the American College of Medical Genetics
(ACMG) are included (Supp. Table S1).
Methods
Purified DNA from saliva or blood from over 200 PGP participants
is slated for library preparation and sequencing by Complete Genomics,
Inc. Data are annotated using their 2.X pipeline matching
against the National Center for Biotechnology Information (NCBI)
build 37 reference genome. A preliminary interpretation derived
from this data is provided privately to participants and becomes
public after they are allotted 30 days for review. Individual datasets
are linked to the participant ID and are published in the public
domain under the Creative Commons CC0 waiver.
We have developed the GET-Evidence system to produce reports
and make datasets available to the study participants and to the pub-
lic. The purpose of GET-Evidence is to build up a public database of
variant annotations that will ultimately be used to assist in clinical
analysis. GET-Evidence prioritizes variants for review based on allele
frequency, protein-predicted pathogenicity, and presence in clini-
cal gene and variant databases. As more variants are reviewed, the
participants' reports are updated to reflect the newer annotations.
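The prioritization criteria named above (allele frequency, predicted pathogenicity, presence in clinical databases) can be illustrated with a minimal sketch. This is not the GET-Evidence implementation: the `Variant` fields, the `review_priority` function, the lexicographic ordering of the three criteria, and all example values are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str              # placeholder gene label (hypothetical)
    change: str            # placeholder amino-acid change (hypothetical)
    allele_freq: float     # population allele frequency, 0..1
    damaging_score: float  # predicted-pathogenicity score, 0..1 (higher = more damaging)
    in_clinical_db: bool   # listed in a clinical gene/variant database


def review_priority(v: Variant) -> tuple:
    """Sort key (assumed ordering): database hits first,
    then higher predicted-damaging score, then lower allele frequency."""
    return (not v.in_clinical_db, -v.damaging_score, v.allele_freq)


def prioritize(variants):
    """Return variants in the order they would be queued for review."""
    return sorted(variants, key=review_priority)


# Hypothetical example values, for illustration only.
queue = prioritize([
    Variant("GENE_A", "X1Y", 0.20, 0.10, False),
    Variant("GENE_B", "X2Y", 0.01, 0.95, False),
    Variant("GENE_C", "X3Y", 0.03, 0.40, True),
])
print([v.gene for v in queue])  # ['GENE_C', 'GENE_B', 'GENE_A']
```

A weighted score would be an equally plausible design; the tuple key is used here only because it keeps the three criteria visible.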
For user-specified analyses, Clinical Future (founded by J.V.T. and
A.W.Z. with support from G.M.C.) has developed the Genome Pars-
ing System "GPS"—a secure, private Web service for genomic and
phenotypic data management and filtration. A sample GPS analysis
of the PGP pilot genomes is found in Figure 1. The system has been
used to effectively filter variants for high clinical importance, parsing
HUMAN MUTATION, Vol. 33, No. 5, 809-812, 2012
[Figure 1 image: screenshot of the GPS (Genome Parsing System) web interface, with case and control genome selection, a gene list, a frequency threshold, and a results table of variants; the table footer reads "Showing 1 to 66 of 66 entries (filtered from 35,039 total entries)."]
Figure 1. Genome Parsing System (GPS) screenshot: Whole-genome data from 16 Personal Genome Project (PGP) participants parsed against
88 metabolic disease genes show an average of four to nine variants per genome that are less than 5% in frequency and appear in OMIM and/or are
protein predicted to be damaging. N.B.: the predominance of the MAF of 0.0078 in these rarest variants occurs because each variant occurs only
once in a limited frequency database of 64 public genomes used for this analysis.
genomic data against clinical gene and variant databases, filtering by
low allele frequency and protein-predicted pathogenicity [Adzhubei
et al., 2010]. By analyzing aggregate data from 5,400 individual exomes,
available from the NHLBI Exome Variant Server, we find four
to nine variants with frequency less than 10% specifically from the
88 genes associated with the targeted disorders from Supp. Table S1.
In the PGP pilot data, each participant has four to nine variants
with frequency less than 5% and zero to one variants in OMIM
(www.omim.org) specifically from the 88 genes associated with the
targeted disorders from Supp. Table S1. When analysis is extended
to the NHLBI Exome Variant dataset, we find slightly fewer variants,
three to seven on average per exome, with a frequency less than 5%
(Exome Variant Server, 2012).
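The filtering described above, and the frequency of 0.0078 noted in the Figure 1 caption, can be sketched as follows. The sketch assumes diploid genomes (so 64 genomes contribute 128 chromosomes), and the dictionary layout, the two-gene stand-in for the 88-gene list, and every variant record are hypothetical; it is not the GPS implementation.

```python
from collections import Counter


def minor_allele_frequency(allele_count: int, n_genomes: int) -> float:
    """Allele frequency in a panel of diploid genomes (2 chromosomes each)."""
    return allele_count / (2 * n_genomes)


# A variant observed once in a 64-genome panel: 1 / 128.
singleton_maf = minor_allele_frequency(1, 64)
print(round(singleton_maf, 4))  # 0.0078


def rare_candidates(variants, target_genes, max_freq=0.05):
    """Keep variants in target genes that are rare and are either
    listed in OMIM or predicted to be damaging."""
    return [
        v for v in variants
        if v["gene"] in target_genes
        and v["freq"] < max_freq
        and (v["in_omim"] or v["predicted_damaging"])
    ]


# Hypothetical stand-in for the 88-gene list and for participant data.
TARGET_GENES = {"GALT", "ACADM"}
variants = [
    {"genome": "P1", "gene": "GALT", "freq": singleton_maf, "in_omim": True, "predicted_damaging": False},
    {"genome": "P1", "gene": "ACADM", "freq": 0.30, "in_omim": False, "predicted_damaging": True},   # too common
    {"genome": "P2", "gene": "ACADM", "freq": 0.02, "in_omim": False, "predicted_damaging": True},
    {"genome": "P2", "gene": "OTHER", "freq": 0.001, "in_omim": True, "predicted_damaging": True},   # not a target gene
    {"genome": "P2", "gene": "GALT", "freq": 0.01, "in_omim": False, "predicted_damaging": False},   # no supporting annotation
]

kept = rare_candidates(variants, TARGET_GENES)
per_genome = Counter(v["genome"] for v in kept)
print(dict(per_genome))  # {'P1': 1, 'P2': 1}
```

The singleton value also shows why a small reference panel puts a floor on the reported frequency: no observed variant can have a frequency below 1/128 in a 64-genome database.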
Consensus from several publications also indicates that an average
of 10-30 variants per genome are present heterozygously for
autosomal-recessive disorders. One or more of these typically
involve established metabolic disorders. Furthermore, although we avoided the
summation because of the wide population-specific variability for each
disorder, adding up estimated carrier rates for all 88 disorders
should also support the hypothesis of finding at least one biochemical
disorder of interest, simply on the basis of carrier status for
at least one treatable metabolic disorder listed in Supp. Table S1
[Lupski et al., 2010].
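The carrier-rate argument can be made concrete with a small calculation. The sketch below assumes that carrier states for different disorders are independent, and it uses a single made-up carrier rate for every disorder; neither assumption comes from the article, which deliberately avoids the summation because real rates vary widely by population.

```python
import math


def prob_at_least_one_carrier(carrier_rates):
    """P(carrier for at least one disorder), assuming independence across disorders."""
    return 1.0 - math.prod(1.0 - r for r in carrier_rates)


# Hypothetical: every disorder is given the same 1-in-100 carrier rate.
hypothetical_rates = [0.01] * 68
naive_sum = sum(hypothetical_rates)
p_any = prob_at_least_one_carrier(hypothetical_rates)

print(f"sum of rates = {naive_sum:.2f}, P(at least one) = {p_any:.3f}")
```

The simple sum overstates the probability because it counts individuals who carry more than one variant several times; the product form corrects for that overlap under the independence assumption.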
All 200+ participants will have the following laboratory studies
performed in a CLIA-certified clinical laboratory for biochemical
phenotyping that are relevant to the treatable disorders
listed in Supp. Table S1: plasma amino acids, urine organic
acids, plasma acylcarnitines, urine acylglycines, basic chem7, NH4
level, carnitine profile (free and total), folate level, zinc level,
B12 level, urine-reducing substances, lipid profile, hemoglobin
electrophoresis, pyridoxine level, biotin level, urine galactitol,
galactose-1-phosphate, copper level, ceruloplasmin, magnesium
level, carbohydrate-deficient transferrin, urine and plasma porphobilinogen,
urine and plasma delta-aminolevulinic acid, RBC plasmalogens,
pipecolic acid, and plasma very-long-chain fatty acids.
The majority of these biochemical tests will be performed in-house
at Children's Hospital Boston and Massachusetts General Hospital,
with some highly specialized tests being performed by outside
clinical collaborators (Table 2).
After identification of both known and potentially pathogenic
mutations within the targeted 88 biochemical genes with the GPS
platform (Supp. Table S1), we will analyze participant metabolite
values and ratios in which mutation status suggests possible deviation
from normal values using Mann-Whitney and Kolmogorov-Smirnov
tests. Analyses for statistically significant and pathophysiologically
consistent differences observed against matched controls
will be aided by performing the same biochemical testing on all
participants and allowing each participant to also serve as control
for the biochemical disorders and pathways in which they are not
found to have potentially pathogenic mutations.
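As a rough illustration of the two nonparametric comparisons named above, the following sketch computes the Mann-Whitney U statistic and the two-sample Kolmogorov-Smirnov statistic directly from their definitions. The analyte values are invented, and only the test statistics are computed; in practice p-values would come from a statistics library such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.ks_2samp`).

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y: the number of (x, y) pairs
    in which the x value is larger, with ties counted as one half."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the two empirical cumulative distribution functions."""
    d = 0.0
    for v in sorted(list(x) + list(y)):
        cdf_x = sum(1 for a in x if a <= v) / len(x)
        cdf_y = sum(1 for b in y if b <= v) / len(y)
        d = max(d, abs(cdf_x - cdf_y))
    return d


# Hypothetical analyte values (arbitrary units), for illustration only.
carriers = [5.1, 6.3, 7.0, 7.4]
controls = [3.2, 4.0, 4.8, 5.5, 6.0]

u = mann_whitney_u(carriers, controls)
d = ks_statistic(carriers, controls)
print(u, d)  # 18.0 0.75
```

Here U = 18 out of a possible 20 pairs, meaning the carrier value exceeds the control value in nearly every pairing; whether that is significant depends on sample size and on the multiple-testing burden.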
Discussion
The concept of biochemical individuality first introduced by Gar-
rod has had enormous impact on modern medicine and human
Table 2. Planned Biochemical Phenotyping for 200+ PGP
Participants with Whole-Genome Data
Plasma amino acids
Urine organic acids
Plasma acylcarnitines
Urine acylglycines
Sodium
Potassium
Chloride
Bicarbonate
Blood urea nitrogen
Creatinine
Glucose
NH4 level
Carnitine profile (free and total)
Folate level
Zinc level
B12 level
Urine-reducing substances
Lipid profile
Hemoglobin electrophoresis
Pyridoxine level
Biotin level
Urine galactitol
Galactose-1-phosphate
Copper level
Ceruloplasmin
Magnesium level
Carbohydrate-deficient transferrin
Urine and plasma porphobilinogen
Urine and plasma delta-aminolevulinic acid
RBC plasmalogens
Pipecolic acid
Plasma very-long-chain fatty acids
genetics. In contrast, due to direct observation of familial similari-
ties, especially physical similarities in the case of monozygotic twins,
"genomic individuality" has not only been assumed since before the
term "genome" was coined but also could correctly be considered a
redundant term. Yet, only recently, with the deep sequencing of mul-
tiple whole genomes, exomes, and targeted sequencing of genes in
the tens to thousands becoming more practical in clinical research,
are we able to systematically study and correlate three critical axes
of medical research: genomic, metabolomic, and phenomic. Additional
axes, such as functional data on an individual's cell line, will
also aid in supporting hypotheses of causality. Four decades' worth of
observational data on the natural history of treated patients for some
of these disorders, which were the first to be biochemically screened
for in the 1960s, is also extremely informative.
We expect to see correlations between rarer variants and larger
deviations from normal (in the expected direction for the specific
disorder and biochemical metabolites). The frequency and degree
to which analyte deviations are in the expected direction for the
particular disorder will also be biostatistically analyzed. Since all
200+ participants will have the full range of biochemical studies
relevant to 88 genes involved in 68 treatable biochemical disorders,
those without suspected pathogenic variants in a specific gene(s)
or disorder will serve as controls for those who are biochemically
studied based on sequence data for the same specific disorder.
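The control scheme described in this paragraph, in which each participant serves as a control for every gene in which no suspected pathogenic variant was found, might be sketched as below. The participant identifiers, the `flagged` mapping, and the function name are hypothetical.

```python
def split_by_gene(flagged_genes_by_participant, gene):
    """Partition participants into those with a suspected pathogenic variant
    in `gene` (cases) and everyone else (controls for that gene)."""
    cases = sorted(
        p for p, genes in flagged_genes_by_participant.items() if gene in genes
    )
    controls = sorted(
        p for p, genes in flagged_genes_by_participant.items() if gene not in genes
    )
    return cases, controls


# Hypothetical participants mapped to genes with suspected pathogenic variants.
flagged = {
    "P1": {"GALT"},
    "P2": {"ACADM"},
    "P3": set(),
    "P4": {"GALT", "ACADM"},
}

for gene in ("GALT", "ACADM"):
    cases, controls = split_by_gene(flagged, gene)
    print(gene, "cases:", cases, "controls:", controls)
```

Because every participant has the full panel of biochemical tests, the same individual can appear as a case for one gene and as a control for another, as P1 and P2 do here.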
Achieving statistical significance correlating relevant biochemi-
cal analytes with genomic data in individuals found to have one or
more potentially pathogenic mutations across these 68 biochemi-
cal disorders in over 200 individuals will be challenging because of
multiple hypothesis testing. We still expect to see interesting data
trends supporting known biochemical pathophysiology even in this
cohort size when targeting the rarest protein altering variants. In
some instances, statistically significant differences should eventu-
ally be observed once a critical mass of individuals with matching
genotype, metabolic profile, and phenotype is reached.
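To give a sense of the multiple hypothesis testing burden mentioned above, the sketch below shows two standard corrections. The article does not say which correction will be used, so the choice of Bonferroni and Benjamini-Hochberg here is an assumption, as are the significance level of 0.05, the use of one test per disorder, and the example p-values.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose p-value falls under its step-up threshold.
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# One test per targeted disorder (68), as an illustrative assumption.
threshold = bonferroni_threshold(0.05, 68)
print(f"{threshold:.6f}")  # 0.000735

# Hypothetical p-values, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.20]
rejected = benjamini_hochberg(p)
print(rejected)  # [0, 1]
```

With 68 tests, a Bonferroni-corrected threshold near 0.0007 is hard to reach with a few carriers per disorder, which is consistent with the expectation of trends rather than significance in a cohort of this size.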
Neither the metabolic diseases we have chosen to study in our
initial metabolic analysis nor the laboratory tests we will perform
on all 200+ individuals are comprehensive of treatable metabolic
disorders or available clinical biochemical testing, respectively, but
it should generate helpful pilot data and lay the foundation for future
trials studying an expanded number of genes, metabolic disorders,
and individuals.
Our finding of four to nine rare variants predicted to be
pathogenic variants per genome on average within 88 genes causing
metabolic disease with established dietary and/or pharmacologic
therapy is highly dependent on the filtering algorithm. This low
figure is also bounded by the limited number of genes studied and
our current understanding of metabolic diseases. Regardless, at 10
or fewer variants per person with our current algorithm, the prospect
of systematic development of individualized dietary and/or medical
data informed by genomic and metabolomic data finally comes into
practical view.
We anticipate that the biochemical interrogation of 200+ whole
genomes, guided by genomic individuality and linked to a process
of individual phenotype data gathering guided by the known
natural history of a subset of clinically well-characterized metabolic
disorders, will prove valuable.
Identifying the genomic and metabolomic circumstances under
which subclinical or preclinical states exist for these same disorders
may eventually lead to the first evidence-based efficacy studies for
nutrigenomics in these patients who would now otherwise go un-
treated and undetected by current methods and standard practices.
Acknowledgments
Disclosure Statement: J.V.T. and A.W.Z. declare potential conflict of interest
as cofounders of Clinical Future, Inc., Somerville, MA.
References
Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov
AS, Sunyaev SR. 2010. A method and server for predicting damaging missense
mutations. Nat Methods 7:248-249.
American College of Medical Genetics. 2006. Health Resources and Services Administration
Commissioned Report. Newborn screening: toward a uniform screening
panel and system. Genet Med 8:1S-252S.
Berry GT. 2008. Metabolic profiling. Nestle Nutr Workshop Ser Pediatr Program 62:55-75.
Church GM. 2005. Personal genome project. Mol Syst Biol 1-3.
Exome Variant Server. NHLBI Exome Sequencing Project (ESP), Seattle, WA. Available
at: http://evs.gs.washington.edu/EVS/. (Accessed January, 2012).
Lunshof JE, Bobe J, Aach J, Angrist M, Thakuria JV, Vorhaus DB, Hoehe MR, Church
GM. 2010. Personal genomes in progress: from the human genome project to the
personal genome project. Dialogues Clin Neurosci 12:47-60.
Lupski JR, Reid JG, Gonzaga-Jauregui C, Rio Deiros D, Chen DC, Nazareth L, Bainbridge
M, Dinh H, Jing C, Wheeler DA, McGuire AL, Zhang F, and others. 2010.
Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N
Engl J Med 362:1181-1191.
Scriver CR, Beaudet AL, Sly WS, Valle D, Childs B, Kinzler KW, Vogelstein B. 2012.
Metabolic and molecular bases of inherited disease. New York: McGraw-Hill.