Case File
efta-efta01140242 | DOJ Data Set 9 | Other | Personalized genomic disease risk of volunteers
Date: Unknown
Source: DOJ Data Set 9
Reference: efta-efta01140242
Pages: 14
Persons: 0
Integrity
Extracted Text (OCR)
Text extracted via OCR from the original document. May contain errors from the scanning process.
Personalized genomic disease risk of volunteers
Manuel L. Gonzalez-Garay (a,1), Amy L. McGuire (b), Stacey Pereira (b), and C. Thomas Caskey (c,1)
(a) Center for Molecular Imaging, Division of Genomics and Bioinformatics, The Brown Foundation Institute of Molecular Medicine, University of Texas Health Science Center, Houston, TX 77030; and (b) Center for Medical Ethics and Health Policy, Department of Medicine and Medical Ethics, and (c) Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030
Contributed by C. Thomas Caskey, August 27, 2013 (sent for review July 11, 2013)
Next-generation sequencing (NGS) is commonly used for
researching the causes of genetic disorders. However, its useful-
ness in clinical practice for medical diagnosis is in early de-
velopment. In this report, we demonstrate the value of NGS for
genetic risk assessment and evaluate the limitations and barriers
for the adoption of this technology into medical practice. We
performed whole exome sequencing (WES) on 81 volunteers, and
for each volunteer, we requested personal medical histories,
constructed a three-generation pedigree, and required their
participation in a comprehensive educational program. We lim-
ited our clinical reporting to disease risks based on only rare
damaging mutations and known pathogenic variations in genes
previously reported to be associated with human disorders. We
identified 271 recessive risk alleles (214 genes), 126 dominant risk
alleles (101 genes), and 3 X-recessive risk alleles (3 genes). We
linked personal disease histories with causative disease genes in
18 volunteers. Furthermore, by incorporating family histories into
our genetic analyses, we identified an additional five heritable
diseases. Traditional genetic counseling and disease education
were provided in verbal and written reports to all volunteers. Our
report demonstrates that when genome results are carefully
interpreted and integrated with an individual's medical records
and pedigree data, NGS is a valuable diagnostic tool for genetic
disease risk.
molecular medicine | disease prediction | whole exome sequencing
Sequencing the whole genome of patients with genetic disorders has become a reality since the sequencing of the first
individual human in 2007 (1). Further advances in massively
parallel DNA sequencing are reducing the price of sequencing
an entire genome or exome. The quality and speed of sequencing
and analyzing a personal genome are improving at an unprece-
dented pace, making possible the introduction of next-generation
sequencing (NGS) into the clinic on a research basis (2-7).
Advancements in NGS have stimulated international research
initiatives to identify genetic links to rare disorders in children,
with an average diagnostic success of 20-25% and the discovery
of new disease-gene associations (8-12).
The rapidly increasing number of aging adults in our society
will place unprecedented demands on the health care system. To
help adults achieve healthy longevity, we need to develop a system to identify genetic risk and intervene early in disease progression. In this report, we sequenced the whole exomes of a healthy adult cohort of 81 volunteers and evaluated the value of applying NGS in combination with medical history and pedigree data. We address three main questions: (i) What genetic discoveries need to be provided to the volunteers? (ii) What is the practical value of delivering this information to volunteers? (iii) What are the challenges and barriers to the adoption of this powerful technology into medical practice?
The individual genetic reports yield helpful medical risk information, suggesting that population sequencing of asymptomatic adults may prove to be valuable and useful. Under the oversight of our institutional review board, we provided participants with the genetic risk findings from the analyses and with genetic counseling to discuss their results.
Results
Categories of Variants to Report to Patients. Variants obtained from our workflow (described in Fig. 1) were reported using three categories. Our first variant category consists of variants identified in an individual where the alleles are found in the Human Gene Mutation Database (HGMD) (13, 14) and labeled disease-causing mutations (DM). These alleles also were required to be rare [<1% allele frequency in 6,500 exomes from the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (15) and in the 1,000 Genomes Project (16, 17)] and predicted to be damaging to protein function by two of three prediction algorithms [PolyPhen-2 (18), SIFT (19-24), and MutationTaster (25)] using the Database of Human Nonsynonymous SNVs and their functional predictions and annotations (dbNSFP) (26), as described in Fig. 2. The genome sequence data of each volunteer were reviewed and interpreted, taking into account personal medical history, a three-generation pedigree with family history of diseases, and bioinformatics analysis. The medical history of each volunteer in this cohort was rich with detail because each volunteer had a private physician who provided annual examinations and, in some cases, disease therapy. Fig. 3 summarizes the results of our pipeline: we recruited 81 unrelated volunteers and sequenced their genomic DNA using exome sequencing. We detected 65,582 unique nonsynonymous coding variants (NSCVs). Every NSCV was interrogated for human inherited disease mutations using the HGMD (13, 14) database from Biobase (DM category consisting of 109,708 variations). We detected 1,036 HGMD (13, 14) DM variations. After applying the filters described in Fig. 2, the number was reduced
to 275 pathogenic variants. We identified in our cohort 208 autosomal recessive (AR) alleles (169 genes), 64 autosomal dominant (AD) alleles (44 genes), and three X-linked recessive (XLR) alleles (3 genes). These data resulted in an average of 3.5 disease allele reports per volunteer.

Significance
Replacing traditional methods for genetic testing of inheritable disorders with next-generation sequencing (NGS) will reduce the cost of genetic testing and increase the information available to patients. NGS will become an invaluable resource for patients and physicians, especially if the sequencing information is stored properly and reanalyzed as bioinformatics tools and annotations improve. NGS is still at an early stage of development: it produces many false-positive and false-negative results and requires infrastructure and specialized personnel to analyze the results properly. This paper describes our experience with an adult population, our bioinformatics analysis, and the clinical decisions we made to ensure that our genetic diagnostics accurately detected carrier status and serious medical conditions in our volunteers.

Author contributions: M.L.G.-G. and C.T.C. designed research; M.L.G.-G., A.L.M., S.P., and C.T.C. performed research; M.L.G.-G. analyzed data; and M.L.G.-G., A.L.M., S.P., and C.T.C. wrote the paper.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
(1) To whom correspondence may be addressed. E-mail: Manuel.L.GonzalezGaray@uth.tmc.edu or tcaskey@bcm.edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1315934110/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1315934110
PNAS Early Edition | 1 of 6
EFTA01140242

[Fig. 1 flowchart: Novoalign alignment to the reference; SAMtools/Picard (SAM file, remove duplicates); GATK (recalibrate alignments, local realignments); SNV/indel calling (GATK); annotation of SNVs/indels with SnpEff and ANNOVAR]
Fig. 1. Workflow for processing NGS data. Raw sequencing data are aligned against the reference sequence using Novoalign software from Novocraft. SAM files are preprocessed using SAMtools and Picard to create BAM files and remove duplicates. The Genome Analysis Toolkit (GATK) is then used to recalibrate the alignments, perform local realignments, and identify SNPs and indels. Finally, SnpEff and ANNOVAR are used to annotate variants.
The approach for a second category of variants consisted of
creating a personalized list of candidate genes from Online
Mendelian Inheritance in Man (OMIM) (27, 28) known to be
associated with the disorders reported in the medical literature.
We detected 131 alleles (131 genes) using this approach. Each of these variants provided a potential cause of the volunteer's disorders, and each passed our stringent pipeline. This approach added on average another 2.0 disease alleles per volunteer report.
The third approach used the family history to create a personalized list of candidate genes from OMIM (27, 28), and, as before, we compared our list of candidate genes with the disorders reported in the family history.
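The second and third categories both reduce to a lookup: collect the genes linked to the disorders in a volunteer's personal or family history, then keep only the filtered variants that fall in those genes. A minimal Python sketch of that matching step follows. The `GENE_DISORDERS` table is a hypothetical stand-in for an OMIM query (its gene-disorder pairs are illustrative, taken from associations mentioned in this report), and `Variant`, `candidate_genes`, and `match_history` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in for an OMIM lookup: gene symbol -> associated disorders.
# These pairs are illustrative only; a real list would come from OMIM queries.
GENE_DISORDERS: Dict[str, Set[str]] = {
    "LDLR": {"hypercholesterolemia"},
    "ABCA4": {"macular dystrophy"},
    "MC4R": {"obesity"},
    "BRCA2": {"breast cancer"},
}


@dataclass(frozen=True)
class Variant:
    gene: str
    change: str  # free-text description of the variant (hypothetical field)


def candidate_genes(disorders: Iterable[str],
                    gene_disorders: Dict[str, Set[str]] = GENE_DISORDERS) -> Set[str]:
    """Personalized candidate list: genes linked to any reported disorder."""
    wanted = {d.strip().lower() for d in disorders}
    return {gene for gene, linked in gene_disorders.items() if linked & wanted}


def match_history(variants: Iterable[Variant], disorders: Iterable[str]) -> List[Variant]:
    """Keep only variants (already filtered as in Fig. 2) that lie in candidate genes."""
    genes = candidate_genes(disorders)
    return [v for v in variants if v.gene in genes]


if __name__ == "__main__":
    found = [Variant("LDLR", "missense"), Variant("TTN", "missense")]
    # The same call serves a personal history (category 2) or a family history (category 3).
    print(match_history(found, ["Hypercholesterolemia"]))
```

Keeping the disorder-to-gene table separate from the matching logic means the table can be refreshed as OMIM is updated without touching the filter.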
Before reporting an allele to the volunteer, we reviewed the
original publications that support the pathogenicity of all of the
alleles (HGMD) and/or the evidence associating the gene with
the disorder (OMIM). At this time, all three abovementioned categories of investigation were reported with full recognition that some alleles would be found to be non-disease-producing as databases improve and functional assays complement informatics predictions. We have updated clinical reports as these data emerged and counseled the patients on the options for reducing or eliminating the disease risk.
Disease Genes Identified in the Cohort. Table S1 summarizes our disease associations. Matching personal medical records to personal genome reports was informative. We elected to report findings as disease-gene associations rather than as diagnoses because our study did not include traditional "surrogate markers" (analytes, proteins, and imaging) for the confirmation of a disease diagnosis. We considered potentially
causative findings to be those mutations that are predicted to be
damaging in addition to being reported in either HGMD (13, 14)
or OMIM (27, 28) databases. These mutations are considered to
be "need to know" and are reported to volunteers. We identified associations for vascular disease and/or hypercholesterolemia in five individuals related to LDL receptor (LDLR) alleles. LDLR mutations cause early onset autosomal dominant coronary artery disease (CAD) and manifest hypercholesterolemia (29, 30). Three individuals were taking statins for their hypercholesterolemia. Two individuals were not under care but had a personal history of hypercholesterolemia and, in one case, a son with hypercholesterolemia.
Four volunteers were found to carry risk alleles in genes for diabetes mellitus (31-34). Two of these individuals were under therapy for type 2 diabetes, whereas the other two had elevated fasting blood glucose and were being followed by their physicians with further analyte measurements. Two individuals with morbid obesity (body mass index of 32 and 37 kg/m²) carried an MC4R allele associated with pediatric obesity and with obesity in rare heterozygous adults (35, 36). Two ophthalmologic disease/gene associations were identified. The childhood disorder brittle cornea syndrome type 1 occurred in a volunteer who had undergone successful corneal transplant and carried putative compound heterozygous mutations in ZNF469 (37). One volunteer was under care for macular dystrophy and carried an ABCA4 allele (38). One sterile male volunteer was found to have an insertion in USP26, a gene known to be responsible for infertility in men (39). Associations for melanoma and breast cancer were identified. The two patients with melanoma carried alleles in different genes: GRIN2A and BAG4 (40-42). Two volunteers diagnosed with breast cancer carried different BRCA2 alleles (43, 44). Single cases of early onset prostate cancer (LRP2) (45) and follicular thyroid cancer (TPR) (46, 47) were identified. A volunteer with nonsyndromic deafness was found to have risk alleles in two genes associated with autosomal dominant (AD) deafness and had a positive family history of deafness spanning three generations (48). In each case, the volunteer was instructed to inform their physician and was requested to confirm the genomic allele identification in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory, even though each reported allele had been sequenced twice in independent studies. The findings provided information for personal and family risk counseling that was not possible before the gene association.
Incorporation of Three-Generation Pedigrees into the Genetic Analyses. The three-generation pedigree medical information was analyzed to identify those volunteer families who warranted additional genetic study. Table S2 lists the genetic disorders identified by pedigree/familial medical history. In each case, the volunteer was counseled about the family risk and encouraged to contact at-risk family members who might benefit from focused genetic studies. Three of the families have reported that their familial genetic diagnosis has now been resolved [paraganglioma (49), Prader-Willi syndrome (50, 51), and ankylosing spondylitis (AS) (52)]. One additional family is under study [Tourette syndrome (53)]. Additional familial disease risks were identified by history for atrial fibrillation (AR), bicuspid aortic valve (BAV), dyslexia (AR), Fabry disease (XLR), gallstones (AD), and myotonic dystrophy (AD with anticipation). This approach was productive, but it was not universally accepted because disease/gene resolution requires interaction with interested and motivated family members.
[Fig. 2 flowchart: inputs include dbSNP 132, SAM files, and a CGI variant annotation file; gene list of known genes; ANNOVAR (nonsynonymous coding variants); HGMD DM (disease-causing mutations); filter out variants with MAF > 1%; keep variants considered damaging by at least 2 of 3 prediction tools (PolyPhen-2, SIFT, and MutationTaster); internal frequency filter; variant reports]
Fig. 2. Pipeline to generate variant reports. Every variant in the variant call format (VCF) file is annotated using SnpEff and ANNOVAR; nonsynonymous coding variants are annotated using the commercial version of the HGMD database. (Left) Our selection of variants by the creation of a personalized candidate gene list using the medical history and family history of each volunteer. Mutations with a minor allele frequency (MAF) of >1% are removed using frequencies from the NHLBI Exome Sequencing Project (ESP) and the 1,000 Genomes Project. Variants that are considered benign by two of three prediction tools are removed (using dbNSFP). Finally, we remove variants that are present in our cohort more than three times.
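The category 1 filters in Fig. 2 compose as simple predicates. The Python sketch below assumes each variant has already been annotated (for example, from dbNSFP and the population frequency files) into a small record; the `AnnotatedVariant` fields, the function names, and the treatment of a MAF exactly at 1% are assumptions for illustration, not the authors' pipeline. It keeps HGMD DM variants that are rare in both ESP and 1,000 Genomes, are called damaging by at least two of PolyPhen-2, SIFT, and MutationTaster, and occur no more than three times in the cohort.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PREDICTORS = ("PolyPhen-2", "SIFT", "MutationTaster")


@dataclass
class AnnotatedVariant:
    key: str                    # variant identifier (hypothetical format)
    hgmd_class: Optional[str]   # "DM" for an HGMD disease-causing mutation
    esp_maf: float              # minor allele frequency in NHLBI ESP
    kg_maf: float               # minor allele frequency in 1,000 Genomes
    calls: Dict[str, str] = field(default_factory=dict)  # tool -> "damaging"/"benign"
    cohort_count: int = 1       # times the variant is seen in the cohort


def damaging_votes(v: AnnotatedVariant) -> int:
    """Number of the three prediction tools that call the variant damaging."""
    return sum(1 for tool in PREDICTORS if v.calls.get(tool) == "damaging")


def passes_category1(v: AnnotatedVariant, max_maf: float = 0.01,
                     min_votes: int = 2, max_cohort: int = 3) -> bool:
    return (
        v.hgmd_class == "DM"                      # listed as disease-causing in HGMD
        and max(v.esp_maf, v.kg_maf) <= max_maf   # drop variants with MAF > 1%
        and damaging_votes(v) >= min_votes        # damaging by >= 2 of 3 tools
        and v.cohort_count <= max_cohort          # drop if seen > 3 times in cohort
    )


def filter_category1(variants: List[AnnotatedVariant]) -> List[AnnotatedVariant]:
    return [v for v in variants if passes_category1(v)]
```

Treating a missing prediction as "not damaging" is also an assumption; the Fig. 2 caption phrases the rule as removing variants called benign by two of three tools, which gives a different answer when a tool returns no score.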
[Fig. 3 flowchart: 81 volunteers; exome sequencing yields 65,582 NSCVs; screening against HGMD (109,708 annotated variants) yields 1,036 NSC-SNPs from HGMD. (A) 275 NSC-SNPs from HGMD after filtering; 160 NSC-SNPs from OMIM. Medical and family history interpretation (B): medical history, 23 disease-gene associations; family history, 4 resolved and 1 in progress. Negative history: 206 HGMD autosomal recessive (169 genes); 63 OMIM (non-HGMD) autosomal recessive (63 genes); 3 HGMD X-linked recessive (3 genes); 6 OMIM (non-HGMD) X-linked recessive (6 genes); 64 HGMD autosomal dominant (44 genes); 62 OMIM (non-HGMD) autosomal dominant (62 genes)]
Fig. 3. Summary of results. The flowchart provides the number of variants from each step of the pipeline described in Fig. 2.
Table S3 provides a sampling of the recessive risk alleles, which constitute the majority of the observed alleles. None of the 160 offspring of the 81 volunteers were affected with these disorders. All volunteers indicated their families were complete, and thus no spousal genetic studies were recommended, but we proposed that the information be provided to descendants of reproductive age. Many of the genes identified are part of prenatal carrier screens and/or state-sponsored newborn screening programs [phenylketonuria, maple syrup urine disease, cystic fibrosis, Niemann-Pick disease, Gaucher disease, factor V Leiden thrombophilia, medium-chain acyl-CoA dehydrogenase (MCAD) deficiency]. Undoubtedly, NGS will expand the number of reported disease alleles and the scope of genes studied for couples in the pregnancy setting. The Beyond Batten Disease Foundation of Austin, TX (54), has this goal.
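Spousal testing matters for these recessive alleles because of standard Mendelian arithmetic: a child is affected only if both parents transmit a risk allele. The Python sketch below enumerates a single-locus Punnett square with exact fractions; the genotype encoding and function names are hypothetical, and the model ignores new mutations, incomplete penetrance, and allelic heterogeneity.

```python
from fractions import Fraction
from itertools import product
from typing import Dict


def offspring_genotypes(parent1: str, parent2: str) -> Dict[str, Fraction]:
    """Punnett square for one autosomal locus.

    Genotypes are two-letter strings: "A" is the normal allele and "a" the
    risk allele, so "Aa" is a carrier. Each parent passes on either of its two
    alleles with probability 1/2, giving four equally likely combinations.
    """
    probs: Dict[str, Fraction] = {}
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted(allele1 + allele2))  # normalize "aA" to "Aa"
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


def recessive_risk(parent1: str, parent2: str) -> Fraction:
    """Probability that a child is affected (genotype "aa")."""
    return offspring_genotypes(parent1, parent2).get("aa", Fraction(0))


if __name__ == "__main__":
    print(recessive_risk("Aa", "AA"))  # carrier with a noncarrier partner
    print(recessive_risk("Aa", "Aa"))  # carrier with a carrier partner
```

A carrier paired with a noncarrier has no affected children under this model, whereas two carriers face a one-in-four risk per pregnancy, which is why partner testing becomes relevant once descendants reach reproductive age.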
Table S4 shows a category of high concern: the identification of XLR disease risk alleles among our female volunteers. One volunteer had an affected son (isolated case) with Fabry disease that was diagnosed before our study. Four disease alleles were identified, each listed in HGMD (13, 14). No family history of these disorders was found in the three-generation pedigree of any of these volunteers. All were counseled to have their test confirmed and their daughters studied in a CLIA-certified laboratory, given the high disease risk (50% for sons). Three men in our study had alleles predicted from the OMIM (27, 28) disease database to be causative for cutis laxa, Duchenne muscular dystrophy, congenital nystagmus, and hemophilia A, illustrating the challenge of predicting damaging mutations bioinformatically. None had the disorders. Counseling and family study were individualized for each disease risk. Volunteers were made aware of database errors in the reports.
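The 50% figure follows from X-linked transmission: a carrier mother passes her risk-bearing X chromosome to half of her children on average, and a son who inherits it has no second X to compensate. The Python sketch below works this out under simple Mendelian assumptions (full penetrance in males, no skewed X inactivation, no new mutations); the function name and allele encoding are hypothetical. It also shows why testing daughters was advised: half of them are expected to be carriers.

```python
from fractions import Fraction
from typing import Dict, Tuple


def xlr_offspring(mother_carrier: bool,
                  father_affected: bool = False) -> Tuple[Dict[str, Fraction], Dict[str, Fraction]]:
    """Offspring outcome probabilities for an X-linked recessive allele.

    "x" marks the risk allele on the X chromosome and "X" the normal allele.
    The mother passes on one of her two X chromosomes with probability 1/2;
    sons receive the father's Y, daughters receive the father's single X.
    """
    mother = ("x", "X") if mother_carrier else ("X", "X")
    father_x = "x" if father_affected else "X"
    half = Fraction(1, 2)

    sons = {"affected": Fraction(0), "unaffected": Fraction(0)}
    daughters = {"affected": Fraction(0), "carrier": Fraction(0), "noncarrier": Fraction(0)}
    for maternal in mother:
        # A son's only X comes from his mother, so one risk allele is enough.
        sons["affected" if maternal == "x" else "unaffected"] += half
        # A daughter needs risk alleles from both parents to be affected.
        risk_alleles = (maternal == "x") + (father_x == "x")
        status = {0: "noncarrier", 1: "carrier", 2: "affected"}[risk_alleles]
        daughters[status] += half
    return sons, daughters


if __name__ == "__main__":
    sons, daughters = xlr_offspring(mother_carrier=True)
    print("sons:", sons)
    print("daughters:", daughters)
```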
Tables S5-S10 provide a third category that is very problematic, the AD group. The alleles were identified as previously described, but counseling is more difficult because of variation in severity and age of onset. For this age group of volunteers, interest was high because disease prevention was frequently expressed as a goal in the face-to-face counseling meetings. A poststudy survey also reflected this objective. In this paper we focus on the three major causes of death in the United States: cancer, cardiovascular disease, and neurodegenerative disease. In our analysis of each volunteer, we reviewed the genomic and family data.
Table S5 lists the breast cancer risk results. Twelve volunteers were found to have breast cancer risk alleles in the genes BRCA1, BRCA2, PALB2, RAD51C, and RADS&. Two volunteers with BRCA2 risk alleles were diagnosed with breast cancer. One man carried a premature chain termination mutation and has a first-degree relative with breast cancer (50s). A third volunteer had a frameshift mutation (high-risk allele) but was not found to have breast cancer. All alleles were predicted to be damaging. Eight volunteers had first-degree relatives with breast cancer, whereas four had a negative family history of disease. All were advised to seek confirmation via a CLIA-certified laboratory. One patient's HGMD (13, 14) allele was confirmed but was predicted to be "neutral" by a commercial laboratory. All were counseled regarding the need for regular mammograms and gynecological examinations and were requested to inform their physician of this research risk allele identification.
Table S6 displays the colon cancer alleles. No volunteer in this group had colon cancer, with the exception of one volunteer with a positive dysplastic polyp biopsy. Five volunteers had a positive family history of colon cancer, and five had no family history of the disease. All were advised to obtain a confirmatory diagnosis from a CLIA-certified laboratory and to advise their physician of the research allele identification. Of the 10 volunteers, many had undergone colonoscopy as part of their health care.
Table S7 includes all of the remaining types of cancer. Two volunteers diagnosed with melanoma were found to have risk alleles in different disease genes. We identified 10 volunteers with prostate cancer risk alleles. One volunteer reported a diagnosis of prostate cancer at age 55, whereas the other nine volunteers reported no family history of the disease. Genetic counseling for cancer risk required the greatest counseling time. The concepts of the two-hit hypothesis (55) and "somatic mutations" (56) were difficult for the volunteers to grasp, even when we discussed the subject in great detail during the education session. All volunteers were provided information regarding standard-of-practice approaches for early detection of the respective cancer.
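The two-hit idea can be made concrete with a toy calculation. In the Python sketch below, every parameter value is an arbitrary assumption, not data from this study, and the function name is hypothetical. A cell initiates a tumor only when both copies of a tumor suppressor gene are inactivated. A germline carrier already has one damaged copy in every cell, so each cell needs only one further somatic hit, whereas a noncarrier needs two. With a small per-allele hit probability, that one-hit difference raises the chance that some cell is fully inactivated by orders of magnitude. Real tumor risk also depends on penetrance, cell turnover, and other genes, all of which this model ignores.

```python
import math


def prob_tumor_initiation(hits_needed: int, p_hit: float, n_cells: int) -> float:
    """Toy Knudson two-hit calculation (illustrative assumptions only).

    hits_needed: somatic hits still required to inactivate both copies of a
        tumor suppressor gene in one cell: 2 for a noncarrier, 1 for someone
        who inherited one damaged copy (a germline carrier).
    p_hit: assumed probability that one given allele is somatically
        inactivated in a cell, independent across alleles and cells.
    n_cells: number of at-risk cells.
    Returns the probability that at least one cell loses both copies.
    """
    per_cell = p_hit ** hits_needed
    # 1 - (1 - per_cell) ** n_cells, computed stably for a tiny per_cell
    return -math.expm1(n_cells * math.log1p(-per_cell))


if __name__ == "__main__":
    p, cells = 1e-6, 1_000_000  # arbitrary illustrative values
    carrier = prob_tumor_initiation(1, p, cells)
    noncarrier = prob_tumor_initiation(2, p, cells)
    print(f"carrier: {carrier:.3g}  noncarrier: {noncarrier:.3g}")
```

The `log1p`/`expm1` form avoids the precision loss of computing `(1 - per_cell) ** n_cells` directly when `per_cell` is very small.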
Table S8 lists all of the affected volunteers with cardiomyop-
athies (57). Five volunteers had a medical history of cardiac
dysrhythmia with identified risk alleles. One younger (50s) vol-
unteer had first-degree relatives requiring pacemakers and car-
ried two risk alleles. Three volunteers had either stent placements
or bypass procedures related to CAD. Each was in their 70s.
Table S9 lists the 11 volunteers who had no apparent disease
but had a positive family history of tachycardia, sudden death,
and CAD and carried risk alleles. We provide this experience to
broaden alertness to both genetic causation and risk of disease
for adult-onset cardiovascular disease (58). Of the alleles listed in Tables S8 and S9, 13 alleles were found in HGMD (13, 14).
We advised volunteers to inform their physicians of these results
for their long-term clinical care.
In Table S10, we list the results for adult-onset neurodegenerative diseases. Our findings were limited but of high interest to the cohort. Volunteers frequently asked whether they had Alzheimer's risk. We summarize our findings for Alzheimer's and Parkinson risk alleles (59, 60). The genes included APOE, APP, PSEN1, MAPT, EIF4G1, GBA, GIGYF2, LRRK2, PARK2, PM20D1, and SNCA. Nine volunteers had HGMD (13, 14)-listed risk alleles. Of these, two had a positive family history of Parkinson disease and one had a positive family history of Alzheimer's disease. One of the PARK2 alleles occurred in a volunteer who reported three second-degree relatives in a sibship affected with the disease. The remainder had no family history of either disease. There were 25 alleles predicted to be damaging, one of which is a frameshift allele. None of these volunteers had a family history of disease.
Discussion
Exome Sequencing Is Limited. The full spectrum of disease mutation identification is not satisfied by exome sequencing alone, because large deletions, copy number variations (CNVs), and triplet repeats are not reliably identified at this time. Furthermore, exon capture relies on probe design. For example, the MAGEL2 mutation in our Prader-Willi patient was discovered using whole genome sequencing (WGS) from Complete Genomics and was missed by exome capture because of high GC content (51). The accuracy of coding allele identifications was, however, quite high and thus of great utility as a genome screening approach. CGI (61) sequencing produced higher coverage than exome sequencing for CNVs, large deletions, and regulatory elements, and it will have utility as we analyze DNA previously labeled "junk" for disease causation (62). There is also the issue of our limited knowledge of disease alleles within the databases. One of our biggest challenges for the interpretation of human genomes is the lack of gene annotations and the errors in databases. Our knowledge base for human disorders is small. There are only ~100,000 pathogenic variants in the HGMD (13, 14) database, and a fraction of them have errors. If we use gene annotations instead of annotated variants as our source of information, we can calculate the fraction of knowledge that we can use at this time. For example, the number of genes associated with human disorders reported by HGMD (13, 14), OMIM (27, 28), UniProtKB (63), Gene Atlas (64), and others is 4,622. Of these 4,622 genes, only 1,955 have high-quality data, because they are part of the GeneTests (65) database. GeneTests (65) is a database originally created by the National Center for Biotechnology Information to track all of the laboratories worldwide that offer a genetic test for a gene. With this information, we know that the fraction of genes that we can use to interpret a successful, high-quality whole exome or whole genome dataset is ~7-18%, using either the high-confidence set of 1,955 genes or the full set of 4,622 genes. Despite these limitations, this report documents the utility of sequencing for disease associations and risk.
During the last few years, the field of NGS has developed a large number of tools that make it easier to handle read analysis, variant calling, functional prediction, and annotation (66). There are also large publicly available datasets of healthy individuals that can serve as controls to remove technology-specific errors or filter out common polymorphisms. As we begin to use whole genome sequencing at increasing depth, we are discovering more variants, so these public datasets are becoming increasingly important for quality control and variant filtering in smaller projects. One of the main limitations is the lack of access to public and private genome and exome variants. There are thousands of datasets, but the majority are inaccessible to the scientific community. We recognize the existence of the 1,000 Genomes Project, the NHLBI Exome Sequencing Project (ESP), the Exome Variant Server, and the 69 whole genomes from CGI (15-17, 67). However, we need larger datasets from very carefully phenotyped patients to assist in the interpretation of the variants in our patients. The million genome project of the US Department of Veterans Affairs (68) has the potential to provide such data, as do private health plans considering adoption of genome sequencing.
Genetic Discoveries Provided to Volunteers. There are several approaches to disclosing results to volunteers. Groups such as Patel et al. use a statistical and epidemiological approach, reporting polygenic risk assessments based on common SNPs previously associated with genetic disorders in genome-wide association studies (69). The PGP-10 project uses an automated tool, the Genome Environment Trait Evidence (GET-Evidence) system, which is collaboratively edited (70). For this project, we decided to report only high-quality variants that are rare in the population and considered damaging by two of three commonly used prediction algorithms. In addition, either the variant had to be reported in HGMD under category DM or the gene had to have been previously associated with a genetic disorder (OMIM). The volunteers were adults with complete medical and family histories, so we personalized the reports as described in Fig. 2 to specifically try to identify molecular explanations for the maladies reported in their medical or family history. This approach generated reports that were easy to explain and were accepted by the patients during the genetic counseling session.
Medical Histories and Family Pedigrees Complement Sequencing Results. The utility of genome data was significantly enhanced by integrating standard medical care features of personal and family disease diagnosis. The notable number of 23 disease associations in all likelihood reflects a bias of our volunteers to seek answers to their personal disease history. This observation may hold a key to how we obtain maximal use of genome sequencing: sequence the disease index cases. Our experience suggests a high value for that utilization. This approach has been clearly documented to be successful for pediatric genetic disorders but has not been exploited for adult-onset disease. The practical value of this study is summarized in Tables S1 and S2 and fell into two general categories: (i) new knowledge of the genetic risk and heritability for the volunteers and their families; and (ii) options for therapy (CAD) or imaging (cancer) for personal and extended family care. By using the medical and family history, we were able to clarify the genetic risk in 6 of the 81 cases. One of these cases yielded a new discovery of a gene associated with Prader-Willi syndrome, which is described in another paper (51).
Prenatal vs. Adult Genetic Screening. The technology and this report raise the question of whether we are prepared to offer adult disease risk screening. Currently, prenatal and newborn screening for a selected set of frequently occurring disease alleles (not genome sequencing) is a standard of practice. There are questions that deserve medical and ethical review before adult screening becomes a standard of practice. First, for reproductive and newborn diagnosis, typically only actionable childhood diseases are explored, which respects the future autonomy of the child and preserves her right to an open future (71, 72). Because adult screening decisions would be made by an autonomous individual for her own health decisions, broader conceptions of utility, including personal utility, need to be considered (73). It is a clear and simple decision to provide patients with actionable genetic information from a WES study; on the other hand, deciding what to do with incidental genetic findings that are not actionable and could lead to psychological distress for the patient (e.g., APOE for Alzheimer disease) is challenging and raises a difficult ethical question. Despite this ethical dilemma, our group of volunteers elected to receive information even if the genetic information might not be actionable. Only 3% of the volunteers were uncertain about receiving nonactionable information (SI Poststudy Survey).
Volunteer Response to Clinical Reports. From our poststudy survey,
we found that 72% of the responders reported speaking with
their physician about their results. This raises important ques-
tions about whether nongeneticists are adequately prepared to
counsel patients based on WES results and whether such follow-
up will lead to iatrogenic harm or unjustified use of health care
resources (74). Twenty-five percent reported changing their
behaviors because of the results, which is surprising given that
previous reports found no significant behavior change resulting
from adult risk screening in a direct-to-consumer setting (75).
Although all participants were clearly informed that their results originated from two independent sequencing experiments and were advised to have their results clinically validated in a CLIA-certified laboratory, 78% reported that they did not have the results confirmed. This low percentage of
confirmatory results from the volunteers raises the question of
whether it is sufficient to counsel research participants to have
results clinically confirmed or if investigators should be required
to confirm results before disclosure.
It was apparent that some volunteers were seeking information related to familial diseases. Resolution of these
questions required family member interest and motivation be-
cause, in all cases, we had sequenced the nonrisk family mem-
ber. We followed up each case with a referral to a qualified
genetics program with diagnostic capacity for the suspected
genetic disease.
Our efforts to analyze cancer, cardiovascular, neurodegenerative, and obesity/diabetes risk were successful but needed considerable education/counseling to avoid confusion over risk vs. diagnosis. In addition, there are standard-of-care options for disease modification or early diagnosis for those with risk alleles for cancer, cardiovascular disease, and diabetes. Thus, sequencing serves as a new screening approach for risk detection toward the objective of improved health. Genomic studies are expected to increase surveillance studies (e.g., colonoscopy, gynecologic examinations, mammograms, cardiovascular markers, and scanning studies) but have the potential to identify more precisely the patients who may benefit from prevention surveillance.
Adult-onset neurologic disorders are an increasing concern worldwide as our population ages, revealing disease incidence not seen earlier. Genetic discoveries for these disorders remain limited. Confirmatory diagnostics such as image analysis and biomarkers/surrogate markers are just emerging, and preventive therapeutic options are nonexistent. Although one might question the utility of screening for these disorders at this time, the experience with Huntington disease (76) screening taught valuable lessons on how to proceed with studying and counseling families at risk. Furthermore, new therapeutic trials in disease prevention for Alzheimer's (58) and Parkinson disease are based on the genetic cause of disease. These clinical trials use genetic diagnosis to select participants, an approach that has also been successful in cancer drug development (77-79).
Barriers to the Adoption of Genetic Screening via Sequencing. Although the above comments make the case for the value of adult genetic screening via whole genome sequencing, major issues remain to be addressed. In our opinion, the least of these is sequencing technology and cost. Bioinformatics focused on the practical extraction of medically relevant, actionable data remains a challenge. We relied heavily on HGMD alleles for "need to know" information for patients. This approach is flawed in three ways: (i) databases contain errors; (ii) highly validated disease databases are scattered, private, and limited; and (iii) in the future, sequencing will reveal more disease risk alleles than patient reports in the literature. Our current limitation in interpreting a genome is not the quality of the data or the coverage of the genome but our disease knowledge database. R. Cotton's Human Variome Project (62), together with the Beijing Genomics Institute, is proposing to create a highly validated disease allele database.
New technological advances, such as structure-based prediction of protein-protein interactions on a genome-wide scale (80), 3D structures of protein active and contact sites (SI), high-throughput functional assays of damaging alleles (81-83), and new approaches that combine analytes, metabolomics, and genetic information from a single individual (84), are just a few examples of the new technologies that will help us interpret genomic data better.
The delivery of genome risk information will need to be carried out by a new cadre of physicians and counselors skilled in medicine, genetics, and education/counseling. These experts will need to be integrated into medical care, as has been done for newborn screening, prenatal diagnosis, and newborn genetic disease diagnosis.
Adult genomic screening is in its early phase but, on the basis of our data, appears very promising. We conclude that the genomic study of adults deserves intensified effort to determine whether "need to know" genome information can improve the health of our aging population.
Materials and Methods
The oversight of this research was under two institutional review boards: (i) HSC-IMM-08-0641 (University of Texas Health Science Center at Houston) and (ii) H-30710 (Baylor College of Medicine).
Cohort Description. The cohort consists of members and spouses in the Houston Chapter of the Young Presidents Organization (YPO) (85). The full description of the cohort can be found in SI Materials and Methods.
WES Sequencing. Standard NGS was performed using the Illumina HiSeq; an extended explanation can be found in SI Materials and Methods.
Sequencing Analysis. Fig. 1 illustrates our pipeline, and Fig. 2 describes our pipeline to detect known pathogenic variations. Additional details can be found in SI Materials and Methods.
Counseling. Genome counseling was conducted by a board-certified internist and medical geneticist through both individual meetings and two written summaries over a period of 12 mo. Additional information can be found in SI Materials and Methods.
ACKNOWLEDGMENTS. This work was supported by the Cullen Foundation for Higher Education and the Governing Board of the Greater Houston Community Foundation. The funding organizations made the awards to the University of Texas Health Science Center at Houston and Baylor College of Medicine. C.T.C. was the principal investigator of both grants.
1. Levy S, et al. (2007) The diploid genome sequence of an individual human. PLoS Biol 5(10):e254.
2. Bamshad MJ, et al. (2011) Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet 12(11):745-755.
3. Tabor HK, Berkman BE, Hull SC, Bamshad MJ (2011) Genomics really gets personal: How exome and whole genome sequencing challenge the ethical framework of human genetics research. Am J Med Genet A 155A(12):2916-2924.
4. Lander ES (2011) Genome-sequencing anniversary. The accelerator. Science 331(6020):1024.
5. Lander ES (2011) Initial impact of the sequencing of the human genome. Nature 470(7333):187-197.
6. Biesecker LG, Burke W, Kohane I, Plon SE, Zimmern R (2012) Next-generation sequencing in the clinic: Are we ready? Nat Rev Genet 13(11):818-824.
7. Hennekam RC, Biesecker LG (2012) Next-generation sequencing demands next-generation phenotyping. Hum Mutat 33(5):884-886.
8. Anonymous Finding of rare disease genes in Canada (FORGE Canada). Available at http://www.genomebc.ca/portfolio/projects/health-projects/finding-of-rare-disease-genes-in-canada-forge-canada/. Accessed September 19, 2013.
9. Gahl WA, et al. (2012) The National Institutes of Health Undiagnosed Diseases Program: Insights into rare diseases. Genet Med 14(1):51-59.
10. Gahl WA, et al. (2012) The National Institutes of Health Undiagnosed Diseases Program: Insights into rare diseases. Genet Med 14(1):51-59.
11. Gahl WA, Tifft CJ (2011) The NIH Undiagnosed Diseases Program: Lessons learned. JAMA 305(18):1904-1905.
12. Koenekoop RK, et al.; Finding of Rare Disease Genes (FORGE) Canada Consortium (2012) Mutations in NMNAT1 cause Leber congenital amaurosis and identify a new disease pathway for retinal degeneration. Nat Genet 44(9):1035-1039.
13. Stenson PD, et al. (2012) The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr Protoc Bioinformatics 39:1.13.1-1.13.20.
Gonzalez-Garay et al.
PNAS Early Edition | 5 of 6
14. Stenson PD, et al. (2009) The Human Gene Mutation Database: 2008 update. Genome Med 1(1):13.
15. Anonymous NHLBI exome sequencing project (ESP) Exome Variant Server. Available at http://evs.gs.washington.edu/EVS/. Accessed September 19, 2013.
16. Clarke L, Zheng-Bradley X, et al. (2012) The 1000 Genomes Project: Data management and community access. Nat Methods 9(5):459-462.
17. Abecasis GR, et al.; 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467(7319):1061-1073.
18. Adzhubei IA, et al. (2010) A method and server for predicting damaging missense mutations. Nat Methods 7(4):248-249.
19. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4(7):1073-1081.
20. Sim NL, Kumar P, et al. (2012) SIFT web server: Predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40(Web Server issue):W452-W457.
21. Hu J, Ng PC (2012) Predicting the effects of frameshifting indels. Genome Biol 13(2):R9.
22. Ng PC, Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Res 11(5):863-874.
23. Ng PC, Henikoff S (2003) SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31(13):3812-3814.
24. Ng PC, Henikoff S (2006) Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet 7:61-80.
25. Schwarz JM, Rödelsperger C, Schuelke M, Seelow D (2010) MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 7(8):575-576.
26. Liu X, Jian X, Boerwinkle E (2011) dbNSFP: A lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat 32(8):894-899.
27. Anonymous Online Mendelian Inheritance in Man (OMIM). Available at http://omim.org. Accessed September 19, 2013.
28. Anonymous NCBI OMIM Online Mendelian Inheritance in Man. Available at http://www.ncbi.nlm.nih.gov/omim. Accessed September 19, 2013.
29. Huijgen R, Kindt I, Defesche JC, Kastelein JJ (2012) Cardiovascular risk in relation to functionality of sequence variants in the gene coding for the low-density lipoprotein receptor: A study among 29,365 individuals tested for 64 specific low-density lipoprotein-receptor sequence variants. Eur Heart J 33(18):2325-2330.
30. Boekholdt SM, et al. (2012) Association of LDL cholesterol, non-HDL cholesterol, and apolipoprotein B levels with risk of cardiovascular events among patients treated with statins: A meta-analysis. JAMA 307(12):1302-1309.
31. Waeber G, et al. (2000) The gene MAPK8IP1, encoding islet-brain-1, is a candidate for type 2 diabetes. Nat Genet 24(3):291-295.
32. Mosca L, et al. (2011) Genetic variability of the fructosamine 3-kinase gene in diabetic patients. Clin Chem Lab Med 49(5):803-808.
33. da Silva Xavier G, et al. (2011) Per-arnt-sim (PAS) domain-containing protein kinase is downregulated in human islets in type 2 diabetes and regulates glucagon secretion. Diabetologia 54(4):819-827.
34. MacDonald PE, Rorsman P (2011) Per-arnt-sim (PAS) domain kinase (PASK) as a regulator of glucagon secretion. Diabetologia 54(4):719-721.
35. O'Rahilly S (2009) Human genetics illuminates the paths to metabolic disease. Nature 462(7271):307-314.
36. van den Berg L, et al. (2011) Melanocortin-4 receptor gene mutations in a Dutch cohort of obese children. Obesity (Silver Spring) 19(3):604-611.
37. Al-Owain M, Al-Dosari MS, Sunker A, Shuaib T, Alkuraya FS (2012) Identification of a novel ZNF469 mutation in a large family with Ehlers-Danlos phenotype. Gene 511(2):447-450.
38. Fritsche LG, et al. (2012) A subgroup of age-related macular degeneration is associated with mono-allelic sequence variants in the ABCA4 gene. Invest Ophthalmol Vis Sci 53(4):2112-2118.
39. Zhang J, et al. (2012) [Polymorphism of Usp26 correlates with idiopathic male infertility]. Zhonghua Nan Ke Xue 18(2):105-108.
40. Wei X, et al.; NISC Comparative Sequencing Program (2011) Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nat Genet 43(5):442-446.
41. Howell PM Jr, Li X, Riker AI, Xi Y (2010) MicroRNA in melanoma. Ochsner J 10(2):83-92.
42. Xi Y, et al. (2008) Global comparative gene expression analysis of melanoma patient samples, derived cell lines and corresponding tumor xenografts. Cancer Genomics Proteomics 5(1):1-35.
43. Nelson HD, Huffman LH, Fu R, Harris EL; U.S. Preventive Services Task Force (2005) Genetic risk assessment and BRCA mutation testing for breast and ovarian cancer susceptibility: Systematic evidence review for the U.S. Preventive Services Task Force. Ann Intern Med 143(5):362-379.
44. Anonymous National Cancer Institute BRCA1 and BRCA2. Available at http://www.cancer.gov/cancertopics/factsheet/Risk/BRCA. Accessed September 19, 2013.
45. Holt SK, et al. (2008) Association of megalin genetic polymorphism with prostate cancer risk and prognosis. Clin Cancer Res 14(12):3823-3831.
46. Frank-Raue K, et al. (2013) Prevalence and clinical spectrum of nonsecretory medullary thyroid carcinoma in a series of 839 patients with sporadic medullary thyroid carcinoma. Thyroid 23(3):294-300.
47. Mak HH, et al. (2007) Oncogenic activation of the Met receptor tyrosine kinase fusion protein, Tpr-Met, involves exclusion from the endocytic degradative pathway. Oncogene 26(51):7213-7221.
48. Ruel J, et al. (2008) Impairment of SLC17A8 encoding vesicular glutamate transporter-3, VGLUT3, underlies nonsyndromic deafness DFNA25 and inner hair cell dysfunction in null mice. Am J Hum Genet 83(2):278-292.
49. van Hulsteijn LT, Dekkers OM, Hes FJ, Smit JW, Corssmit EP (2012) Risk of malignant paraganglioma in SDHB-mutation and SDHD-mutation carriers: A systematic review and meta-analysis. J Med Genet 49(12):768-776.
50. Jiang Y, Tsai TF, Bressler J, Beaudet AL (1998) Imprinting in Angelman and Prader-Willi syndromes. Curr Opin Genet Dev 8(3):334-342.
51. Schaaf CP, et al. (2013) Truncating mutations of MAGEL2 cause autism and Prader-Willi syndrome (PWS) or PWS-like phenotypes. Nat Genet. In press.
52. Rashid T, Ebringer A (2011) Gut-mediated and HLA-B27-associated arthritis: An emphasis on ankylosing spondylitis and Crohn's disease with a proposal for the use of new treatment. Discov Med 12(64):187-194.
53. Deng H, Gao K, Jankovic J (2012) The genetics of Tourette syndrome. Nat Rev Neurol 8(4):203-213.
54. Anonymous Beyond Batten Disease Foundation. Available at http://beyondbatten.org/. Accessed September 19, 2013.
55. Knudson AG (1996) Hereditary cancer: Two hits revisited. J Cancer Res Clin Oncol 122(3):135-140.
56. Nik-Zainal S, et al.; Breast Cancer Working Group of the International Cancer Genome Consortium (2012) The life history of 21 breast cancers. Cell 149(5):994-1007.
57. Alcalai R, Seidman JG, Seidman CE (2008) Genetic basis of hypertrophic cardiomyopathy: From bench to the clinics. J Cardiovasc Electrophysiol 19(1):104-110.
58. Rader DJ, Cohen J, Hobbs HH (2003) Monogenic hypercholesterolemia: New insights in pathogenesis and treatment. J Clin Invest 111(12):1795-1803.
59. Martin I, Dawson VL, Dawson TM (2011) Recent advances in the genetics of Parkinson's disease. Annu Rev Genomics Hum Genet 12:301-325.
60. Selkoe DJ (2012) Preventing Alzheimer's disease. Science 337(6101):1488-1492.
61. Anonymous Complete Genomics Inc. Available at http://www.completegenomics.com. Accessed September 19, 2013.
62. Anonymous Human Variome Project. Available at http://www.humanvariomeproject.org. Accessed September 19, 2013.
63. Anonymous UniProtKB. Available at http://www.uniprot.org/uniprot. Accessed September 19, 2013.
64. Anonymous Gene atlas. Available at http://www.geneatlas.org/gene/main.jsp. Accessed September 19, 2013.
65. Anonymous Genetic Testing Registry (GeneTests). Available at http://www.genetests.org. Accessed September 19, 2013.
66. Anonymous SEQanswers. Available at http://seqanswers.com. Accessed September 19, 2013.
67. Anonymous 69 genomes data. Available at http://www.completegenomics.com/public-data/69-Genomes/. Accessed September 19, 2013.
68. Anonymous The Million Veteran Program. Available at http://www.va.gov/opa/pressrel/pressrelease.cfm?id=2090. Accessed September 19, 2013.
69. Patel CJ, et al. (2013) Whole genome sequencing in support of wellness and health maintenance. Genome Med 5(6):58.
70. Ball MP, et al. (2012) A public resource facilitating clinical use of genomes. Proc Natl Acad Sci USA 109(30):11920-11927.
71. American Academy of Pediatrics Committee on Bioethics (2001) Ethical issues with genetic testing in pediatrics. Pediatrics 107(6):1451-1455.
72. Davis DS (1997) Genetic dilemmas and the child's right to an open future. Hastings Cent Rep 27(2):7-15.
73. Wolf SM, Lawrenz FP, et al. (2008) Managing incidental findings in human subjects research: Analysis and recommendations. J Law Med Ethics 36(2):219-248.
74. McGuire AL, Burke W (2008) An unwelcome side effect of direct-to-consumer personal genome testing: Raiding the medical commons. JAMA 300(22):2669-2671.
75. Bloss CS, Schork NJ, Topol EJ (2011) Effect of direct-to-consumer genomewide profiling to assess disease risk. N Engl J Med 364(6):524-534.
76. Wexler NS (2012) Huntington's disease: Advocacy driving science. Annu Rev Med 63:1-22.
77. Caskey CT (2007) The drug development crisis: Efficiency and safety. Annu Rev Med 58:1-16.
78. Caskey CT (2010) Using genetic diagnosis to determine individual therapeutic utility. Annu Rev Med 61:1-15.
79. Miller G (2012) Alzheimer's research. Stopping Alzheimer's before it starts. Science 337(6096):790-792.
80. Zhang QC, et al. (2012) Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 490(7421):556-560.
81. Edwards AM, Bountra C, Kerr DJ, Willson TM (2009) Open access chemical and clinical probes to support drug discovery. Nat Chem Biol 5(7):436-440.
82. Namovic MT, Jarvis MF, Donnelly-Roberts D (2012) High throughput functional assays for P2X receptors. Curr Protoc Pharmacol Chapter 9:Unit 9.15.
83. Trivedi S, Liu J, Liu R, Bostwick R (2010) Advances in functional assays for high-throughput screening of ion channels targets. Expert Opin Drug Discov 5(10):995-1006.
84. Suhre K, et al.; CARDIoGRAM (2011) Human metabolic individuality in biomedical and pharmaceutical research. Nature 477(7362):54-60.
85. Anonymous Membership criteria YPO. Available at http://www.ypo.org/join-ypo/. Accessed September 19, 2013.
6 | www.pnas.org/cgi/doi/10.1073/pnas.1315934110
Gonzalez-Garay et al.
Supporting Information
Gonzalez-Garay et al. 10.1073/pnas.1315934110
SI Materials and Methods
Cohort Description. The cohort consists of members and spouses in the Houston Chapter of the Young Presidents Organization (YPO). Criteria for membership in the YPO include corporate and community leadership (1). This cohort is well educated and of higher socioeconomic status. All 450 YPO members were invited to attend an 8-h educational program incorporating technology, human genetics, anticipated outcomes, ethical considerations, discussion groups, technology demonstrations, and printed materials. Of the 150 attendees, 81 volunteered to participate in this study: 46 men and 35 women, with an average age of 54 y. All 81 elected, under the terms of the University of Texas Health Science Center at Houston's institutional review board, to receive "need to know" genomic disease risk results. Each volunteer provided a detailed medical and drug use history, which was reviewed by our physician-researcher (C.T.C.). A three-generation medical pedigree was acquired for each volunteer; one volunteer could provide no family history.
Whole Exome Sequencing (WES). Genomic DNA was extracted using a DNA kit (Promega Wizard genomic DNA purification kit) following Promega's instructions (2). The cohort was sequenced twice. The first WES experiment (2011) was performed using Illumina's HiSeq and the Genome Analyzer IIx system (3) after enrichment with the NimbleGen V2 kit (44 Mb) (4) (outsourced to the National Center for Genome Resources). Our second WES experiment (2013) was performed using Illumina's newest machine, the HiSeq 2500 (3), after enrichment with the Agilent SureSelect target enrichment V5+UTRs kit (targeting coding regions plus UTRs) (5) (outsourced to Axeq Technologies). Genome sequencing of a small subset (24 subjects) was carried out for validation purposes by Complete Genomics Inc. (CGI) (6).
Sequencing Analysis. Our analysis pipeline consists of Novoalign (7), SAMtools (8), Picard (9), and the Genome Analysis Toolkit (GATK) (10), followed by variant annotation (11-14) using multiple databases from the University of California Santa Cruz (UCSC) Genome Bioinformatics site (15). Fig. 1 illustrates our pipeline. Fig. 2 describes our pipeline to detect known pathogenic variations. We detected known variants associated with human diseases using the Human Gene Mutation Database (HGMD) from Biobase (16, 17) and genes known to be associated with human disorders from Online Mendelian Inheritance in Man (OMIM) (18, 19) and GeneTests (20). Functional effects of each nonsynonymous coding variant were evaluated using three different functional prediction algorithms [PolyPhen-2 (21), SIFT (22-27), and MutationTaster (28)] through the Database of Human Non-synonymous SNVs and their functional predictions and annotations (dbNSFP) (29). Common polymorphisms were filtered out using frequencies from the National Heart, Lung, and Blood Institute (NHLBI) exome sequencing project (ESP) (30) and the 1000 Genomes Project (31, 32), and internally by removing any variant that appeared more than three times in our cohort. In addition, a group of candidate genes was obtained from OMIM (18, 19) for each volunteer after careful analysis of the volunteer's family and personal health history. Variations in those OMIM (18, 19) candidate genes were identified and subjected to the same frequency and functional-effect filters described above.
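The frequency, recurrence, and functional-prediction filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Variant` record, its field names, and the gene names are hypothetical; the 1% population-frequency cutoff is an assumption (this paragraph does not give one, although 1% is the threshold later used to downgrade alleles); and requiring at least two of the three predictors (PolyPhen-2, SIFT, MutationTaster) to call a variant damaging is likewise an assumed combination rule. Only the "more than three times in our cohort" rule comes directly from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Variant:
    """Hypothetical minimal variant record; field names are illustrative only."""
    gene: str
    esp_af: float          # allele frequency in the NHLBI ESP (0.0 if absent)
    kg_af: float           # allele frequency in 1000 Genomes (0.0 if absent)
    cohort_count: int      # times the variant occurs across the study cohort
    damaging_calls: dict = field(default_factory=dict)  # predictor name -> predicted damaging?

MAX_POPULATION_AF = 0.01  # assumption: 1% cutoff; not stated in this paragraph
MAX_COHORT_COUNT = 3      # from the text: drop variants seen more than three times
MIN_DAMAGING_VOTES = 2    # assumption: at least 2 of the 3 predictors must agree

def is_rare(v: Variant) -> bool:
    """Rare in both public control sets and not recurrent in the cohort."""
    return (max(v.esp_af, v.kg_af) <= MAX_POPULATION_AF
            and v.cohort_count <= MAX_COHORT_COUNT)

def is_predicted_damaging(v: Variant) -> bool:
    """Count how many predictors flag the variant as damaging."""
    return sum(bool(call) for call in v.damaging_calls.values()) >= MIN_DAMAGING_VOTES

def filter_candidates(variants, candidate_genes=None):
    """Keep rare, predicted-damaging variants, optionally restricted to candidate genes."""
    return [
        v for v in variants
        if (candidate_genes is None or v.gene in candidate_genes)
        and is_rare(v) and is_predicted_damaging(v)
    ]

# Toy input with hypothetical gene names.
variants = [
    Variant("GENE1", 0.0001, 0.0, 1, {"PolyPhen-2": True, "SIFT": True, "MutationTaster": False}),
    Variant("GENE2", 0.05, 0.04, 1, {"PolyPhen-2": True, "SIFT": True, "MutationTaster": True}),
    Variant("GENE3", 0.0, 0.0, 5, {"PolyPhen-2": True, "SIFT": True, "MutationTaster": True}),
]
print([v.gene for v in filter_candidates(variants)])  # ['GENE1']
```

Passing a set of OMIM-derived candidate genes as `candidate_genes` mirrors the per-volunteer candidate-gene step, which reuses the same frequency and functional-effect filters.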
Variant Validation. Every variant identified by our pipeline was evaluated for quality control, and its read alignments in the BAM file [binary version of a SAM (Sequence Alignment/Map) file] were visualized using the Integrative Genomics Viewer (IGV) (33). The purpose of this step was to remove any remaining false positives.
Each genetic variant was validated using the following steps: (i) retrieve reads over variant sites for each individual; (ii) make SAMtools (8) genotype calls (an alternate calling algorithm); (iii) retrieve quality scores for all reads; (iv) keep track of the directional depth and require at least two variant reads in both the 5' and 3' orientations for a variant to be considered true; and (v) filter out variants if the SAMtools (8) genotype call disagrees with the GATK (10) call or if the quality scores or directional depth values do not exceed minimum values.
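Steps (iii) to (v) can be sketched as a single check, assuming each read overlapping the site has already been summarized. The `Read` tuple, its fields, and the VCF-style genotype strings are hypothetical. "At least two variant reads in both the 5' and 3' orientations" is interpreted here as two per strand, and the base-quality floor of 20 is an assumption borrowed from the postcalling filter described under Establishing Criteria for Highly Reliable Variant Calling.

```python
from collections import Counter
from typing import NamedTuple

class Read(NamedTuple):
    """Hypothetical summary of one read covering the variant site."""
    reverse: bool        # True if the read aligned to the reverse strand
    carries_alt: bool    # True if the read shows the variant allele
    base_quality: int    # Phred quality of the base at the variant position

MIN_ALT_READS_PER_STRAND = 2  # interpretation of step (iv)
MIN_BASE_QUALITY = 20         # assumption, borrowed from the postcalling filter

def validate_variant(reads, gatk_genotype, samtools_genotype):
    # Step (v): the two callers must agree on the genotype.
    if gatk_genotype != samtools_genotype:
        return False
    # Steps (iii)-(iv): count high-quality variant reads on each strand.
    alt_per_strand = Counter(
        r.reverse for r in reads
        if r.carries_alt and r.base_quality > MIN_BASE_QUALITY
    )
    return (alt_per_strand[False] >= MIN_ALT_READS_PER_STRAND
            and alt_per_strand[True] >= MIN_ALT_READS_PER_STRAND)

reads = [Read(False, True, 35), Read(False, True, 32),
         Read(True, True, 30), Read(True, True, 38),
         Read(True, False, 36)]
print(validate_variant(reads, "0/1", "0/1"))  # True
print(validate_variant(reads, "0/1", "1/1"))  # False
```

A `Counter` keyed on the strand flag returns 0 for a strand with no supporting reads, so a variant seen on only one strand fails without special-casing.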
Establishing Criteria for Highly Reliable Variant Calling from Exome Sequencing. Our first objective was to define the methods needed to identify a set of "highly reliable" variants from the Illumina sequencing and to apply these methods to variant calling on all of our samples. To meet our definition of a highly reliable variant, each variant had to be detected by two independent, orthogonal sequencing technologies and be classified as high quality. Because there is no common definition of a high-quality variant, we took advantage of the confidence category scores provided by Complete Genomics, in which variants with a score of VQHIGH are considered high quality (masterVarBeta files, version 2.0), and developed an equivalent value for our Illumina sequencing data. To accomplish our first objective, we generated a dataset of variants from a set of 24 samples that we sequenced using both Illumina (3) and an orthogonal sequencing technology (CGI) (6). CGI has its own proprietary workflow from alignment to data annotation (34). Fig. 1 describes our analysis workflow for exome sequencing data. Fig. S2A shows the intersection between the nonsynonymous coding variants (NSCVs) detected by CGI (6) and Illumina (3) exome sequencing. We extracted variants from CGI with a score of VQHIGH that were also detected in the corresponding Illumina VCF file (Fig. S2B). This subset of highly reliable variants represents an average of 72% of the variants detected by CGI. Using this dataset, we systematically tested the conditions and software settings in our pipeline that recover the majority of the highly reliable variants while reducing the probability of selecting variants not present in the dataset. We concluded that using two variant callers, GATK UnifiedGenotyper and mpileup/bcftools (SAMtools), and selecting the overlapping set of variants yielded variants of the highest quality. In addition, a postcalling filter requires each variant to have a mapping quality >30, a base quality >20, and a coverage >10, with at least a 3:7 ratio of variant to reference reads for heterozygous calls, and the presence of the variant in reads from both orientations. These postcalling filters eliminated the majority of false-positive (FP) calls.
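The caller intersection and postcalling filter can be expressed as a short sketch. The thresholds (mapping quality >30, base quality >20, coverage >10, at least 3 variant reads per 7 reference reads for heterozygous calls, and variant reads on both strands) come from the text. The `Call` record, its default values, and the choice to apply the filter to the GATK record of each shared site are assumptions for illustration; real input would come from parsed GATK UnifiedGenotyper and mpileup/bcftools VCF files rather than hand-built records.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    """Hypothetical per-site summary of one caller's output; defaults are illustrative."""
    chrom: str
    pos: int
    ref: str
    alt: str
    genotype: str = "0/1"          # "0/1" = heterozygous
    mapping_quality: float = 60.0
    base_quality: float = 30.0
    depth: int = 20
    ref_reads: int = 10
    alt_reads: int = 10
    alt_forward: int = 5           # variant-supporting reads on the forward strand
    alt_reverse: int = 5           # variant-supporting reads on the reverse strand

    @property
    def site(self):
        return (self.chrom, self.pos, self.ref, self.alt)

def passes_postcall_filter(c: Call) -> bool:
    # Quality and coverage thresholds stated in the text.
    if not (c.mapping_quality > 30 and c.base_quality > 20 and c.depth > 10):
        return False
    # Heterozygous calls: at least 3 variant reads for every 7 reference reads,
    # checked with integers (alt/ref >= 3/7  is the same as  7*alt >= 3*ref).
    if c.genotype == "0/1" and 7 * c.alt_reads < 3 * c.ref_reads:
        return False
    # The variant must appear in reads from both orientations.
    return c.alt_forward > 0 and c.alt_reverse > 0

def highly_reliable(gatk_calls, mpileup_calls):
    """Keep sites called by both callers, then apply the postcalling filter
    (assumption: to the GATK record of each shared site)."""
    mpileup_sites = {c.site for c in mpileup_calls}
    return [c for c in gatk_calls if c.site in mpileup_sites and passes_postcall_filter(c)]

good = Call("chr1", 1000, "A", "G")
skewed = Call("chr1", 2000, "C", "T", ref_reads=18, alt_reads=2, alt_forward=1, alt_reverse=1)
one_strand = Call("chr2", 3000, "G", "A", alt_forward=10, alt_reverse=0)
gatk_only = Call("chr3", 4000, "T", "C")

gatk = [good, skewed, one_strand, gatk_only]
mpileup = [good, skewed, one_strand]
print([c.pos for c in highly_reliable(gatk, mpileup)])  # [1000]
```

Comparing `7 * alt_reads` with `3 * ref_reads` keeps the 3:7 check in integer arithmetic, so no floating-point threshold is involved.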
Counseling. Genome counseling was conducted by a board-certified internist and a medical geneticist through both individual meetings and two written summaries over a period of 12 mo. The summary reports were prepared and jointly endorsed by a bioinformatician and a physician. Additional counseling was conducted through phone calls and appointments with the volunteers' physicians, as requested by the volunteers.
Counseling of Results. Both causative and problematic alleles were
reported verbally and in two written reports over an 18-mo period.
Gonzalez-Garay et al. www.pnas.org/cgi/content/short/1315934110
1 of 8
The first comprehensive report was updated ~1 y later on the basis of (i) larger control databases, which downgraded some problematic alleles with a frequency greater than 1%; (ii) private consultation with disease experts; and (iii) validation against original publications and small disease center databases. Several new disease-gene associations were discovered for the reported familial diseases identified from pedigrees and personal medical histories. Volunteers were informed that these were research results and instructed to consult with their personal physician so that they could have the results validated in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory. Volunteers whose family members warranted genetic study were referred to the Baylor College of Medicine genetics program as a medical referral, because this function was outside the institutional review board scope and Baylor College of Medicine offered both clinical genetic and CLIA laboratory expertise. Our study preceded the publication of the American College of Medical Genetics and Genomics (ACMG) guidelines on incidental findings in clinical WES and whole genome sequencing (WGS) (35). However, we have reviewed their list of 57 genes and 24 actionable conditions and found that our analysis included all of their genes.
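The update described in point (i) amounts to a threshold rule on population frequency. A minimal sketch, assuming reported alleles are identified by hypothetical string IDs and that an allele absent from the larger control database is treated as having frequency 0; the expert consultation and literature validation in points (ii) and (iii) are manual steps and are not modeled.

```python
DOWNGRADE_ABOVE = 0.01  # from the text: alleles with more than 1% frequency were downgraded

def update_report(reported_alleles, control_freqs, threshold=DOWNGRADE_ABOVE):
    """Split first-report alleles into retained and downgraded lists.

    control_freqs maps a (hypothetical) allele ID to its frequency in the larger
    control database; alleles missing from it are treated as unobserved (0.0).
    """
    retained, downgraded = [], []
    for allele in reported_alleles:
        if control_freqs.get(allele, 0.0) > threshold:
            downgraded.append(allele)
        else:
            retained.append(allele)
    return retained, downgraded

first_report = ["allele_A", "allele_B", "allele_C"]
larger_controls = {"allele_A": 0.002, "allele_B": 0.034}
retained, downgraded = update_report(first_report, larger_controls)
print(retained)    # ['allele_A', 'allele_C']
print(downgraded)  # ['allele_B']
```

The comparison is strict (`>`), so an allele at exactly 1% stays in the report, matching "more than a 1% frequency".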
Poststudy Survey
We conducted an online survey, under the oversight of a Baylor College of Medicine institutional review board, to assess volunteers' experiences of participating in this project. The survey consisted of 82 items and focused on how the volunteers felt about taking part in the research project, as well as their perspectives on genetic information in health care and genomic research in general. Study participants were told that the survey was completely voluntary, that they could skip any question they preferred not to answer, and that they could end their participation at any time.
All 81 study volunteers were invited via e-mail to participate in
the anonymous online survey within 12 mo after receiving their
individual genome reports. Forty-two participants responded to
the online survey (response rate, 51.9%; 38 responses were
complete). Of those who responded, 59% were men, 41% were
women, and 95% had biological children. Ninety-seven percent
described their race as white, and 5% chose "other" (participants
could choose all that applied); 5% also identified themselves as
Hispanic or Latino. All participants had earned a college degree,
and 63% had completed at least some graduate work. All par-
ticipants reported having had a routine medical check-up within
the last 2 y, and when asked how they would rate their health,
58% reported excellent, 29% reported very good, 11% reported
good, and 3% reported fair.
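As a quick check, the reported response rate follows directly from the counts given above (81 invited, 42 responded):

```python
invited = 81     # all study volunteers were e-mailed an invitation
responded = 42   # volunteers who returned the survey
response_rate = 100 * responded / invited
print(f"Response rate: {response_rate:.1f}%")  # Response rate: 51.9%
```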
Poststudy survey results. The objective of this study was to deliver helpful medical genetic information. The mandatory educational program informed volunteers that unexpected risk findings were likely. Our institutional review board required that volunteers have the option of declining this information. None chose that option.
The results of the anonymous online survey showed that, overall, participants were motivated to take part in the project to receive their genetic results and learn about their personal risk of disease. Seventy-nine percent of respondents reported that the opportunity to receive their personal genetic results was the most important factor in their decision to take part in the project, whereas another 10% cited a personal interest in genetics in general. When asked to choose which factor was most important in their decision to receive their personal genetic results, most respondents (52%) reported that their interest in finding out their personal risk for diseases was the most important factor; other important factors included the desire to get information about the risk of health conditions for their children (17%), the desire to learn more about the medical conditions in their family (10%), and curiosity about their genetic makeup (10%).
Ninety-seven percent of respondents agreed or strongly agreed that they were glad that they decided to participate in this study and receive their personal results, leaving only 3% undecided. Most respondents (72%) spoke with their primary care provider about their results, and 50% reported that they spoke with other medical professionals, including cardiologists, oncologists, and obstetricians/gynecologists, among others; 22% reported that they had their twice-confirmed research results confirmed in a CLIA-certified laboratory.
Twenty-five percent of respondents reported that the test results motivated them to make changes to their health care (e.g., undergoing tests, seeing a specialist, taking vitamins or herbal supplements), exercise, medications, or insurance (Table S11).
Respondents generally felt that researchers should offer personalized results to research participants: 54% felt that researchers are obligated to offer results, 22% felt that researchers are obligated to offer results only if the researcher is a physician, and the remaining 24% did not think researchers were obligated to offer results. Respondents were pleased with the methods by which they were given their results in this study, with 95% agreeing or strongly agreeing that they were glad the researchers sent them a personalized results report and 100% agreeing or strongly agreeing that they found the in-person consultation about their results very helpful. When asked, 94% said they would also want an electronic record of their entire genome if it were available.
When asked about genetic testing in health care, 83% reported that they felt that genetic testing should be a regular part of health care, and 97% agreed or strongly agreed that they felt comfortable using these results to make decisions about their health. Nevertheless, respondents were evenly split when asked whether they thought these results should be part of their medical record.
In summary, our poststudy surveys indicated that volunteers were motivated to gain personal and family health knowledge, were satisfied with the translation of the genetic information, and had divided opinions about incorporating their genetic information into their medical records.
1. Anonymous Membership criteria YPO. Available at http://www.ypo.org/join-ypo/. Accessed September 19, 2013.
2. Anonymous Wizard. Available at http://www.promega.com/resources/protocols/technical-manuals/0/wizard-genomic-dna-purification-kit-protocol/. Accessed September 19, 2013.
3. Anonymous Illumina. Available at http://www.illumina.com. Accessed September 19, 2013.
4. Anonymous NimbleGen Roche. Available at http://www.nimblegen.com/products/seqcap/ez/v2/index.html. Accessed September 19, 2013.
5. Agilent Technologies Agilent SureSelect array. Available at http://www.genomics.agilent.com. Accessed September 19, 2013.
6. Anonymous Complete Genomics Inc. Available at http://www.completegenomics.com. Accessed September 19, 2013.
7. Novocraft.com (2012) Available at http://www.novocraft.com. Accessed September 19, 2013.
8. SAMtools. Available at http://samtools.sourceforge.net. Accessed September 19, 2013.
9. Picard. Available at http://picard.sourceforge.net. Accessed September 19, 2013.
10. McKenna A, et al. (2010) The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20(9):1297-1303.
11. Cingolani P snpEff: SNP effect predictor. Available at http://snpeff.sourceforge.net/. Accessed September 19, 2013.
12. Cingolani P, et al. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6(2):80-92.
13. San Lucas FA, Wang G, Scheet P, Peng B (2012) Integrated annotation and analysis of genetic variants from next-generation sequencing studies with variant tools. Bioinformatics 28(3):421-422.
14. Wang K, Li M, Hakonarson H (2010) ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38(16):e164.
15. Kuhn RM, Haussler D, Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinform 14(2):144-161.
16. Stenson PD, et al. (2012) The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr Protoc Bioinformatics.
17. Stenson PD, et al. (2009) The Human Gene Mutation Database: 2008 update. Genome Med 1(1):13.
Gonzalez-Garay et al. www.pnas.org/cgi/content/short/1315934110
2 of 8
EFTA01140249
18. Anonymous NCBI OMIM Online Mendelian Inheritance in Man. Available at http://www.ncbi.nlm.nih.gov/omim. Accessed September 19, 2013.
19. Anonymous Online Mendelian Inheritance in Man OMIM. Available at http://omim.org. Accessed September 19, 2013.
20. Anonymous Genetic Testing Registry (GeneTests). Available at http://www.genetests.org. Accessed September 19, 2013.
21. Adzhubei IA, et al. (2010) A method and server for predicting damaging missense mutations. Nat Methods 7(4):248-249.
22. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4(7):1073-1081.
23. Sim NL, et al. (2012) SIFT web server: Predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40(Web Server issue):W452-W457.
24. Hu J, Ng PC (2012) Predicting the effects of frameshifting indels. Genome Biol 13(2):R9.
25. Ng PC, Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Res 11(5):863-874.
26. Ng PC, Henikoff S (2003) SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31(13):3812-3814.
27. Ng PC, Henikoff S (2006) Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet 7:61-80.
28. Schwarz JM, Rödelsperger C, Schuelke M, Seelow D (2010) MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 7(8):575-576.
29. Liu X, Jian X, Boerwinkle E (2011) dbNSFP: A lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat 32(8):894-899.
30. Anonymous NHLBI Exome Sequencing Project (ESP) Exome Variant Server. Available at http://evs.gs.washington.edu/EVS/. Accessed September 19, 2013.
31. Clarke L, Zheng-Bradley X, et al. (2012) The 1000 Genomes Project: Data management and community access. Nat Methods 9(5):459-462.
32. Abecasis GR, et al.; 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467(7319):1061-1073.
33. Robinson JT, et al. (2011) Integrative genomics viewer. Nat Biotechnol 29(1):24-26.
34. Complete Genomics (data file format standard pipeline version 2.0). Available at http'Aswrw.mrtpletegenomlaco0kustomeriupporvdocumentatIo&100357139.htm1. Accessed September 19, 2013.
35. Green RC, Berg JS, et al. (2013) ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genet Med 15(7):565-574.
[Fig. S1 plot: number of risk alleles (y axis, Frequency) per gene (x axis); legible gene labels include TTN, LRP2, and ACVRL1.]
Fig. S1. Grouping genes by occurrence frequency of genes with nonsynonymous coding mutations in our cohort. This graphic summarizes the number of times risk alleles were observed for each gene. In each case, the allele was listed in HGMD or OMIM, was rare, and carried a high PolyPhen-2 score. One example of a gene with frequent risk alleles is titin (TTN), one of the largest genes in the human genome and recently reported to cause dilated cardiomyopathy. A second example, a smaller gene with a large number of variants, is CFTR, for which the disease database is deep and whose associated disease, cystic fibrosis, is one of the most common autosomal recessive diseases in whites. This graphic indicates that we did not select polymorphic genes but rather unique mutations in each volunteer.
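The selection summarized in Fig. S1 can be sketched in Python as a filter followed by a per-gene count. This is a minimal illustration, not the authors' pipeline: the `Variant` record, its field names, and the cutoffs `MAX_ALLELE_FREQ = 0.01` and `MIN_POLYPHEN2 = 0.85` are hypothetical, because the caption states only that each allele was in HGMD or OMIM, rare, and had a high PolyPhen-2 score. The toy records are invented for illustration and are not data from the study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical cutoffs: the caption says only "rare" and "high PolyPhen-2 score".
MAX_ALLELE_FREQ = 0.01
MIN_POLYPHEN2 = 0.85


@dataclass(frozen=True)
class Variant:
    volunteer: str
    gene: str
    in_hgmd_or_omim: bool   # allele reported in HGMD or OMIM
    population_freq: float  # allele frequency in a reference population
    polyphen2: float        # PolyPhen-2 score between 0 and 1


def is_risk_allele(v: Variant) -> bool:
    """Keep alleles that are known, rare, and predicted damaging."""
    return (
        v.in_hgmd_or_omim
        and v.population_freq < MAX_ALLELE_FREQ
        and v.polyphen2 >= MIN_POLYPHEN2
    )


def alleles_per_gene(variants: Iterable[Variant]) -> Counter:
    """Count the risk alleles observed in each gene across the cohort."""
    return Counter(v.gene for v in variants if is_risk_allele(v))


def group_by_occurrence(counts: Counter) -> Dict[int, List[str]]:
    """Map each occurrence count to the sorted list of genes with that count."""
    groups: Dict[int, List[str]] = {}
    for gene, n in counts.items():
        groups.setdefault(n, []).append(gene)
    return {n: sorted(genes) for n, genes in sorted(groups.items())}


# Toy records for illustration only; they are not data from the study.
toy = [
    Variant("A", "TTN", True, 0.001, 0.99),
    Variant("B", "TTN", True, 0.002, 0.95),
    Variant("C", "CFTR", True, 0.001, 0.90),
    Variant("D", "CFTR", False, 0.001, 0.99),  # not in HGMD/OMIM: excluded
    Variant("E", "LRP2", True, 0.20, 0.99),    # too common: excluded
]
print(group_by_occurrence(alleles_per_gene(toy)))  # {1: ['CFTR'], 2: ['TTN']}
```

Counting distinct risk alleles per gene, rather than counting genes, is what lets a large gene such as TTN accumulate several rare alleles across volunteers without any single allele being common.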
[Fig. S2 panels. Left: Venn diagram of nonsynonymous coding SNPs called by CGI and by Illumina. Right: CGI variants only, showing high-quality SNPs (96%) and high-quality SNPs also detected by Illumina (72%); average of 24 samples.]
Fig. S2. Variants detected using Complete Genomics Inc. (CGI) and Illumina. (Left) Comparison of nonsynonymous coding SNPs (NSCS) obtained from Complete Genomics (red) and Illumina (green). Twenty-four human samples were sequenced using both technologies, and NSCS were compared in each sample. The average results were calculated and graphed as a Venn diagram; the intersection represents the set of NSCS detected by both technologies. On average, 73% of the NSCS detected by CGI were also detected by Illumina, while 82% of the NSCS detected by Illumina were also detected by CGI. (Right) Using the same samples, we calculated that 96% of all CGI NSCS are considered "High Quality" according to the CGI proprietary quality metric. On average, 72% of all the NSCS detected by CGI were high quality and also detected by Illumina (blue). Because two orthogonal sequencing technologies detected the same set of NSCS, this group of variants most likely represents real variants, which we refer to as "highly reliable NSCS." The set of highly reliable NSCS was used to establish quality criteria in our Illumina variant detection pipeline.
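The concordance figures in Fig. S2 are per-sample set overlaps averaged over the 24 samples. A minimal Python sketch of one plausible reading, assuming each call is identified by a hypothetical (chromosome, position, reference, alternate) key and that both call sets were already restricted to nonsynonymous coding SNPs; the function names are illustrative, not part of either vendor's software.

```python
from statistics import mean
from typing import Dict, Set, Tuple

# Hypothetical variant identity: (chromosome, position, reference base, alternate base).
Key = Tuple[str, int, str, str]


def concordance(cgi: Set[Key], illumina: Set[Key]) -> Tuple[float, float]:
    """For one sample, return (fraction of CGI calls also called by Illumina,
    fraction of Illumina calls also called by CGI)."""
    if not cgi or not illumina:
        raise ValueError("both call sets must be non-empty")
    shared = cgi & illumina
    return len(shared) / len(cgi), len(shared) / len(illumina)


def average_concordance(
    samples: Dict[str, Tuple[Set[Key], Set[Key]]]
) -> Tuple[float, float]:
    """Compute the per-sample fractions, then average them over all samples."""
    pairs = [concordance(cgi, ill) for cgi, ill in samples.values()]
    return mean(p[0] for p in pairs), mean(p[1] for p in pairs)


def highly_reliable(cgi_high_quality: Set[Key], illumina: Set[Key]) -> Set[Key]:
    """High-quality CGI calls confirmed by the orthogonal Illumina platform."""
    return cgi_high_quality & illumina


# Toy single-sample example: three of four calls are shared by both platforms.
a, b, c = ("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")
print(concordance({a, b, c, ("3", 10, "T", "C")}, {a, b, c, ("X", 5, "A", "T")}))  # (0.75, 0.75)
```

Averaging per-sample fractions weights every sample equally; pooling all calls before dividing would instead weight samples by their number of calls, so the two approaches can give slightly different percentages.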
Table S1. Disease associations with alleles
Case | Disease | Risk gene | Allele | HGMD | OMIM gene ID
3937 | Hypercholesterolaemia | LDLR | p.P526H | CM100938 | 606945
3890 | Hypercholesterolaemia | LDLR | p.T726I | CM920469 | 606945
3910 | Hypercholesterolaemia | LDLR | p.T726I | CM920469 | 606945
3900 | Hypercholesterolaemia | LDLR | p.V827I | CM920471 | 606945
3915 | Hypercholesterolaemia | LDLR | p.V827I | CM920471 | 606945
3923 | Obesity | MC4R | p.I251L | CM030483 | 155541
3923 | Diabetes mellitus, type II | MAPK8IP1 | p.D386E | NA | 604641
3973 | Obesity | MC4R | p.C326R | CM070992 | 155541
3937 | Diabetes mellitus type 2 (MODY) | FN3K | p.H146R | NA | 608425
3937 | Diabetes mellitus type 2 (MODY) | PASK | p.P1256L | NA | 607505
3923 | Macular degeneration, age related | ABCA4 | p.G863A | CM970003 | 601691
3898 | Brittle cornea syndrome type 1 (BCS1), keratoconus | ZNF469 | p.D2902Y | NA | 612078
3889 | Male infertility | USP26 | p.T123_Q124insT | NA | 300309
3942 | Melanoma | BAG4 | p.W103X | NA | 603884
3959 | Melanoma | GRIN2A | p.N1076K | NA | 138253
3896 | Breast or ovarian cancer | BRCA2 | p.I505T | CM010167 | 600185
3959 | Breast or ovarian cancer | BRCA2 | p.S384F | CM065036 | 600185
3897 | Breast or ovarian cancer | BRCA2 | p.T2515I | CM994287 | 600185
3950 | Follicular thyroid cancer (age 41) | TPR | p.R105C | NA | 189940
3960 | Prostate cancer | LRP2 | p.N479H | NA | 600073
3960 | Prostate cancer | LRP2 | p.G4417D | NA | 600073
3934 | Nonsyndromic deafness | MYH14 | p.M161I | NA | 608568
3934 | Nonsyndromic deafness | SLC17A8 | p.R75C | NA | 607557
NA, not available.
Table S2. Familial diseases and associations
Case | Disorder | Gene | Volunteer relatedness
3949 | Prader-Willi syndrome | MAGEL2 | 2°
3947 | Paraganglioma | SDHB | 1°
3930 | Ankylosing spondylitis | HLA-B27 | 1°
3930 | Tourette syndrome | TBD | 1° (3)
3928 | Parkinson disease | LRRK2 | 1°
Association (volunteer, affected relative): IP, IP
—, negative; IP, research in progress.
Table S3. Recessive disorders
Cases | Disease | Risk gene | Allele | HGMD | OMIM
3958 | Niemann-Pick type C2 disease | NPC2 | p.N111K | CM081368 | 601015
3896, 3900, 3915, 3895 | α1-Antitrypsin deficiency | SERPINA1 | p.R247C, p.E366K (3) | CM910298, CM830003 | 107400
3894 | Glycogen storage disease 0 | GYS2 | p.Q183X | CM023388 | 138571
3889 | Glycogen storage disease Ia | G6PC | p.R83C | CM930261 | 613742
3901 | Glycogen storage disease 3 | AGL | p.R477H | CM104343 | 610860
3945 | Glycogen storage disease 4 | GBE1 | p.Y329S | CM960705 | 607839
3898 | Glycogen storage disease 6 | PYGL | p.D634H | CM078418 | 613741
3941, 3952 | Glycogen storage disease 9B | PHKB | p.Q650K | CM031327 | 172490
3915, 3919, 3943, 3954 | Fanconi anemia | FANCA | p.T126R, p.S858R (3) | CM043494, CM992317 | 607139
3936, 3934 | Familial Mediterranean fever | MEFV | p.E148Q, p.P369S, p.R408Q | CM981240, CM990837, CM990838 | 608107
395, 439, 243, 953 | Cystic fibrosis | CFTR | p.D1152H, p.S1235R | CM950256, CM930133 | 602421
3933 | Sandhoff disease | HEXB | p.A543T | CM970723 | 606873
3940 | Fuchs endothelial dystrophy | ZEB1 | p.Q824P | CM100242 | 189909
3908 | Factor V deficiency | F5 | p.P1816S | CM095204 | 612309
3952 | Hepatic lipase deficiency | LIPC | p.T405M | CM910258 | 151670
3962 | Krabbe disease | GALC | p.T112A | CM960678 | 606890
3954 | Macular corneal dystrophy, type 2 | CHST6 | p.Q331H | CM055930 | 605294
3891, 3947, 3959, 3924, 3895, 3897 | Usher syndrome 1D | CDH23 | p.A366, p.01806E, p.R1060W | CM050545, CM105104, CM021537 | 605516
3900, 3910 | Phenylketonuria | PAH | p.A300S, p.R53H | CM920555, CM981427 | 612349
3933, 3946 | MCAD (medium-chain acyl-CoA dehydrogenase) deficiency | ACADM | p.K329E (2) | CM900001 | 607008
3914 | Adrenal hyperplasia | HSD3B2 | p.R249X | CM950655 | 613890
3926 | 17α-Hydroxylase/17,20-lyase deficiency | CYP17A1 | p.R449C | HM0669 | 609300
Table S4. X-linked recessive
Case | Disorder | Risk gene | Allele | Sex | HGMD | OMIM
3891 | ATRX syndrome | ATRX | p.N1860S | Female | CM950125 | 300032
3930 | Fabry disease | GLA | p.A143T | Female | CM972773 | 300644
3901 | Mucopolysaccharidosis II | IDS | p.D252N | Female | CM960865 | 300823
Table S5. Breast cancer risk
Case | Disease | Risk gene | Allele | Family history | Sex | Age (y) | HGMD | OMIM gene ID
3959 | Breast cancer | BRCA2 | p.S384F | Affected (44) | Female | 44 | CM065036 | 600185
3896 | Breast cancer | BRCA2 | p.I505T | Affected | Female | 49 | CM010167 | 600185
3955 | Breast cancer | BRCA2 | p.E1625fs | Negative | Female | 42 | CD011121 | 600185
3962 | Breast cancer | PALB2 | p.V1103M | First, second, third degree (2) (49-60s) | Female | 51 | CM118272 | 610355
3936 | Breast cancer | BRCA1 | p.Y856H | First degree (sister, 40s) | Male | 62 | CM042673 | 113705
3936 | Breast cancer | BRCA2 | p.K2729N | First degree (sister, 40s) | Male | 62 | CM021957 | 600185
3963 | Breast cancer | BRCA2 | p.R2034C | First degree (60s) | Male | 48 | CM994286 | 600185
3897 | Breast cancer | BRCA2 | p.T2515I | First degree (80) | Female | 51 | CM994287 | 600185
3934 | Breast cancer | RAD51C | p.T287A | First degree (uterine) | Female | 50 | NA | 602774
3939 | Breast cancer | RAD50 | p.R1069X | First degree breast (60s); second degree colon (60s) | Male | 56 | NA | 604040
3912 | Breast cancer | RAD51C | p.A126T | Negative | Male | 77 | CM1010201 | 602774
3923 | Breast cancer | RAD51C | p.T287A | Negative | Male | 60 | CM1010198 | 602774
3956 | Breast cancer | RAD51C | p.T287A | Negative | Male | 59 | CM1010198 | 602774
NA, not available.
Table S6. Colon cancer risk
Case | Disease | Risk gene | Allele | Family history | Sex | Age (y) | HGMD | OMIM gene ID
3896 | Colon cancer | MLH1 | p.K618A | First degree | Female | 49 | CM973729, CM950808 | 120436
3891 | Colon cancer | MLH3 | p.E1451K | First degree (70s) | Female | 62 | CM013011 | 604395
3897 | Colon cancer | APC | p.A2690T | First and second degree cancer | Female | 51 | CM045404 | 611731
3904 | Colon cancer | MSH2 | p.G315V | Second degree | Male | 49 | CM995220 | 609309
3897 | Colon cancer | MSH2 | p.G12D | Negative | Female | 51 | CM950813 | 609309
3962 | Colon cancer | APC | p.S2621C | Negative | Female | 51 | CM921028 | 611731
3955 | Colon cancer | APC | p.R2505C? | Negative | Female | 42 | NA | 611731
3933 | Colon cancer | MUTYH | p.G382D | Negative | Female | 69 | CM020287 | 604933
NA, not available.
Table S7. Other cancer risk
Case | Disease | Risk gene | Allele | Family history | Sex | Age (y) | HGMD | OMIM gene ID
3959 | Melanoma | GRIN2A | p.N1076K | Affected | Female | 44 | NA | 138253
3942 | Melanoma | BAG4 | p.W103X | Affected | Male | 70 | NA | 603884
3950 | Follicular thyroid cancer | TPR | p.R105C | Affected | Male | 48 | NA | 189940
3960 | Prostate cancer | LRP2 | p.N479H | Affected | Male | 65 | NA | 600073
3946 | Prostate cancer | LRP2 | p.M4601I | Negative | Female | 59 | NA | 600073
3957 | Prostate cancer | LRP2 | p.N1797S | First degree (father) | Male | 44 | NA | 600073
3957 | Prostate cancer | DLC1 | p.089N | First degree (father) | Male | 44 | NA | 604258
3932 | Prostate cancer | CHEK2 | p.E64K | Negative | Male | 47 | CM030414 | 604373
3935 | Prostate cancer | ELAC2 | p.R781H | Negative | Female | 70 | CM010221 | 605367
3902 | Prostate cancer | MSR1 | p.H441R | Negative | Female | 46 | CM023581 | 153622
3900 | Prostate cancer | MSR1 | p.R293X | Negative | Male | 45 | CM023579 | 153622
3954 | Prostate cancer | RNASEL | p.E265X | Negative | Male | 72 | CM020300 | 180435
3954 | Prostate cancer | RNASEL | p.G59S | Negative | Male | 72 | CM031342 | 180435
3963 | Retinoblastoma | RB1 | p.R656W | Negative | Male | 48 | CM030511 | 614041
3896 | Pituitary cancer | ACVRL1 | p.A482V | Negative | Female | 46 | CM994582 | 601284
3896 | Pituitary cancer | ACVRL1 | p.A482V | Negative | Female | 46 | CM994582 | 601284
3930 | Esophageal cancer | WWOX | p.G178S | Negative | Female | 52 | NA | 605131
3973 | Esophageal cancer | WWOX | p.R120W | Negative | Male | 71 | CM016224 | 605131
3916 | Esophageal cancer | WWOX | p.R120W | Negative | Male | 70 | CM016224 | 605131
3941 | Gastric cancer | MET | p.A347T | Negative | Male | 46 | NA | 164860
NA, not available.
Table S8. Cardiomyopathy-affected volunteers
Case | Disease | Risk gene | Allele | Clinical | Age (y) | HGMD | OMIM gene ID
3925 | Dilated cardiomyopathy | MYH6 | p.A1443D | Atrial fibrillation | 65 | CM107536 | 160710
3926 | Arrhythmogenic right ventricular cardiomyopathy | DSG2 | p.V158G | Arrhythmia | 65 | CM070921 | 125671
3935 | Dilated cardiomyopathy | MYH6 | p.R1398Q | Cardiac dysrhythmia | 70 | NA | 160710
3935 | Cardiomyopathy, dilated, 1EE | MYH6 | p.R1398Q | Cardiac dysrhythmia | 70 | NA | 160710
3935 | Arrhythmogenic right ventricular cardiomyopathy | TTN | p.P3751R | Cardiac dysrhythmia | 70 | NA | 188840
3955 | Dilated cardiomyopathy | ACTN2 | p.Q349L | V pacemaker | 53 | NA | 102573
3955 | Familial hypertrophic cardiomyopathy 12 | CSRP3 | p.R100H | V pacemaker | 53 | CM091458 | 600824
3916 | Dilated cardiomyopathy type 1A | LAMA2 | p.T821M | Stent placement | 71 | NA | 156225
3887 | Cardiomyopathy, hypertrophic | MYBPC3 | p.R326Q | Stent placement (3) | 73 | CM020155 | 600958
3887 | Cardiomyopathy, familial hypertrophic (CMH) | MYLK2 | p.V402F | Stent placement (3) | 73 | NA | 606566
3953 | Brugada syndrome (arrhythmia) | KCNE3 | p.M65T | Two bypasses, stent, and family history of CAD | 71 | NA | 604433
3953 | Arrhythmogenic right ventricular cardiomyopathy | TTN | p.P5237T | Two bypasses, stent, and family history of CAD | 71 | NA | 188840
3937 | Hypercholesterolaemia | LDLR | p.P526H | Three generations of early MI; elevated LDL, cholesterol, and triglycerides; treated with statins | 53 | CM100938 | 606945
3890 | Hypercholesterolaemia | LDLR | p.T726I | 1° early MI | 57 | CM920469 | 606945
3910 | Hypercholesterolaemia | LDLR | p.T726I | V aortic occlusion, elevated cholesterol | 51 | CM920469 | 606945
3900 | Hypercholesterolaemia | LDLR | p.V827I | 1° early MI | 45 | CM920471 | 606945
3915 | Hypercholesterolaemia | LDLR | p.V827I | Three generations of elevated cholesterol, treated with statins | 70 | CM920471 | 606945
CAD, coronary artery disease; MI, myocardial infarction; NA, not available.
Table S9. Cardiomyopathy unaffected but family history
Case | Disease | Risk gene | Allele | Clinical | Age (y) | HGMD | OMIM gene ID
3943 | Arrhythmogenic right ventricular cardiomyopathy | TTN | p.G1345D | Family history of arrhythmia | 44 | NA | 188840
3896 | Dilated cardiomyopathy | SYNE1 | p.I.3057V | Family history | 45 | NA | 608441
3896 | Arrhythmogenic right ventricular dysplasia/cardiomyopathy | JUP | p.V648I | Family history | 45 | NA | 173325
3944 | Hypertrophic cardiomyopathy | OBSCN | p.K1671N | Father | 45 | NA | 608616
3931 | Dilated cardiomyopathy | MYH6 | p.R1398Q | Family history | 46 | NA | 160710
3907 | Cardiomyopathy, hypertrophic | ACTN2 | p.T495M | Father | 47 | CM101366 | 102573
3950 | Cardiomyopathy | MYOM1 | p.G1162S | Family history | 48 | NA | 603508
3919 | Romano-Ward syndrome (arrhythmia) | SCN5A | p.S1769N | Family history | 51 | CM002391 | 600163
3889 | Romano-Ward syndrome (arrhythmia) | SCN5A | p.S1769N | Mother | 51 | CM002391 | 600163
3917 | Cardiomyopathy | MYOM1 | p.R1573Q | Family history + father | 51 | NA | 603508
3960 | Dilated cardiomyopathy | NEBL | p.K60N | Son CAD | 66 | CM106905 | 605491
3976 | Cardiomyopathy | MYOM1 | p.E704K | Older brother | 72 | NA | 603508
3976 | Early-onset myopathy | MYH2 | p.V970I | Older brother | 72 | CM051560 | 160740
Table S10. Neurodegenerative risk
Case | Disease | Risk gene | Allele | Family history | Age (y) | HGMD | OMIM
3908 | Alzheimer's disease | APOE | p.C130R | Negative | 44 | CM900020 | 107741
3916 | Alzheimer's disease | APOE | p.L46P | Parkinson 1° (72) | 71 | CM990167 | 107741
3954 | Alzheimer's disease | APP | p.R469H | Negative | 72 | NA | 104760
3942 | Frontotemporal dementia | MAPT | p.S427F | Negative | 71 | NA | 157140
3954 | Frontotemporal dementia | MAPT | p.V224G | Negative | 72 | NA | 157140
3895 | Parkinson disease | EIF4G1 | p.G686C | Negative | 49 | CM117028 | 600495
3916 | Parkinson disease | EIF4G1 | p.R1205H | Parkinson 1° (78) | 64 | CM117009 | 600495
3951 | Parkinson disease | EIF4G1 | p.S1596T | Negative | 64 | NA | 600495
3931 | Parkinson disease 11 | GIGYF2 | p.P1222fs | Negative | 44 | NA | 612003
3946 | Parkinson disease 11 | GIGYF2 | p.H1171R | Negative | 59 | NA | 612003
3957 | Parkinson disease 11 | GIGYF2 | p.M48I | Negative | 44 | NA | 612003
3930 | Parkinson disease 11 | GIGYF2 | p.S1035C | Negative | 52 | NA | 612003
3933 | Parkinson disease 11 | GIGYF2 | p.S1035C | Negative | 68 | NA | 612003
3928 | Parkinson disease | LRRK2 | p.A419V | Tremor 1°, Parkinson 2° | 68 | CM125746 | 609007
3903 | Parkinson disease | LRRK2 | p.O972G | Negative | 54 | NA | 609007
3919 | Parkinson disease | LRRK2 | p.O972G | Negative | 51 | NA | 609007
3889 | Parkinson disease | LRRK2 | p.G2019S | Negative | 51 | CM050659 | 609007
3951 | Parkinson disease | LRRK2 | p.L119P | Negative | 50 | NA | 609007
3918 | Parkinson disease | LRRK2 | p.L286V | Negative | 64 | NA | 609007
3907 | Parkinson disease | LRRK2 | p.P1542S | Alzheimer's 2° | 47 | NA | 609007
3935 | Parkinson disease | LRRK2 | p.P1542S | Negative | 70 | NA | 609007
3893 | Parkinson disease | LRRK2 | p.R1514Q | Negative | 45 | CM057190 | 609007
3943 | Parkinson disease | LRRK2 | p.R1514Q | Negative | 50 | CM057190 | 609007
3949 | Parkinsonism, juvenile, autosomal recessive | PARK2 | p.R275W | 2° three siblings | 52 | CM991007 | 602544
3924 | Parkinsonism, juvenile, autosomal recessive | PARK2 | p.R334C | Negative | 54 | CM003865 | 602544
3927 | Parkinson disease | PM20D1 | p.A332V | Negative | 73 | NA | 613164
3886 | Parkinson disease | PM20D1 | p.P251Q | Negative | 62 | NA | 613164
Table S11. Percentage of survey respondents reporting having made behavioral changes specifically motivated by their test results
Type of behavior change | Yes | No
Changes to diet | 4 (10%) | 36 (90%)
Changes to health care (such as undergoing tests or seeing a specialist) | 4 (10%) | 36 (90%)
Changes to use of vitamins/herbal supplements | 4 (10%) | 36 (90%)
Changes to exercise | 3 (8%) | 37 (92%)
Changes to medications | 1 (2%) | 39 (98%)
Changes to insurance coverage | 1 (2%) | 39 (98%)
Number of respondents making at least one of the above behavior changes | 10 (25%) |
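The percentages in Table S11 follow from dividing each count by 40, the number of respondents implied by the table (every row's Yes and No counts sum to 40). The Python sketch below recomputes them. It reproduces every published value, including 1 (2%), 3 (8%), 37 (92%), and 39 (98%), when exact halves are rounded to the nearest even integer, which is what Python's built-in `round` does; that rounding rule is an assumption about how the table was prepared, not something the paper states.

```python
# Counts transcribed from Table S11; each row's Yes + No equals 40 respondents.
N_RESPONDENTS = 40
YES_COUNTS = {
    "diet": 4,
    "health care": 4,
    "vitamins/herbal supplements": 4,
    "exercise": 3,
    "medications": 1,
    "insurance coverage": 1,
}
AT_LEAST_ONE_CHANGE = 10


def percent(count: int, total: int = N_RESPONDENTS) -> int:
    """Whole-number percentage; round() sends exact halves to the even neighbour."""
    return round(100 * count / total)


for change, yes in YES_COUNTS.items():
    no = N_RESPONDENTS - yes
    print(f"{change}: yes {yes} ({percent(yes)}%), no {no} ({percent(no)}%)")

# Respondents could report several changes, so the row counts (sum 17)
# overlap; the 25% figure comes from the separate count of 10 respondents.
print(f"at least one change: {AT_LEAST_ONE_CHANGE} ({percent(AT_LEAST_ONE_CHANGE)}%)")
```

Rounding halves upward instead would give 3% for 1 of 40 and 93% for 37 of 40, which do not match the table.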