ESP ELSI
Holly K. Tabor, Ph.D., for the ELSI Working Group
Assistant Professor, Treuman Katz Center for Pediatric Bioethics, Seattle Children’s Hospital
Division of Bioethics, Department of Pediatrics, University of Washington
October 14, 2010

Outline
• Aggregate data sharing in dbSNP
• Preliminary analysis of results in genes of clinical relevance

Aggregate Data Sharing in dbSNP

Options for dbSNP
Option 1: No data in dbSNP
Option 2: Variant data but no frequency data, listed as derived from ESP (or NHLBI)
Option 3: Variant data with frequency data, not stratified by race/ethnicity or cohort
Option 4: Variant data with frequency data, stratified by race/ethnicity and/or cohort

Data Sharing
• dbGaP
  – Anteroom/within-ESP sharing
  – dbGaP sharing to NIH-approved investigators with institutional approval
  – Phenotypic and genomic data
• dbSNP
  – Publicly available
  – Aggregate data: what cohort (ESP), frequency, perhaps broken down by race/ethnicity
  – For ESP: not disease group
• Other data sharing (between groups)

Risks of Data Sharing
• Risk of identifiability
• Risk of harms from identifiability
• How do these risks manifest in dbSNP, relative to dbGaP and other forms of data sharing?
• What obligations, if any, do researchers have regarding dbSNP aggregate data:
  – to IRBs?
  – to participants/advisory boards?

Possible Harm Scenarios
Scenario 1:
• In the future, law enforcement conducts exome sequencing or runs genotyping arrays for forensic purposes
• Using data downloaded from dbSNP, they determine that a person of interest in a crime, or a relative of that person, is part of ESP
• They subpoena the study to obtain the person’s identity
• It is unclear whether a certificate of confidentiality would protect against this scenario
Scenario 2:
• People with rare genotypes are identified by their insurance companies as being in ESP. The data are mined for other genetic risk factors that can be used to deny coverage (not explicitly).

ESP-ELSI Discussion
• The group discussed at length the difference between the kind of data available in dbGaP and the kind available in dbSNP.
• The group discussed how most data breaches are internal, not external, and that there may be greater risks with dbGaP data.
• A minority of the ELSI group is concerned generally about public data sharing, and about extensions of data sharing beyond what NIH policy requires (e.g., dbGaP), because of the general risks of data availability.

ESP-ELSI Discussion
• The majority of the ELSI group thinks that the risks are extremely small, and that the potential scientific benefits far outweigh the possible risks, both of identifiability and of harms from identifiability.
• One option to reduce the risk of identifiability would be to exclude variants that occur below a certain frequency in the cohort. But it is not clear what that cutoff should be.
• Another option would be to exclude frequency data altogether, but this needs to be weighed against the scientific value of those data.
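The frequency-cutoff option can be sketched in a few lines. This is only an illustration under stated assumptions: the `AggregateVariant` record, the variant identifiers, the counts, and the 0.001 cutoff are all hypothetical, and the working group did not settle on any cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateVariant:
    variant_id: str      # hypothetical identifier, not a real dbSNP ID
    allele_count: int    # copies of the variant allele observed in the cohort
    allele_number: int   # total alleles genotyped at this site

    @property
    def frequency(self) -> float:
        return self.allele_count / self.allele_number


def releasable(variants, min_frequency):
    """Keep only variants whose cohort frequency is at or above the cutoff."""
    return [v for v in variants if v.frequency >= min_frequency]


# Made-up counts for illustration only.
variants = [
    AggregateVariant("var_common", 120, 2000),
    AggregateVariant("var_rare", 3, 2000),
    AggregateVariant("var_singleton", 1, 2000),
]

# The cutoff value is an assumption; the source leaves it undecided.
kept = releasable(variants, min_frequency=0.001)
print([v.variant_id for v in kept])
```

With these made-up numbers the singleton is withheld and the other two variants are released, which shows the trade-off: the rarest variants carry the most identifiability risk, but they are also the ones a frequency cutoff removes from public view.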

ESP-ELSI Discussion
• The ELSI group thinks that IF cohorts are planning to notify participants or advisory boards about dbSNP data sharing, they should follow the same protocol as for dbGaP data sharing.

Preliminary analysis of results in genes of clinical relevance

To be clear…
• This analysis and presentation is NOT:
  – Meant to be any suggestion about what any cohort should do about returning results. Those decisions need to be made at the study-cohort level.
  – An analysis of which results should or should not be returned in general
• It IS:
  – A very preliminary analysis of the kinds of possibly clinically important results that exome sequencing studies like ESP may find, and their frequency

Why?
• Important for future exome studies to anticipate the scope of what might be identified
• Since the data will be in dbGaP, it will be possible for anyone who gets access to conduct these analyses
• Note: all data are stripped of links to their cohort

Protocol
• Select genes to analyze
• Identify variants (synonymous, missense, nonsense, splice) and frequency
• Determine if variant found in dbSNP or 1000 Genomes
• Compare to known databases on a variant-by-variant basis (GeneTests, HGMD, gene/disease-based databases)
• Determine which variants are pathogenic, novel, unknown significance
• Calculate summary statistics
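The classification and summary steps above can be sketched roughly as follows. This is a simplified illustration, not the working group's actual pipeline: the variant identifiers and both lookup sets are hypothetical stand-ins for curated disease databases and for dbSNP/1000 Genomes, and the rule that a variant gets exactly one label is an assumption.

```python
from collections import Counter

# Hypothetical variant identifiers, made up for illustration only.
KNOWN_PATHOGENIC = {"GENE1:c.100A>T"}             # stand-in for a curated disease database
CATALOGUED = {"GENE1:c.100A>T", "GENE2:c.5G>C"}   # stand-in for dbSNP / 1000 Genomes


def classify(variant_id):
    """Label a variant as pathogenic, novel, or of unknown significance."""
    if variant_id in KNOWN_PATHOGENIC:
        return "pathogenic"
    if variant_id not in CATALOGUED:
        return "novel"
    # Previously catalogued, but no database reports it as disease causing.
    return "unknown significance"


observed = ["GENE1:c.100A>T", "GENE2:c.5G>C", "GENE3:c.77del"]
labels = {v: classify(v) for v in observed}

# Summary statistics: how many variants fall under each label.
summary = Counter(labels.values())
print(summary)
```

In practice a novel variant is also of unknown significance, so a real analysis might track "seen before" and "reported pathogenic" as two separate flags rather than one label.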

Genes
• Newborn screening carrier status: 40 genes
• Other recessive carrier status: 2 genes
• Pediatric onset, clinically actionable: 14 genes
• Pediatric onset, uncertain clinical action: 2 genes
• Adult onset, clinically actionable: 7 genes
• Adult onset, uncertain clinical action: 3 genes
• Common genetic diseases: 5 genes
• Common multifactorial conditions: 3 genes
• Pharmacogenetics: 4 genes
• Total: 80 genes
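As a quick consistency check, the category counts on this slide can be tallied against the stated total. The dictionary below simply restates the slide's numbers; the variable names are illustrative.

```python
# Gene counts per category, as listed on the slide.
gene_counts = {
    "Newborn screening carrier status": 40,
    "Other recessive carrier status": 2,
    "Pediatric onset, clinically actionable": 14,
    "Pediatric onset, uncertain clinical action": 2,
    "Adult onset, clinically actionable": 7,
    "Adult onset, uncertain clinical action": 3,
    "Common genetic diseases": 5,
    "Common multifactorial conditions": 3,
    "Pharmacogenetics": 4,
}

total = sum(gene_counts.values())
print(total)  # 80, matching the slide's total
```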

HFE-HHC
• Characterized by inappropriately high absorption of iron by the gastrointestinal mucosa, resulting in excessive storage of iron
• Early symptoms: abdominal pain, weakness, lethargy, and weight loss
• Without therapy, symptoms may develop in males at ages 40-60 and in females after menopause
• Hepatic fibrosis or cirrhosis may occur in untreated individuals over 40
• Diagnosis: screening tests of transferrin-iron saturation and serum ferritin concentration, followed by molecular genetic testing for the C282Y and H63D mutations and/or histologic assessment of hepatic iron stores on liver biopsy

HFE-HHC Results:
• H63D: 15 homozygotes, 177 heterozygotes, 1 missing
• C282Y: 3 homozygotes, 97 heterozygotes, 0 missing
• Compound heterozygotes: 16
• S65C: 0 homozygotes, 18 heterozygotes, 0 missing
  – (Mild HFE, not usually tested)
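Genotype counts like these convert to allele counts with simple arithmetic: each homozygote carries two copies and each heterozygote one. The sketch below assumes the heterozygote counts already include the 16 compound heterozygotes, which the slide does not state. Turning these counts into allele frequencies would also need the number of genotyped participants, which is not given here, so the sketch stops at counts.

```python
# Genotype counts as reported on the slide: (homozygotes, heterozygotes).
genotype_counts = {
    "H63D": (15, 177),
    "C282Y": (3, 97),
    "S65C": (0, 18),
}


def allele_count(homozygotes, heterozygotes):
    """Copies of the variant allele: two per homozygote, one per heterozygote."""
    return 2 * homozygotes + heterozygotes


allele_counts = {
    name: allele_count(hom, het) for name, (hom, het) in genotype_counts.items()
}
print(allele_counts)
```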

Factor V Leiden
• Most common inherited form of thrombophilia
• Heterozygotes: 3-8% of Caucasian US/Europeans
• Homozygotes: 1 in 5,000 people
• Risk of a DVT: one copy confers a 4-8x increased risk; two copies may confer up to an 80x increased risk
Results:
• R506Q (rs6025): 28 heterozygotes (2.7%), no homozygotes
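The slide gives the heterozygote count and percentage but not the number of participants genotyped. The denominator can be back-calculated, as sketched below; the resulting figures are an inference from the rounded 2.7%, not numbers reported in the presentation.

```python
import math

heterozygotes = 28          # from the slide
reported_fraction = 0.027   # 2.7%, from the slide

# Point estimate of the number of genotyped participants.
implied_n = round(heterozygotes / reported_fraction)

# Because 2.7% is rounded, any true fraction in [2.65%, 2.75%) is consistent,
# which gives a range of possible sample sizes.
n_min = math.ceil(heterozygotes / 0.0275)
n_max = math.floor(heterozygotes / 0.0265)

print(implied_n, n_min, n_max)
print(f"{heterozygotes / implied_n:.1%}")
```

Under this back-calculation the sample is on the order of a thousand participants, and the observed 2.7% carrier fraction sits just below the 3-8% range quoted for Caucasian US/European populations.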

Lynch Syndrome/HNPCC
MSH2, MLH2, MSH6, PMS1
HGMD Database:
• MSH6: 6 variants, 4 disease causing, 2 probable disease causing (43 total)
• PMS1: 1 disease causing variant (30 total)
• Clearly these need to be validated.

Nonsense Variants
• 18 separate nonsense variants found (0.74% of all variants)
• 66% not in dbSNP, 33% in CFTR (not surprising)
• All of those not in CFTR are singletons EXCEPT for BRCA2
Genes
• NBS: TPO (congenital hypothyroidism); TSHR (congenital hypothyroidism); DUOX2 (congenital hypothyroidism); HEXA (Tay-Sachs); HLCS (multiple carboxylase deficiency); BTD (biotinidase deficiency)
• Possible clinical action: SCN1A (seizure disorder)
• Pharmacogenomic: TPMT (6-mercaptopurine)
• Cohort trait: CFTR (cystic fibrosis)
• Cancer: ATM (ataxia telangiectasia, cancer); BRCA2 (breast and ovarian cancer)

Preliminary Conclusions
• Preliminary analysis: much work still to be done
• For all analyses, it is challenging to know which databases to use to evaluate the pathologic significance of variants
• Many exome sequencing participants will have results in known disease-causing genes with possible clinical actions
• Many exome sequencing participants will have results in genes without clinical significance but with personal meaning (e.g., NBS)
• Many variants will be identified that are novel or have unknown significance

Next Steps
• Complete analysis of all variants in 100 genes in this sample
• Examination of compound heterozygotes where appropriate
• Calculation of the number of variants found that meet different threshold criteria of results

Acknowledgements
University of Washington
• Michael Bamshad
• Abby Bigham
• Debbie Nickerson
• Jay Shendure
• Maggie McMillin
• Keolu Fox
ESP-ELSI Working Group
• Mike Bamshad
• Malia Fullerton
• Leslie Raffel
• Tim Graubert
• Donna Chen
• Jerry Rotter
• Dina Paltoo
Seattle Children’s Hospital/Treuman Katz Center for Pediatric Bioethics
• Julia Crouch
• Jacquie Stock
• Ben Wilfond
Chris Carlson
Wylie Burke