Introduction to your genome CSE 291 Personal Genomics
Introduction to your genome CSE 291: Personal Genomics for Bioinformaticians 01/10/17
The personal genomics revolution • 23andMe: >1 million customers ($200) • Genographic Project: >800,000 customers ($150) • Family Tree DNA: >800,000 in database ($99) Genome sequencing is quickly becoming a commodity!
The power of commercial genome databases • Survey: are you a morning or a night person? • Survey: what color are the stripes? Can perform a GWAS on hundreds of thousands of people in a matter of days! Hu et al. Nature Communications 2016
I have a long-standing interest in genetics… Age: 20 Age: 1 Extra credit: which one is me?
Outline • Why analyze your genome? • Course overview • History of analyzing genomes • Basic biology intro • Basic human genetics intro • Discuss problem set 1
Why analyze your genome?
Mutations have implications in human health Example: Cystic Fibrosis - Caused by mutations in the gene CFTR; the most common mutation is ΔF508. - Results in salty skin, poor growth, accumulation of thick, sticky mucus, frequent chest infections. - Life expectancy: 37 years - ~1 in 25 Europeans is a carrier Prenatal carrier testing of parents can now identify couples at risk and inform reproductive options. [Figure: two carrier parents (0/1 × 0/1); offspring genotypes 0/0: 25%, 0/1: 50%, 1/1: 25%]
Our genomes contain a record of human history • Recent history: familial relationships (parents, siblings, cousins, etc.) • Ancient history: populations (human migration, ancient humans) https://aliciarmartin.com/research/migration_map_revised-2/ Novembre, et al. 2008
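The percentages for a carrier couple, and the chance that a random couple has an affected child, can be worked out by enumerating which allele each parent passes on. This is a rough sketch: it assumes random mating, treats every parent as either a carrier or a non-carrier, and uses the ~1 in 25 carrier frequency from the slide. The function names are illustrative only.

```python
from fractions import Fraction

# Carrier frequency quoted on the slide (~1 in 25 Europeans).
CARRIER_FREQ = Fraction(1, 25)


def offspring_genotype_probs():
    """Genotype probabilities for a child of two carriers (0/1 x 0/1).

    Keys are the number of mutant alleles the child inherits (0, 1, or 2).
    """
    probs = {}
    for allele_1 in (0, 1):      # allele passed on by parent 1
        for allele_2 in (0, 1):  # allele passed on by parent 2
            n_mutant = allele_1 + allele_2
            probs[n_mutant] = probs.get(n_mutant, Fraction(0)) + Fraction(1, 4)
    return probs


def prob_affected_child(carrier_freq=CARRIER_FREQ):
    """Chance that a random couple has an affected child.

    Assumption: random mating, and only carrier x carrier couples
    contribute (affected parents are ignored for simplicity).
    """
    both_carriers = carrier_freq * carrier_freq
    return both_carriers * offspring_genotype_probs()[2]


print(offspring_genotype_probs())  # 0 copies: 1/4, 1 copy: 1/2, 2 copies: 1/4
print(prob_affected_child())       # 1/2500
```

Under these assumptions roughly 1 in 2,500 births would be affected, which is why identifying carrier couples before conception is informative.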
Your genome is uniquely identifying
Your genome can help science! “Interpreting one genome requires tens of thousands of genomes” - Daniel MacArthur. E.g. the latest schizophrenia genome-wide association study used >100,000 control genomes.
Course overview
Course objectives • Gain basic bioinformatics skills needed to analyze a personal genome using the UNIX command line • Gain the ability to critically read and interpret basic science and translational literature relevant to personal genomics • Demonstrate knowledge and understanding of the social impacts of the personal genomics revolution • Gain skills and experience necessary to carry out original research related to personal genomics
Grading • Participation 10% • Attendance 10% • Problem set 1 5% • Problem set 2 10% • Problem set 3 10% • Problem set 4 10% • Problem set 5 10% • Project proposal 5% • Final Project 30%
Analyzing your own genome • You are welcome and encouraged to explore your own genome (e.g. from 23andMe) through the problem sets. • If you want to do that, order ASAP; it takes several weeks to get the data back. • Your grade does not depend in any way on whether you analyze your own genome. • You do not need to tell me if you analyze your own genome. • We cannot offer to pay for the test or provide any counseling.
A whirlwind history of human genetics
Mendel establishes heredity as a principle (~1865) • Parents: green peas (GG) × yellow peas (YY) • F1 generation: 100% yellow (YG, YG) • F2 generation: 75% yellow (YY, YG, GY), 25% green (GG) Conclusions: 1. Inheritance is determined by “units” (now called genes) 2. An individual inherits one such unit from each parent for each trait 3. A trait may “skip” a generation
Mid-1900s: DNA is the genetic material • Griffith experiment (1928): showed bacteria can transfer genetic information • Avery-MacLeod-McCarty experiment (1944): showed that DNA was the key component of Griffith’s experiment • Hershey-Chase experiment (1952): used radioactive labeling to show DNA, not protein, transfers genetic information • DNA structure identified (1953) by Watson, Crick (using data from Rosalind Franklin)
First disease gene mapped (1983) Huntington’s Disease • Progressive neurodegenerative disease • Loss of motor control, jerky movements • Age of onset: typically 30-45 years old • Caused by expansion of a CAG repeat, encoding polyglutamine, in the gene HTT [Figure: George Huntington’s paper (1872)] Gusella et al. 1983
The human genome is sequenced (2001) • $3 billion public project beginning in 1990 • In 1998, Craig Venter started a competing private project at Celera • “Draft” announced in 2000 and published in 2001. We still do not have a complete genome sequence! • >70% from a single male donor from Buffalo, NY (RP11). At least 4 individuals included.
Toward the $1000 Genome
The personal genomics revolution • >1 million customers • $200 to genotype 1.5 million genomic positions • Example reports: hair color, eye color, ancestry
Biology Intro
Bird’s eye view of the human genome [Figure labels: cell, nucleus, autosomes, sex chromosomes] http://missinglink.ucsf.edu/lm/genes_and_genomes/content.html
DNA (deoxyribonucleic acid) structure • Bases: Cytosine (C), Guanine (G), Adenine (A), Thymine (T) • Watson-Crick base pairing: C pairs with G, A pairs with T • Other components: phosphate, deoxyribose (sugar) • Forward strand: 5’-TGAC-3’ • Reverse strand: 5’-GTCA-3’ (reverse complement)
The central dogma DNA (gene) → transcription → mRNA → translation → protein
The genetic code http://www.chemguide.co.uk/organicprops/aminoacids/dna4.html
The structure of a gene • DNA: TF, promoter, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3 • Transcription → RNA: ACACUAUCGAUGCAGAUAAAGUUGAGUAGCUGUCUCGGUCGAGCGUAUAAAUCACUAC • Splicing → mRNA (5’ UTR, coding sequence, 3’ UTR): ACACUAUCGAUGCAGAUAAAUAGCUGUCUCGCGUAUAAAUCACU • Translation → protein: M Q I N S C L A Y V * • Start codon: AUG (methionine) • Stop codons: UAA, UAG, UGA
Organization of the human genome ~30,000 protein coding genes in the human genome http://book.bionumbers.org/how-many-genes-are-in-a-genome/
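The forward/reverse strand example can be checked with a few lines of Python. This is a minimal sketch that handles only the four unambiguous bases; the function name is our own.

```python
# Watson-Crick pairing: A<->T and C<->G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    # Walk the strand backwards and swap each base for its partner.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("TGAC"))  # GTCA, matching the slide
```

Applying the function twice returns the original sequence, which is a handy sanity check when working with strand-aware genomic coordinates.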
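Translation can be sketched as reading the mRNA three bases at a time from the first AUG until a stop codon. The code below builds the standard genetic code table from the conventional U, C, A, G ordering. The example sequence is a short made-up mRNA chosen for illustration, not the one drawn on the slide.

```python
from itertools import product

# Standard genetic code, listed in UCAG order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA from its first AUG up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: a 9-base 5' UTR, then AUG CAG AUA AAU AGC UGA.
print(translate("ACACUAUCGAUGCAGAUAAAUAGCUGA"))  # MQINS
```

Everything before the first AUG is skipped, which mirrors how the 5’ UTR is transcribed but not translated.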
Cell division – mitosis (somatic) DNA replication Mitosis Two diploid cells
Cell division – meiosis (germline) DNA replication Homologous recombination Meiosis II Four haploid cells
Recombination https://www.reddit.com/r/askscience/comments/3hq4zl/does_crossover_occur_in_all_4_nonsister/
Human genetics intro
Mutations – the bread and butter of genetics! • SNP: ACGACTCGAGCG → ACGACACGAGCG (μSNP: 1.20 × 10^-8 /loc/gen) • Short indel (1-20 bp): ACGACTCGAGCG → ACGAC-CGAGCG (μINDEL: 0.68 × 10^-9 /loc/gen) • Short tandem repeat: CAGCAG---CAGCAGCA vs. CAGCAGCAGCA (μSTR: 10^-2 to 10^-5 /loc/gen) • Alu retrotransposition • Structural variation/CNV (>20 bp) [Figure: number of de novo mutations per generation by class: SNP, STR, indel, Alu, SV]
How do mutations affect proteins? But also… • Regulatory regions • Large structural variations • Alternative splicing • Many others… http://www.nbs.csudh.edu/chemistry/faculty/nsturm/CHEMXL153/DNAMutationRepair.htm
Intro to Mendelian genetics Back to Mendel’s peas… YG × YG → F2 generation: 75% yellow, 25% green • Punnett square (Parent 1 gamete × Parent 2 gamete): Y × Y = YY, Y × G = YG, G × Y = GY, G × G = GG
Modes of inheritance - dominant Example – Marfan Syndrome • Tall and slender build • Long arms, legs, and fingers • Heart murmurs, other cardiovascular defects • Nearsightedness Caused by loss of function mutations in FBN1 • >=1 copies of dominant allele: affected • 0 copies of dominant allele: unaffected • Unless de novo, at least one parent is affected [Pedigree: aa × Aa parents; children Aa, Aa, aa, aa] http://www.mayoclinic.org/diseases-conditions/marfan-syndrome/symptoms-causes/dxc-20195415
Modes of inheritance - recessive Example – Cystic Fibrosis • Caused by mutations in the gene CFTR; the most common mutation is ΔF508 (in frame deletion). • Results in salty skin, poor growth, accumulation of thick, sticky mucus, frequent chest infections. • Life expectancy: 37 years • ~1 in 25 Europeans is a carrier Caused by loss of function mutations in CFTR • 2 copies of recessive allele: affected • <=1 copies of recessive allele: unaffected • Often, both parents unaffected [Pedigree: Aa × Aa parents; children AA, Aa, aA, aa] https://hutchbio.wordpress.com/2012/11/07/cystic-fibrosis/
Modes of inheritance – X linked recessive Example – Hemophilia A • Blood doesn’t clot properly • Heavy bleeding even from small cuts • Bruise easily • Some female carriers show symptoms Caused by loss of function mutations in clotting Factor VIII • Need at least one unaffected copy of X to be unaffected • X’Y, X’X’ affected (X’X’ lethal for some disorders) • Typically affects only males • Heterozygous females are called “carriers” [Pedigree: XX’ × XY parents; children X’Y, XY, XX’, XX] http://reference.medscape.com/features/slideshow/hemophilia-a
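The per-locus rates translate into a rough count of new mutations per child by multiplying by the number of sites. The sketch below uses the SNP and indel rates from the slide. The haploid genome size of about 3.1 billion bp is an assumption added here, and the calculation treats every site as equally mutable.

```python
# Assumption (not from the slide): haploid human genome of ~3.1 billion bp.
HAPLOID_GENOME_BP = 3.1e9

# Per-locus, per-generation mutation rates from the slide.
MU_SNP = 1.20e-8
MU_INDEL = 0.68e-9


def expected_de_novo(mu, genome_bp=HAPLOID_GENOME_BP, ploidy=2):
    """Expected new mutations per generation: rate x sites x genome copies."""
    return mu * genome_bp * ploidy


print(round(expected_de_novo(MU_SNP), 1))    # about 74 new SNPs per child
print(round(expected_de_novo(MU_INDEL), 1))  # about 4 new short indels per child
```

The SNP estimate is on the order of 75 per generation, in line with the scale shown on the slide's de novo mutation plot.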
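The Punnett square is just an enumeration of every pairing of one allele from each parent. A minimal sketch, assuming yellow (Y) is dominant over green (G) as in Mendel's peas; the function names are our own.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2, dominant="Y"):
    """Genotype fractions among offspring of two parents, e.g. 'YG' x 'YG'."""
    counts = Counter()
    for allele_1, allele_2 in product(parent1, parent2):
        # Write the dominant allele first so YG and GY count as one genotype.
        genotype = "".join(
            sorted(allele_1 + allele_2, key=lambda allele: allele != dominant)
        )
        counts[genotype] += 1
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def phenotype_fractions(genotype_fractions, dominant="Y"):
    """Collapse genotypes to phenotypes: one dominant allele is enough."""
    yellow = sum(p for g, p in genotype_fractions.items() if dominant in g)
    return {"yellow": yellow, "green": 1 - yellow}


f1 = cross("GG", "YY")   # all offspring are YG
f2 = cross("YG", "YG")   # YY 25%, YG 50%, GG 25%
print(f1, f2, phenotype_fractions(f2))
```

The F2 phenotype split of 75% yellow to 25% green falls out of the 1:2:1 genotype ratio.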
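The X-linked pattern can be enumerated the same way as an autosomal cross, except that the father passes on either his X or his Y. This sketch writes the mutant copy as X' (straight apostrophe) and the unaffected copy as X. The classification rule follows the slide: at least one unaffected X is needed to be unaffected.

```python
from itertools import product


def classify(child):
    """Return (sex, status) for a child given as a pair of sex chromosomes."""
    sex = "male" if "Y" in child else "female"
    n_normal = child.count("X")    # unaffected copies of X
    n_mutant = child.count("X'")   # mutant copies of X
    if n_normal == 0:
        status = "affected"
    elif n_mutant > 0:
        status = "carrier"
    else:
        status = "unaffected"
    return sex, status


def x_linked_cross(mother, father):
    """Outcome probabilities for the children of one couple."""
    outcomes = {}
    for from_mother, from_father in product(mother, father):
        key = classify((from_mother, from_father))
        outcomes[key] = outcomes.get(key, 0) + 0.25
    return outcomes


# Carrier mother (X X') and unaffected father (X Y).
print(x_linked_cross(("X", "X'"), ("X", "Y")))
```

For a carrier mother and an unaffected father, half of the sons are affected and half of the daughters are carriers, which is why the condition typically shows up only in males.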
Example recessive trait – red hair https://blog.23andme.com/health-traits/no-im-not-irish/
Example recessive trait – blue eyes • IrisPlex: predicts eye color from 6 SNPs • All blue eyes have a single common ancestor with a regulatory change in HERC2 Walsh, et al. 2010; Sturm, et al. 2008
Beyond Mendelian – complex traits Example: height Fisher hypothesized that Mendelian inheritance could explain continuous traits if many genes each contribute additively to a phenotype. Sir Ronald Fisher
Example complex trait: schizophrenia Heritability: 80%, i.e. about 80% of the variation in SCZ liability is attributed to genetic differences (this is not the same as 80% of twin pairs being concordant for SCZ status) Schizophrenia Working Group of the Psychiatric Genomics Consortium
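Fisher's argument can be illustrated numerically: if each of n genes carries alleles that add one unit to the trait, the number of "plus" alleles among the 2n copies follows a binomial distribution, which looks increasingly continuous as n grows. The sketch below assumes equal effect sizes, independent loci, and an allele frequency of 0.5. These are simplifications chosen for illustration, not details of Fisher's model as stated on the slide.

```python
from math import comb


def additive_trait_distribution(n_loci, allele_freq=0.5):
    """Probability of carrying k 'plus' alleles, for k = 0 .. 2 * n_loci.

    Each diploid individual has 2 * n_loci allele copies; each copy is a
    'plus' allele with probability allele_freq and adds one unit to the trait.
    """
    n_alleles = 2 * n_loci
    return [
        comb(n_alleles, k) * allele_freq**k * (1 - allele_freq) ** (n_alleles - k)
        for k in range(n_alleles + 1)
    ]


# One gene gives three discrete classes, as in Mendel's peas.
print(additive_trait_distribution(1))         # [0.25, 0.5, 0.25]
# Fifty genes give 101 closely spaced classes, close to a bell curve.
print(len(additive_trait_distribution(50)))   # 101
```

With a single locus the trait falls into three Mendelian classes; with dozens of loci the classes blur into the smooth distribution seen for traits such as height.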
Problem set 1
SNP array data • This is the type of data you’ll get from 23andMe and other companies • As opposed to whole genome sequencing, which sequences the entire genome, genotype arrays genotype a pre-determined set of known polymorphic positions • E.g. 23andMe genotypes ~1.5 million variants • Probes for allele “A” and “B” • By comparing intensities, can infer genotype (e.g. AA, AB, BB) [Figure: probe intensity clusters for AA, AB, and BB]
Getting started https://gymreklab.github.io/teaching/personal_genomics/ps1_resources.html Before you go: • Sign up for an XSEDE account • Get started on PS1
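To make the intensity comparison concrete, here is a toy genotype caller. It is only a sketch: the thresholds, the no-call cutoff, and the function name are hypothetical values picked for illustration, and real array pipelines typically cluster intensities across many samples rather than applying fixed cutoffs.

```python
def call_genotype(intensity_a, intensity_b, low=0.3, high=0.7, min_total=0.2):
    """Call AA, AB, or BB from two probe intensities (toy thresholds).

    low, high, and min_total are hypothetical cutoffs, not vendor values.
    Returns 'NC' (no call) when the total signal is too weak to trust.
    """
    total = intensity_a + intensity_b
    if total < min_total:
        return "NC"
    fraction_b = intensity_b / total  # share of the signal from the B probe
    if fraction_b < low:
        return "AA"
    if fraction_b > high:
        return "BB"
    return "AB"


print(call_genotype(1.0, 0.05))   # AA: signal almost entirely from probe A
print(call_genotype(0.5, 0.5))    # AB: both probes light up equally
print(call_genotype(0.04, 0.9))   # BB: signal almost entirely from probe B
print(call_genotype(0.05, 0.05))  # NC: too little signal overall
```

Positions that come back as no-calls are one reason a raw genotype file may have missing entries at some variants.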