Introduction to your genome CSE 291 Personal Genomics

Introduction to your genome CSE 291: Personal Genomics for Bioinformaticians 01/10/17


The personal genomics revolution • 23andMe: >1 million customers ($200) • Genographic Project: >800,000 customers ($150) • Family Tree DNA: >800,000 in database ($99) Genome sequencing is quickly becoming a commodity!

The power of commercial genome databases Survey: are you a morning or a night person? Survey: what color are the stripes? Can perform a GWAS on hundreds of thousands of people in a matter of days! Hu et al. Nature Communications 2016


I have a long standing interest in genetics… Age: 20 Age: 1 Extra credit: which one is me?


Outline • Why analyze your genome? • Course overview • History of analyzing genomes • Basic biology intro • Basic human genetics intro • Discuss problem set 1

Why analyze your genome?


Mutations have implications in human health Example: Cystic Fibrosis - Caused by mutations in the gene CFTR; the most common mutation is ΔF508. - Results in salty skin, poor growth, accumulation of thick, sticky mucus, frequent chest infections. - Life expectancy: 37 years - ~1 in 25 Europeans is a carrier Pre-natal carrier testing of parents can now identify couples at risk and inform reproductive options. Two carrier parents (01 x 01): offspring 00: 25%, 01: 50%, 11: 25%

Our genomes contain a record of human history Recent history: familial relationships (parents, siblings, cousins, etc.) Ancient history: populations (human migration, ancient humans) https://aliciarmartin.com/research/migration_map_revised-2/ Novembre, et al. 2008

Your genome is uniquely identifying


Your genome can help science! Interpreting one genome requires tens of thousands of genomes - Daniel MacArthur. E.g. the latest schizophrenia genome-wide association study used >100,000 control genomes.

Course overview


Course objectives • Gain basic bioinformatics skills needed to analyze a personal genome using the UNIX command line • Gain the ability to critically read and interpret basic science and translational literature relevant to personal genomics • Demonstrate knowledge and understanding of the social impacts of the personal genomics revolution • Gain skills and experience necessary to carry out original research related to personal genomics


Grading • Participation 10% • Attendance 10% • Problem set 1 5% • Problem set 2 10% • Problem set 3 10% • Problem set 4 10% • Problem set 5 10% • Project proposal 5% • Final Project 30%

Analyzing your own genome • You are welcome and encouraged to explore your own genome (e.g. from 23andMe) through the problem sets. • If you want to do that, order ASAP; it takes several weeks to get the data back. • Your grade does not depend in any way on whether you analyze your own genome. • You do not need to tell me if you analyze your own genome. • We cannot offer to pay for the test, or provide any counseling.

A whirlwind history of human genetics

Mendel establishes heredity as a principle (~1865) Green peas (GG) x Yellow peas (YY) F1 Generation: 100% Yellow (YG, YG) F2 Generation: 75% Yellow, 25% Green (YY, YG, GY, GG) Conclusions: 1. Inheritance is determined by “units” (now called genes) 2. An individual inherits one such unit from each parent for each trait 3. A trait may “skip” a generation

mid-1900s: DNA is the genetic material • Griffith experiment (1928): showed bacteria can transfer genetic information • Avery-MacLeod-McCarty experiment (1944): showed that DNA was the key component of Griffith’s experiment • Hershey-Chase experiment (1952): used radioactive labeling to show DNA, not protein, transfers genetic information • DNA structure identified (1953) by Watson, Crick (using data from Rosalind Franklin)

First disease gene mapped (1983) George Huntington’s paper (1872) Huntington’s Disease • Progressive neurodegenerative disease • Loss of motor control, jerky movements • Age of onset: typically 30-45 years old • Caused by expansion of a CAG repeat, encoding polyglutamine, in the gene HTT Gusella et al. 1983

The human genome is sequenced (2001) • $3 Billion public project beginning in 1990 • In 1998, Craig Venter started a competing private project at Celera • “Draft” published in 2000. We still do not have a complete genome sequence! • >70% from a single male donor from Buffalo, NY (RP11). At least 4 individuals included.

Toward the $1000 Genome

The personal genomics revolution >1 million customers $200 to genotype 1.5 million genomic positions • Hair color • Eye color • Ancestry

Biology Intro

Bird’s eye view of the human genome Cell Nucleus Autosomes Sex chromosomes http://missinglink.ucsf.edu/lm/genes_and_genomes/content.html

DNA (deoxyribonucleic acid) structure Bases: Cytosine (C), Guanine (G), Adenine (A), Thymine (T) Other components: Phosphate, Deoxyribose (sugar) Watson-Crick base pairing: C-G, A-T Forward strand: 5’-TGAC-3’ Reverse strand: 5’-GTCA-3’ (reverse complement)
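The reverse strand on this slide can be derived mechanically from the forward strand. A minimal Python sketch follows; the function name `reverse_complement` is chosen here for illustration and is not part of the course materials.

```python
# Map each base to its Watson-Crick partner (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    # Complement every base, then reverse so the result is again 5' -> 3'.
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("TGAC"))  # GTCA, matching the slide
```

Applying the function twice returns the original sequence, which is a quick sanity check when working with strand orientation.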

The central dogma DNA (gene) → Transcription → mRNA → Translation → Protein

The genetic code http://www.chemguide.co.uk/organicprops/aminoacids/dna4.html

The structure of a gene DNA: Promoter (bound by TF), Exon 1, Intron 1, Exon 2, Intron 2, Exon 3 Transcription → RNA: ACACUAUCGAUGCAGAUAAAGUUGAGUAGCUGUCUCGGUCGAGCGUAUAAAUCACUAC Splicing → mRNA (with 5’ UTR and 3’ UTR): ACACUAUCGAUGCAGAUAAAUAGCUGUCUCGCGUAUAAAUCACU Translation → Protein: M Q I N S C L A Y V * Start codon (AUG = Methionine) Stop codons (UAA, UAG, UGA)
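To make the translation step concrete, here is a minimal Python sketch that scans an mRNA for the first AUG and translates codon by codon until a stop codon. It uses the standard genetic code; the function name `translate` and the short example sequence are hypothetical, and real transcripts need more careful handling (for example, choosing the correct start site).

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical toy mRNA: 5' UTR "ACG", then AUG CAG AUA UAA, then "GG".
    print(translate("ACGAUGCAGAUAUAAGG"))  # MQI
```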

Organization of the human genome ~30,000 protein coding genes in the human genome http://book.bionumbers.org/how-many-genes-are-in-a-genome/

Cell division – mitosis (somatic) DNA replication Mitosis Two diploid cells


Cell division – meiosis (germline) DNA replication Homologous recombination Meiosis II Four haploid cells


Recombination https://www.reddit.com/r/askscience/comments/3hq4zl/does_crossover_occur_in_all_4_nonsister/

Human genetics intro

Mutations – the bread and butter of genetics! Reference: ACGACTCGAGCG SNP: ACGACACGAGCG (μSNP: 1.20 × 10^-8 /loc/gen) Short indel (1-20 bp): ACGAC-CGAGCG (μINDEL: 0.68 × 10^-9 /loc/gen) Short tandem repeat: CAGCAG---CAGCAGCA vs. CAGCAGCAGCA (μSTR: 10^-2 to 10^-5 /loc/gen) Alu retrotransposition Structural variation/CNV (>20 bp) [Figure: number of de novo mutations per generation by class: SNP, STR, indel, Alu, SV]
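The per-locus rates on this slide translate into an expected number of new mutations per child. A rough Python sketch of the arithmetic follows; the haploid genome size of 3.1 billion bp is an assumption introduced here (it is not stated on the slide), and the function name is illustrative.

```python
MU_SNP = 1.20e-8    # per locus per generation (from the slide)
MU_INDEL = 0.68e-9  # per locus per generation (from the slide)

# ASSUMPTION: approximate haploid human genome size in base pairs.
HAPLOID_GENOME_BP = 3.1e9


def expected_de_novo(mu: float, n_sites: float = HAPLOID_GENOME_BP,
                     copies: int = 2) -> float:
    """Expected new mutations per generation: rate x sites x genome copies."""
    return mu * n_sites * copies


if __name__ == "__main__":
    print(f"SNPs per generation:   {expected_de_novo(MU_SNP):.1f}")
    print(f"Indels per generation: {expected_de_novo(MU_INDEL):.1f}")
```

Under these assumptions the SNP estimate comes out in the low-to-mid 70s per generation, and the indel estimate is a handful.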

How do mutations affect proteins? But also… • Regulatory regions • Large structural variations • Alternative splicing • Many others… http://www.nbs.csudh.edu/chemistry/faculty/nsturm/CHEMXL153/DNAMutationRepair.htm

Intro to Mendelian genetics Back to Mendel’s peas… Parent 1 (YG) x Parent 2 (YG) F2 Generation: 75% Yellow, 25% Green Punnett square (Parent 1 allele + Parent 2 allele): Y + Y = YY, Y + G = YG, G + Y = GY, G + G = GG
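The Punnett square can be enumerated in a few lines. The sketch below assumes, as on the slide, that yellow (Y) is dominant over green (G) and that each parent passes on either allele with equal probability; the function name `cross` is illustrative.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str, dominant: str = "Y"):
    """Enumerate equally likely offspring genotypes and phenotype fractions."""
    # One allele from each parent: every cell of the Punnett square.
    offspring = ["".join(pair) for pair in product(parent1, parent2)]
    counts = Counter(
        "yellow" if dominant in genotype else "green" for genotype in offspring
    )
    fractions = {pheno: n / len(offspring) for pheno, n in counts.items()}
    return offspring, fractions


if __name__ == "__main__":
    genotypes, phenotypes = cross("YG", "YG")
    print(genotypes)   # ['YY', 'YG', 'GY', 'GG']
    print(phenotypes)  # {'yellow': 0.75, 'green': 0.25}
```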

Modes of inheritance - dominant Pedigree: parents aa x Aa; children Aa, Aa, aa, aa Example – Marfan Syndrome • Tall and slender build • Long arms, legs, and fingers • Heart murmurs, other cardiovascular defects • Nearsightedness Caused by loss of function mutations in FBN1 >=1 copies of dominant allele: affected 0 copies of dominant allele: unaffected Unless de novo, at least one parent is affected http://www.mayoclinic.org/diseases-conditions/marfan-syndrome/symptoms-causes/dxc-20195415

Modes of inheritance - recessive Pedigree: parents Aa x Aa; children AA, Aa, aA, aa Example – Cystic Fibrosis • Caused by mutations in the gene CFTR; the most common mutation is ΔF508 (in frame deletion). • Results in salty skin, poor growth, accumulation of thick, sticky mucus, frequent chest infections. • Life expectancy: 37 years • ~1 in 25 Europeans is a carrier Caused by loss of function mutations in CFTR 2 copies of recessive allele: affected <=1 copies of recessive allele: unaffected Often, both parents unaffected https://hutchbio.wordpress.com/2012/11/07/cystic-fibrosis/
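As a worked example of the recessive arithmetic, the sketch below combines the ~1 in 25 carrier frequency with the 25% chance that two carriers have an affected child. It assumes random mating and ignores affected (rather than carrier) parents, both simplifications introduced here; the function name is illustrative.

```python
from fractions import Fraction


def affected_child_risk(carrier_freq: Fraction) -> Fraction:
    """Chance a random couple has an affected child for a recessive disease."""
    # ASSUMPTION: partners' carrier statuses are independent (random mating).
    p_both_carriers = carrier_freq * carrier_freq
    # Two carriers (Aa x Aa): 1 of the 4 equally likely outcomes is aa.
    p_affected_given_both = Fraction(1, 4)
    return p_both_carriers * p_affected_given_both


if __name__ == "__main__":
    print(affected_child_risk(Fraction(1, 25)))  # 1/2500
```

Carrier testing replaces the population frequency with known statuses: if both parents are confirmed carriers, the risk for each pregnancy is 1/4.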

Modes of inheritance – X linked recessive Pedigree: parents XX’ x XY; children X’Y, XY, XX’, XX Example – Hemophilia A • Blood doesn’t clot properly • Heavy bleeding even from small cuts • Bruise easily • Some female carriers show symptoms Caused by loss of function mutations in clotting Factor VIII Need at least one unaffected copy of X to be unaffected X’Y, X’X’ affected (X’X’ lethal for some disorders) Typically affects only males Heterozygous females are called “carriers” http://reference.medscape.com/features/slideshow/hemophilia-a
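The X-linked pattern differs from the autosomal case because sons receive their only X from their mother. A small Python sketch enumerating a carrier mother (XX’) and unaffected father (XY) follows; the names `classify` and `x_linked_cross` are illustrative, and equal transmission of each parental chromosome is assumed.

```python
from collections import Counter
from itertools import product


def classify(child: tuple) -> str:
    """Label a child given (maternal chromosome, paternal chromosome)."""
    n_mutant = child.count("X'")
    if "Y" in child:  # male: a single X, so one mutant copy means affected
        return "affected male" if n_mutant == 1 else "unaffected male"
    if n_mutant == 2:
        return "affected female"
    return "carrier female" if n_mutant == 1 else "unaffected female"


def x_linked_cross(mother: tuple, father: tuple) -> dict:
    """Fractions of each outcome, assuming equal transmission of each chromosome."""
    children = list(product(mother, father))
    counts = Counter(classify(child) for child in children)
    return {label: n / len(children) for label, n in counts.items()}


if __name__ == "__main__":
    # Carrier mother (X, X') and unaffected father (X, Y).
    print(x_linked_cross(("X", "X'"), ("X", "Y")))
```

For this cross, half of the sons are expected to be affected and half of the daughters are expected to be carriers.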

Example recessive trait – red hair https://blog.23andme.com/health-traits/no-im-not-irish/

Example recessive trait – blue eyes IrisPlex: predicts eye color from 6 SNPs All blue eyes have a single common ancestor with a regulatory change in HERC2 Walsh, et al. 2010 Sturm, et al. 2008

Beyond Mendelian – complex traits Example: height Fisher hypothesized that Mendelian inheritance could explain continuous traits if many genes each contribute additively to a phenotype. (Pictured: Sir Ronald Fisher)
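Fisher's idea can be illustrated with a toy simulation: if a trait is the sum of many independent loci, each adding a small amount per allele copy, the trait values pile up into a roughly bell-shaped, continuous-looking distribution. The sketch below is illustrative only; the locus count, allele frequency, and equal effect sizes are all assumptions made here.

```python
import random
import statistics


def simulate_trait(n_individuals: int = 2000, n_loci: int = 100,
                   allele_freq: float = 0.5, seed: int = 0) -> list:
    """Trait = number of 'plus' alleles summed over all loci (2 copies each)."""
    rng = random.Random(seed)
    return [
        # Each of the 2 * n_loci allele copies is 'plus' with prob. allele_freq.
        sum(rng.random() < allele_freq for _ in range(2 * n_loci))
        for _ in range(n_individuals)
    ]


if __name__ == "__main__":
    traits = simulate_trait()
    print(f"mean = {statistics.mean(traits):.1f}")
    print(f"sd   = {statistics.stdev(traits):.1f}")
```

With one locus the trait takes only three values (0, 1, or 2); with 100 loci it spans many values clustered around the mean, which is the qualitative behavior seen for traits such as height.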

Example complex trait: schizophrenia Heritability: 80% i.e. 80% of twin pairs concordant for SCZ status Schizophrenia Working Group of the Psychiatric Genomics Consortium

Problem set 1

SNP array data • This is the type of data you’ll get from 23andMe and other companies • As opposed to whole genome sequencing, which sequences the entire genome, genotype arrays genotype a pre-determined set of known polymorphic positions • E.g. 23andMe genotypes ~1.5 million variants • Probes for allele “A” and “B” • By comparing intensities, can infer genotype (e.g. AA, AB, BB)
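The intensity comparison can be sketched as a simple threshold rule on the fraction of signal coming from the “B” probe. This is a deliberately simplified illustration: the function name, the 0.3/0.7 cutoffs, and the "NC" (no call) label are assumptions made here, and production genotype callers typically cluster intensities across many samples rather than using fixed thresholds.

```python
def call_genotype(intensity_a: float, intensity_b: float,
                  low: float = 0.3, high: float = 0.7) -> str:
    """Call AA, AB, or BB from the two probe intensities at one SNP."""
    total = intensity_a + intensity_b
    if total <= 0:
        return "NC"  # no signal: make no call
    frac_b = intensity_b / total  # 0 = all A signal, 1 = all B signal
    if frac_b < low:
        return "AA"
    if frac_b > high:
        return "BB"
    return "AB"


if __name__ == "__main__":
    # Hypothetical intensity pairs for three samples at one SNP.
    for a, b in [(1000, 50), (500, 520), (40, 900)]:
        print(a, b, call_genotype(a, b))
```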

Getting started https://gymreklab.github.io/teaching/personal_genomics/ps1_resources.html Before you go: • Sign up for an XSEDE account • Get started on PS1