Trait Mapping Recombination Mapping SNP mapping BIO 520

Trait Mapping
• Recombination Mapping
• SNP mapping
BIO 520 Bioinformatics
Jim Lund

Why do we care about variations?
• They underlie phenotypic differences
• They cause inherited diseases
• They allow tracking of human history (ancient and modern)

Traits
• Mendelian
  – single locus, few alleles
  – high penetrance, high expressivity
  – e.g. color, enzyme, molecular, genetic diseases (CF, hemophilia…)
• Quantitative
  – multiple alleles, multilocus
  – variable penetrance, expressivity
  – epistasis, environmental effects
  – e.g. blood pressure, weight, IQ…

Traits: How do we find their basis?
• Association of variance in trait with variance in gene
• Genetic linkage

Basic Concepts
[Diagram: Parent 1 × Parent 2, each carrying the two-SNP haplotypes A B and a b]
• High LD -> no recombination (r² = 1): haplotypes stay A B and a b, so SNP 1 “tags” SNP 2
• Low LD -> recombination: many possibilities (A B, a b, A b, a B, etc.)

Mapping Issues
• Need many arbitrary, polymorphic markers for a dense map
  – Molecular markers: RFLP, STS, SNP
• Need many progeny
  – 100 progeny for a 1 cM map
  – 1000 progeny for a 0.1 cM map (100 kb in mouse)
• Map distance varies (the ratio of kb/cM is not constant)
  – centromere suppression
  – inversion suppression
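A rough worked example of the progeny numbers above, as a minimal sketch. It assumes that 1 cM corresponds to a 1% chance of recombination per meiosis (reasonable only for short distances) and that each progeny contributes one informative meiosis. The function names are illustrative and not from the lecture.

```python
def expected_recombinants(n_progeny, distance_cm):
    """Expected number of recombinant progeny between two markers."""
    recombination_fraction = distance_cm / 100.0  # assumes 1 cM = 1% recombination
    return n_progeny * recombination_fraction


def prob_at_least_one_recombinant(n_progeny, distance_cm):
    """Chance that a cross of this size yields any recombinant in the interval."""
    recombination_fraction = distance_cm / 100.0
    return 1.0 - (1.0 - recombination_fraction) ** n_progeny


# The two map densities quoted on the slide.
for n_progeny, distance_cm in [(100, 1.0), (1000, 0.1)]:
    print(
        n_progeny,
        distance_cm,
        expected_recombinants(n_progeny, distance_cm),
        round(prob_at_least_one_recombinant(n_progeny, distance_cm), 3),
    )
```

Both settings give about one expected recombinant in the interval, which is why a tenfold finer map needs roughly ten times as many progeny.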

Genetic crosses
• Model organisms, e.g. fungi: no problem
• Humans
  – rare woman who will bear >5, >10 children
  – controlled breeding problematic

Alternate Mapping
• Pedigree analyses
  – likelihood estimation
  – the original method, now less common
• Population-based mapping
  – association studies
  – linkage disequilibrium

Pedigree Analysis
• Likelihood method (LOD scores)
• LOD 3-4: odds of 1000:1 to 10,000:1 in favor of linkage
  – genome-wide p-value of p < 0.05
• Hard to extend to <1 cM
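A minimal sketch of how a two-point LOD score is computed, using the standard formula LOD(θ) = log10[θ^k (1−θ)^(n−k) / 0.5^n] for k recombinants among n meioses. It assumes phase-known, fully informative meioses, which real pedigrees rarely provide; the function names and the grid search over θ are illustrative choices, not the lecture's method.

```python
import math


def lod_score(n_meioses, n_recombinants, theta):
    """Two-point LOD score at recombination fraction theta (0 <= theta <= 0.5)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    k = n_recombinants
    n = n_meioses
    likelihood_linked = theta ** k * (1.0 - theta) ** (n - k)
    likelihood_unlinked = 0.5 ** n  # free recombination, theta = 0.5
    if likelihood_linked == 0.0:
        return float("-inf")  # e.g. theta = 0 with an observed recombinant
    return math.log10(likelihood_linked / likelihood_unlinked)


def max_lod(n_meioses, n_recombinants):
    """Grid search over theta = 0.00, 0.01, ..., 0.50 for the highest LOD."""
    thetas = [i / 100 for i in range(51)]
    best_theta = max(thetas, key=lambda t: lod_score(n_meioses, n_recombinants, t))
    return best_theta, lod_score(n_meioses, n_recombinants, best_theta)


print(round(lod_score(10, 0, 0.0), 2))  # ten meioses, no recombinants
print(max_lod(20, 2))                   # twenty meioses, two recombinants
```

Ten fully informative meioses with no recombinants are just enough to reach LOD 3, which illustrates why small human families rarely give significant linkage on their own.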

Cloning Human Genes • • Positional/Candidate Only Functional


Complex diseases: Association mapping
• Disease gene: D, d
• Marker: M, m
M is associated with D if the probability of an individual having the disease given that they have allele M is much greater than the probability of having the disease if the individual has allele m. Written as: P(D|M) > P(D|m)
Linkage between the gene and marker increases the likelihood of association.
[Diagram: disease locus D and markers M1 to M6 along a chromosome]
Association can be caused by
  – Causation
  – Population subdivision
  – Statistical artifact
  – Linkage disequilibrium
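The comparison P(D|M) > P(D|m) can be sketched from a 2×2 table of counts. The counts below are hypothetical, and the Pearson chi-square test (1 degree of freedom) is one common way to ask whether the difference is larger than chance; the slide does not name a specific test. The sketch assumes every row and column total is nonzero.

```python
import math


def association_test(affected_M, unaffected_M, affected_m, unaffected_m):
    """Compare disease frequency in carriers of marker allele M versus allele m."""
    a, b, c, d = affected_M, unaffected_M, affected_m, unaffected_m
    n = a + b + c + d
    p_d_given_M = a / (a + b)
    p_d_given_m = c / (c + d)
    # Pearson chi-square statistic for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p_d_given_M, p_d_given_m, chi2, p_value


# Hypothetical sample: 100 carriers of M and 100 carriers of m.
p_M, p_m, chi2, p_value = association_test(60, 40, 30, 70)
print(p_M, p_m, round(chi2, 2), p_value)
```

A small p-value only says that M and disease status are associated in the sample. It does not distinguish among the four causes listed on the slide, population subdivision included.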

Association Mapping
• Pedigree sampled
• Many meioses (>10^4)
• Limited by number of markers
• Resolution: 10^-5 Morgans (kilobases)
[Diagram labels: M, r, D; 2N generations]

Gene Mapping & the single mutation case
[Diagram: D and M at time t; D and M now]

Complicating factors
[Diagram legend: major disease-causing mutation; minor disease-causing mutation; + has the disease]
• Non-genetic cause
• Incomplete penetrance
• Oversampled

Alzheimer's & Apolipoprotein E

Definition of QTL?
A quantitative trait locus (QTL) is the location of individual or multiple loci that affect a trait that is measured on a quantitative (linear) scale. Examples of quantitative traits are blood pressure and grain yield (measured on a balance). These traits are typically affected by more than one gene, and also by the environment. Thus, mapping QTL is not as simple as mapping a single gene that affects a qualitative trait (such as an inborn error of metabolism).
http://gnome.agrenv.mcgill.ca/tinker/pgiv/whatis.htm

QTLs: interesting traits
• Heritability often ~0.5
• Traits like:
  – Heart disease
  – Depression
  – Type II diabetes
  – High blood pressure
  – Arthritis
  – Most diseases!

QTLs: simple problems
• 30,000 markers
  – P-value = 0.01
  – 299 false hits, 1 real one
  – Correct for multiple testing
• 2 QTLs near one another
  – “ghost” QTL between them
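The arithmetic behind the 30,000-marker example, as a minimal sketch. The Bonferroni correction shown here is one simple and conservative choice that the slide does not name, and the family-wise error rate of 0.05 is an assumption.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of null markers that pass a per-test threshold alpha."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test p-value cutoff that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


n_markers = 30000
print(expected_false_positives(n_markers, 0.01))  # about 300 hits expected by chance alone
print(bonferroni_threshold(n_markers))            # per-marker cutoff after correction
```

With roughly 300 chance hits at p = 0.01, a single real QTL is lost among them unless the per-marker threshold is tightened.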

Factors that lead to success in mapping QTLs
• Simple, easily quantified trait
• Genes of major effect
  – distinct chromosomal loci
• Well-defined map
• Large numbers of progeny
  – inbred
  – outbred

Significance Thresholds by Permutation
Churchill and Doerge, 1994
H0: there is no QTL in the tested interval
H1: there is a QTL in the tested interval
1. Permute the data (create the null hypothesis)
2. Perform interval mapping
3. Repeat (1) and (2) many times
4. Choose threshold
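The four steps can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: a single-marker difference in mean phenotype stands in for interval mapping, the threshold is taken from the largest statistic across all markers in each permutation, and the toy cross produced by simulate_cross is hypothetical data.

```python
import random


def marker_stat(genotypes, phenotypes):
    """Absolute difference in mean phenotype between the two genotype classes."""
    class1 = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    class0 = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    if not class0 or not class1:
        return 0.0
    return abs(sum(class1) / len(class1) - sum(class0) / len(class0))


def max_stat(markers, phenotypes):
    """Largest single-marker statistic across the whole map."""
    return max(marker_stat(m, phenotypes) for m in markers)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):                           # step 3: repeat many times
        rng.shuffle(shuffled)                         # step 1: permute phenotypes
        null_max.append(max_stat(markers, shuffled))  # step 2: rescan the map
    null_max.sort()
    cutoff = min(n_perm - 1, int((1.0 - alpha) * n_perm))
    return null_max[cutoff]                           # step 4: empirical quantile


def simulate_cross(n_individuals=100, n_markers=5, effect=10.0, seed=1):
    """Toy backcross in which marker 0 carries a QTL (hypothetical data)."""
    rng = random.Random(seed)
    markers = [[i % 2 for i in range(n_individuals)]]
    markers += [[rng.randint(0, 1) for _ in range(n_individuals)]
                for _ in range(n_markers - 1)]
    phenotypes = [effect * g + rng.uniform(-1.0, 1.0) for g in markers[0]]
    return markers, phenotypes


markers, phenotypes = simulate_cross()
threshold = permutation_threshold(markers, phenotypes, n_perm=200)
print("threshold:", round(threshold, 2))
print("observed at marker 0:", round(marker_stat(markers[0], phenotypes), 2))
```

Shuffling phenotypes while leaving genotypes untouched keeps the marker correlations intact, so the threshold reflects the actual map rather than an assumed number of independent tests.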

Human SNPs
• About 10 million SNPs exist in human populations where the rarer SNP allele has a frequency of at least 1%.
• A set of associated SNP alleles in a region of a chromosome is called a "haplotype".
• SNPs are arranged in groups
  – SNPs within groups show little recombination
  – Nonrandom association of SNPs results in only a few common haplotypes
  – Patterns capture most of the variation in a region
• The HapMap will describe the common patterns of genetic variation in humans.
• The HapMap Project will identify the associations between SNPs and identify the SNPs that tag them (tag SNPs).

SNP identification methods
• Pairwise sequence comparison
• Deep resequencing
• High-throughput mismatch detection methods
  – Denaturing high-performance liquid chromatography (DHPLC)
  – Single-strand Conformational Polymorphism (SSCP)

HapMap
• Blocks of adjacent SNPs that show little recombination are called haplotype blocks.
• Mean haplotype block length is tens of kb.
• HapMap project started examining 270 individuals from 4 ethnic groups.
• Now expanding to a more comprehensive sample.
Characterization of haplotype blocks means that fewer SNPs will need to be typed. 500,000 SNPs will identify 90% of haplotype blocks.

HapMap Glossary
• LD (linkage disequilibrium): For a pair of SNP alleles, it’s a measure of deviation from random association (i.e., a measure of lack of recombination). Measured by D’, r², LOD.
• Phased haplotypes: Estimated distribution of SNP alleles. Alleles transmitted from Mom are in the same chromosome haplotype, while Dad’s form the paternal haplotype.
• Tag SNPs: Minimum SNP set to identify a haplotype. r² = 1 indicates two SNPs are redundant, so each one perfectly “tags” the other.
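A minimal sketch of how D, D’ and r² are computed for two biallelic SNPs from phased haplotype counts. The counts are hypothetical, the function name is illustrative, and D’ is reported as an absolute value, which is a common convention but an assumption here.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-SNP haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes observed")
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = n_AB / n - p_A * p_B  # deviation from random association
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


print(ld_measures(50, 0, 0, 50))    # only A B and a b seen: each SNP tags the other
print(ld_measures(50, 0, 25, 25))   # three haplotypes: D' is still 1 but r^2 is lower
print(ld_measures(25, 25, 25, 25))  # all four haplotypes equally common: no LD
```

The second case shows why r², not D’, is the measure used for tagging: D’ stays at 1 whenever one of the four haplotypes is missing, even though one SNP no longer predicts the other perfectly.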

HapMap Project

                       Phase 1                           Phase 2                  Phase 3
Samples & POP panels   269 samples (4 panels)            270 samples (4 panels)   1,115 samples (11 panels)
Genotyping centers     HapMap International Consortium   Perlegen                 Broad & Sanger
Unique QC+             1.1 M                             3.8 M (phase             1.6 M (Affy 6.0

Phase 3 Samples * Population is made of family trios


SNP databases
• dbSNP (NCBI)
  – 12 million human SNPs
  – 5 million validated SNPs
  – http://www.ncbi.nlm.nih.gov/SNP/get_html.cgi?whichHtml=overview
• SNP frequency information
• Mapped to the current genome build
• HapMap (haplotypes)

How to use markers to find disease?
genome-wide, dense SNP marker map
• problem: genotyping cost precludes using millions of markers simultaneously for an association study
• question: how to select from all available markers a subset that captures most mapping information (marker selection, marker prioritization)
• depends on the patterns of allelic association (haplotypes) in the human genome

The promise for medical genetics
[Figure: chromosome blocks with example haplotypes CACTACCGA, CACGACTAT, TTGGCGTAT]
• within blocks a small number of SNPs are sufficient to distinguish the few common haplotypes; significant marker reduction is possible
• if the block structure is a general feature of human variation structure, whole-genome association studies will be possible at a reduced genotyping cost
• this motivated the HapMap project
Gibbs et al. Nature 2003
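Marker reduction within blocks can be illustrated with a simple greedy tag-SNP selection. This is a sketch under assumptions and not the HapMap project's own procedure: haplotypes are given as strings of 0/1 alleles, a SNP is considered captured when its r² with a chosen tag is at least 0.8 (a commonly used cutoff), and the six example haplotypes are hypothetical.

```python
def r_squared(col_a, col_b):
    """r^2 between two SNPs given as lists of 0/1 alleles across haplotypes."""
    n = len(col_a)
    p_a = sum(col_a) / n
    p_b = sum(col_b) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    p_ab = sum(1 for a, b in zip(col_a, col_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Pick SNP indices so that every SNP has r^2 >= threshold with some tag."""
    n_snps = len(haplotypes[0])
    columns = [[int(h[i]) for h in haplotypes] for i in range(n_snps)]
    # For each SNP, the set of SNPs it would capture (always including itself).
    covers = [
        {j for j in range(n_snps)
         if i == j or r_squared(columns[i], columns[j]) >= threshold}
        for i in range(n_snps)
    ]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(range(n_snps), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical region: SNPs 0-2 form one block, SNPs 3-4 another, SNP 5 stands alone.
example_haplotypes = ["111110", "111001", "000110", "000000", "111111", "000001"]
print(greedy_tag_snps(example_haplotypes))
```

In this toy region three tags stand in for six SNPs, which is the kind of saving the block structure is expected to offer genome-wide.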

The promise for medical genetics
• Discover genes contributing to complex diseases
• Use these markers to test for inherited disease risk
• Find SNPs associated with drug side effects
• Make drugs safer.
• Rescue drugs abandoned due to significant side effects.

Pathway of Drug Development
• Lead or Target (Clinical Candidate)
• Animal Model Testing
  – Toxicity, Efficacy
• Phase I Pre-Clinical (toxicity)
• Phase II (efficacy)
• Phase III (efficacy)
• NDA (new drug application)
[Side annotations, alignment with stages lost in extraction: $100 M, 2000; $0.5 M, 100; $0.5 M; $50 M, 20; 3; 2; 1]

Why pharmacogenomics?
• Where do you find the next profitable drug?
  – The 19/20 drugs that failed AFTER phase 1, but are still efficacious!
• How do you decrease the cost of clinical trials?
  – Don’t enroll people of the “wrong” genotype!
• Only give drugs to patients likely to benefit and at a low genetic risk of side effects!