From Phenotype to Genotype and Back Again: Animal Genomics Enabling Prediction
Alan L. Archibald
The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh
Genotype – phenotype
• Aim
  – To predict outcomes
    • Efficacy of drug
    • Susceptibility to cancer
    • Performance of daughters of elite dairy bull
    • Susceptibility to nematode infections
• Discovery
  – From phenotype to genotype (gene)
• Prediction
  – From phenotype to genotype (breeding value)
  – From genotype to phenotype
  – From sequence to consequence
• 1920s and 30s: Population genetics (Fisher, Lush and others)
• 1953: Watson and Crick
• 1970s +: Advances in quantitative analysis
• Animal model, infinitesimal model
• 1977: DNA sequenced: ΦX174, 5,386 nt
• 1990: Human Genome Project launched
• 1990s +: Quantitative trait locus (QTL) mapping; ‘Halothane’ gene test; Marker Assisted Selection (MAS)
• 1991: PiGMaP project starts
• 2001: Draft human genome sequence
• 2001: Genomic selection proposed
PiGMaP – 25 years old
• €1.2 million
• Linkage / recombination map
• Physical / cytogenetic map
• Comparative map
• cDNA (ESTs)
• Microsatellites
Prediction success: selective animal breeding
• Animal model
• Phenotypic selection
• Prediction of breeding value (genotype) from phenotype
• Successful EU companies, e.g. AquaGen, Aviagen, Cherry Valley, Cogent, CRV, Genus, JSR Genetics, Hendrix-Genetics, Landcatch Natural Selection, Topigs Norsvin

Pigs, 1962 vs 2002:
                1962      2002      Change
Pigs/yr         14        21        50% more pigs
Feed / pig      410 kg    273 kg    33% less feed
Lean / pig      34 kg     45 kg     33% more lean
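As a check on the slide's rounding, the percentage changes can be recomputed from the 1962 and 2002 figures in the table. This is a minimal Python sketch; the feed-per-kg-lean ratio it also prints is a derived quantity that does not appear on the slide.

```python
# Figures read from the 1962 vs 2002 pig comparison on this slide.
pigs_per_year = (14, 21)
feed_kg_per_pig = (410, 273)
lean_kg_per_pig = (34, 45)


def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


for label, (y1962, y2002) in [
    ("pigs/yr", pigs_per_year),
    ("feed/pig", feed_kg_per_pig),
    ("lean/pig", lean_kg_per_pig),
]:
    print(f"{label}: {pct_change(y1962, y2002):+.1f}%")

# Derived efficiency measure (not on the slide): kg of feed per kg of lean meat.
fcr_lean = [f / l for f, l in zip(feed_kg_per_pig, lean_kg_per_pig)]
print(f"kg feed per kg lean: {fcr_lean[0]:.1f} -> {fcr_lean[1]:.1f}")
```

The lean-meat change works out at about +32%, which the slide rounds to 33%; feed per kg of lean roughly halves, from about 12 kg to about 6 kg.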
Modern intensive agriculture is efficient
• “Why Industrial Farms Are Good for the Environment”, Jayson Lusk, New York Times, 23 Sept 2016
Selection works
• Age-matched
• Seven rounds of selection per annum
• Black box, but…
Successes – from association to causation
• DGAT1 – dairy cattle, milk yield
• Callipyge – sheep, muscling
• MSTN – sheep, muscling
• IGF2 – pigs, muscling
• Noteworthy – regulatory sequences, epigenetics
• One gene at a time: slow, inefficient
Knowledge of causation enabled more sophisticated selective breeding
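Association studies of the kind that pointed to genes such as these typically start from a single-marker test: regress the phenotype on the number of copies of one allele, so that the slope estimates the allele substitution effect. The sketch below uses made-up genotypes and phenotypes (hypothetical data, not results for DGAT1 or any other gene), and `allele_substitution_effect` is an illustrative name; real analyses also fit fixed effects, relationships among animals and many markers.

```python
from statistics import mean


def allele_substitution_effect(dosage, phenotype):
    """Ordinary least-squares slope of phenotype on allele count (0, 1 or 2)."""
    mx, my = mean(dosage), mean(phenotype)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, phenotype))
    sxx = sum((x - mx) ** 2 for x in dosage)
    return sxy / sxx


# Hypothetical toy data: each extra copy of the allele adds roughly 2 units.
dosage = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
phenotype = [10.1, 9.8, 12.2, 11.9, 12.0, 14.1, 13.9, 10.0, 12.1, 14.0]
alpha = allele_substitution_effect(dosage, phenotype)
print(f"estimated allele substitution effect: {alpha:.2f}")
```

An association like this says nothing by itself about causation; moving from the associated marker to the causal variant is the slow, one-gene-at-a-time step the slide describes.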
• 2001: Genomic selection proposed
• 2002: Mouse draft genome sequence
• 2003: Human genome sequence “finished” ($3 billion); ENCODE (1%) launched
• 2004: Chicken genome sequenced
• 2005: Dog genome sequenced
• 2007: Cat genome sequenced; ENCODE genome-wide
• 2008: Human 1000 Genomes Project launched; Bovine 50K SNP chip
• 2009: Cattle genome sequenced; Horse genome sequenced; Mouse genome “finished”; Pig 60K SNP chip; Sheep 60K SNP chip
• 2010: Turkey genome sequenced; 750K bovine SNP chips
From Marker-Assisted to Genomic Selection
“…This type of approach combined with cheap and high density markers, could allow a move from selection based on a combination of ‘infinitesimal’ effects plus individual loci to effective total genomic selection….”
Genomics already delivering socioeconomic impact in agriculture
• Genomic selection (GS)
  – GS theory developed in 2001 before technology available
  – First 50K SNP chip (cattle) 2008; 650K in 2010
  – GS implemented in all major livestock sectors in developed world
  – GS is underpinning faster, more accurate and sustainable genetic improvement
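Genomic selection predicts breeding values from dense SNP genotypes by fitting all marker effects at once. One common way to do this is SNP-BLUP, which is equivalent to ridge regression of phenotype on centred allele counts with a shrinkage parameter λ (the ratio of residual variance to marker-effect variance). The sketch below is a minimal pure-Python illustration under those assumptions. `snp_blup`, `gebv` and `solve` are hypothetical names and the data are toy values; production evaluations use dedicated mixed-model software, tens of thousands of SNPs and far larger training populations.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def snp_blup(genotypes, phenotypes, lam):
    """Ridge-regression (SNP-BLUP style) marker effects.

    genotypes: rows = animals, columns = SNP allele counts (0/1/2).
    lam: shrinkage parameter, residual variance / marker-effect variance.
    Returns (phenotype mean, SNP column means, estimated SNP effects).
    """
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(g[j] for g in genotypes) / n for j in range(p)]
    z = [[g[j] - col_means[j] for j in range(p)] for g in genotypes]
    y_mean = sum(phenotypes) / n
    yc = [y - y_mean for y in phenotypes]
    # Mixed-model equations for the marker effects: (Z'Z + lam*I) b = Z'y
    lhs = [[sum(z[i][j] * z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(z[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return y_mean, col_means, solve(lhs, rhs)


def gebv(genotype, col_means, effects):
    """Genomic estimated breeding value: centred allele counts times SNP effects."""
    return sum((g - m) * b for g, m, b in zip(genotype, col_means, effects))


# Hypothetical training set: 4 genotyped and phenotyped animals, 2 SNPs.
genotypes = [[0, 1], [1, 0], [2, 1], [1, 2]]
phenotypes = [1.5, 3.0, 5.5, 4.0]
mu, means, effects = snp_blup(genotypes, phenotypes, lam=1.0)
# A young selection candidate with genotypes but no phenotype of its own.
print(gebv([2, 2], means, effects))
```

With λ = 0 these toy data return marker effects of 2 and 0.5; with λ = 1 they shrink to 4/3 and 1/3. That shrinkage is what keeps the fit stable when, as in real data, SNPs far outnumber phenotyped animals.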
Accuracy – what has been achieved?
USDA dairy cattle genomic evaluation (courtesy of George Wiggans, USDA, Beltsville)
Accuracy      Pedigree   Genomic
Milk yield    0.51       0.86
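To see what this gain in accuracy means for genetic progress, the breeder's equation R = i r σ_A / L can be applied, with selection intensity i, accuracy r, additive genetic standard deviation σ_A and generation interval L. In this minimal sketch only the two milk-yield accuracies come from the slide; i, σ_A and L are arbitrary placeholder values held equal in both scenarios, and `annual_response` is a hypothetical helper name.

```python
def annual_response(intensity, accuracy, sigma_a, interval_years):
    """Breeder's equation: expected genetic gain per year, R = i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / interval_years


# Placeholder inputs (hypothetical), identical for both scenarios.
i, sigma_a, L = 2.0, 1.0, 5.0
pedigree = annual_response(i, 0.51, sigma_a, L)  # pedigree-based accuracy
genomic = annual_response(i, 0.86, sigma_a, L)   # genomic accuracy
print(f"genomic / pedigree response: {genomic / pedigree:.2f}")
```

With everything else equal, expected response scales directly with accuracy, so the genomic evaluation gives about 1.69 times the pedigree-based gain. In practice genomic selection also allows a shorter generation interval, which this comparison deliberately leaves out.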
Evolution of genomic selection
• GS 0.0 – The original model
  – Linkage disequilibrium based
• GS 1.0 – What has happened in practice
  – Linkage based
• GS 2.0 – The future
  – LD and QTN based
  – Requires lots of data
Goddard & Hayes Nat. Rev. Genet. 2009
GS accuracy
• Accurate really only for close relatives
[Figure: accuracy (y-axis, 0 to 0.9) against mean of the top ten relationships (x-axis, 0 to 0.6); R² = 0.962. Clark et al. (2012)]
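The x-axis of the Clark et al. (2012) plot summarises how closely a selection candidate is related to the animals in the training population. The sketch below shows one way to compute such a summary: build a genomic relationship matrix with VanRaden's (2008) method 1, then average a candidate's k largest relationships with the training animals. Using VanRaden method 1 is an assumption on our part, since Clark et al. may have used a different relationship measure. The function names and the toy genotypes are hypothetical.

```python
def vanraden_grm(genotypes):
    """Genomic relationship matrix, VanRaden (2008) method 1: G = ZZ' / (2 * sum p(1-p)).

    genotypes: rows = animals, columns = SNP allele counts (0/1/2).
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(g[j] for g in genotypes) / (2 * n) for j in range(m)]  # allele frequencies
    z = [[g[j] - 2 * p[j] for j in range(m)] for g in genotypes]   # centred genotypes
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    return [[sum(a * b for a, b in zip(z[r], z[c])) / scale for c in range(n)]
            for r in range(n)]


def mean_top_k_relationships(grm, candidate, training, k=10):
    """Mean of the k largest genomic relationships between a candidate and the training set."""
    rels = sorted((grm[candidate][t] for t in training), reverse=True)
    top = rels[:k]
    return sum(top) / len(top)


# Toy data (hypothetical): animal 3 is genetically identical to animal 0.
genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1], [0, 1, 2]]
grm = vanraden_grm(genotypes)
print(mean_top_k_relationships(grm, candidate=3, training=[0, 1, 2], k=1))
```

Candidates whose top relationships with the training animals are high sit at the right of the plot, where accuracy was highest.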
From SNPs to sequence
• In the next five years sequence data will supplant SNP genotypes
• Two approaches
  – Sequencing individuals (e.g. 1000 Bull Genomes Project)
    • Expensive even at $1000 per genome
    • Alternatively genotyping-by-sequencing on new platforms (e.g. Illumina HiSeq X), then impute
  – Sequencing populations
    • Aiming for $10 per genome
Multiple (aligned) animal genomes
• Pigs
  – Groenen (Wageningen): ~300 individual pigs
  – Korean: ~60 individual pigs
  – China: ?? pigs
  – 96 pig exomes (Roslin)
• Sheep
  – 453 genomes in SheepGenomesDB, http://sheepgenomesdb.org/home
• Chickens
  – 10s of individuals (e.g. 10 individual J line brown egg layers)
• Cattle
  – ~3,000 genomes (Taylor estimate)
  – 1000 Bull Genomes Project
    • Collaborative, cloud data repository
    • 1500+ bulls, average coverage ~11x
    • Data analysis cycles for genomic prediction
• NextGen – >400 sheep, goat, cattle genomes
Sequencing populations
• Aim: sequence data for 100K to 1M individuals at $10 per individual
• Exploits:
  – pedigree structures in managed populations
  – imputation from low sequence coverage
• Assemble shared haplotypes from partial low-coverage sequence of 100s of related individuals
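A simple coverage calculation shows why pooling relatives helps. Assume read depth is Poisson distributed (the Lander–Waterman model) and that an animal sequenced at c× contributes about c/2 reads per site to each of its two haplotypes. A site on a haplotype shared by n relatives is then observed at least once with probability 1 − e^(−nc/2). This is a minimal sketch under those idealised assumptions, and `p_site_seen` is a hypothetical name.

```python
import math


def p_site_seen(coverage, carriers=1):
    """Probability that a site on one shared haplotype is covered by at least one read.

    Pools reads from `carriers` relatives who all carry the haplotype. Assumes
    Poisson read depth and that each diploid animal contributes half of its
    coverage to each of its two haplotypes.
    """
    return 1.0 - math.exp(-carriers * coverage / 2.0)


for n in (1, 2, 5, 10, 20):
    print(f"{n:>2} carriers at 1x: {p_site_seen(1.0, n):.3f}")
```

At 1× a single animal covers fewer than 40% of the sites on a given haplotype, but ten carriers between them cover more than 99%. That is the statistical basis for treating the population, rather than the individual, as the sequencing target.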
LCSeq (low-coverage sequencing) for whole genome sequencing
• Sequencing few individuals not that useful
• Sequence everybody at low-x & impute
• Make the population the target, not the individual
  – ~250K pigs, Genus
  – ~250K chickens, Aviagen
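Low coverage on its own is not enough. At about 1× most sites are seen in zero or one read, so a genotype call depends heavily on prior information. This minimal sketch computes per-site genotype probabilities from read counts, assuming a binomial read model with a fixed sequencing error rate and a Hardy–Weinberg prior from the population allele frequency. All parameter values are hypothetical. Population-scale imputation replaces this simple prior with information from relatives and shared haplotypes, which the sketch does not model.

```python
from math import comb


def genotype_posteriors(alt_reads, depth, alt_freq, error=0.01):
    """Posterior probabilities of genotypes 0, 1, 2 (copies of the alt allele) at one SNP.

    Likelihood: binomial read counts with a fixed per-read error rate.
    Prior: Hardy-Weinberg proportions from the population allele frequency.
    """
    # Probability that a single read shows the alt allele under each genotype.
    p_alt = (error, 0.5, 1.0 - error)
    likelihood = [comb(depth, alt_reads) * p ** alt_reads * (1 - p) ** (depth - alt_reads)
                  for p in p_alt]
    q = alt_freq
    prior = ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)
    joint = [l * pr for l, pr in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]


# A single read carrying the alt allele (typical at ~1x) vs. 20 reads split 10/10.
low = genotype_posteriors(alt_reads=1, depth=1, alt_freq=0.3)
high = genotype_posteriors(alt_reads=10, depth=20, alt_freq=0.3)
print(low)
print(high)
```

With one alt read and an allele frequency of 0.3, the heterozygote gets a probability of about 0.69 and the alt homozygote about 0.29, so the call remains uncertain. With 20 reads split evenly, the heterozygote probability is essentially 1.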
Genomic selection
• GS theory developed in 2001 before technology available
• First 50K SNP chip (cattle) 2008; 650K in 2010
• GS implemented in all major livestock sectors in developed world
• GS is underpinning faster, more accurate and sustainable genetic improvement
• From SNPs to sequence (via imputation)
• Adding knowledge of SNP effects
  – Coding/non-coding; known/predicted
• 2012: ENCODE; Pig genome sequenced; Chicken 600K SNP chip
• 2013: Goat genome sequenced; Duck genome sequenced
• 2013 onwards: Genotyping-by-sequencing
• 2014: Sheep genome sequenced; Salmon SNP chip
• 2015: Functional Annotation of Animal Genomes (FAANG) launched; Pig 650K SNP chip
• 2015 onwards: LCseq for genomic selection; SNPs impute to sequence
• 2016: Improved reference genomes: goat, pig, sheep, cattle, chicken; FAANG-Europe COST Action
• Fish: Tilapia, Cod, Salmon, ……
From sequence to consequence
[Figure: from genome sequence to phenome: growth, feed efficiency, body composition, disease resistance. Adapted from Ritchie et al. 2015 Nature Reviews Genetics 16: 85]
Reference genome improvement
• PacBio long-read technology, de novo assembly
  – Goat, pig, sheep, cattle
• Sscrofa10.2: 73,500 contigs; contig N50 ~80 kbp
• Sscrofa11: <200 contigs; contig N50 ~35 Mbp
• Disruptive technology, multiple genome assemblies
  – Annotation – “Best in genome”
  – Graph visualization, alignment tools under development
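Contig N50, the statistic used to compare Sscrofa10.2 and Sscrofa11, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows. `n50` is an illustrative name, and the contig lengths in the example are made up.

```python
def n50(lengths):
    """Contig N50: the length L such that contigs of length >= L together
    contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable")  # the loop always returns for non-empty input


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # total 54; 10 + 9 + 8 = 27 reaches half -> 8
```

By this measure, the contig N50 of Sscrofa11 (~35 Mbp) is over 400 times that of Sscrofa10.2 (~80 kbp).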
Discovering functional sequences
• Evolutionary
  – Sequence comparison, conservation
  – 1000G, G10K, …
  – Genome sequence sufficient
  – Conserved, but what is it?
  – Highly variable ≠ nonfunctional
• Functional, biochemical
  – Assay-by-sequence
  – ENCODE, iHEC, Epigenome Roadmap, FAANG
  – Expensive
  – Exploring 4-dimensional space (location + time)
  – Noise or biologically meaningful?
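The evolutionary route relies on conservation across aligned genomes. The sketch below is deliberately simple. It scores each column of a multiple alignment by the fraction of sequences sharing the most common base, then flags windows whose mean score reaches a threshold. The alignment, window size, threshold and function names are all hypothetical. Dedicated methods such as phyloP or GERP model the phylogeny and the neutral substitution rate, which simple identity does not. As the slide notes, conservation alone also does not say what a sequence does.

```python
def column_identity(alignment):
    """Fraction of sequences sharing the most common base at each alignment column.
    Gaps ('-') count as mismatches."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for i in range(length):
        column = [s[i].upper() for s in alignment]
        bases = [b for b in column if b != "-"]
        top = max((bases.count(b) for b in set(bases)), default=0)
        scores.append(top / len(column))
    return scores


def conserved_windows(scores, window=5, threshold=0.9):
    """Start positions of windows whose mean column identity reaches the threshold."""
    return [i for i in range(len(scores) - window + 1)
            if sum(scores[i:i + window]) / window >= threshold]


# Hypothetical 4-species alignment: conserved at the left end, divergent at the right.
aln = ["ACGTACGTTA",
       "ACGTACGAAC",
       "ACGTACGTGG",
       "ACGTACCTTT"]
scores = column_identity(aln)
print(scores)
print(conserved_windows(scores, window=5, threshold=0.92))
```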
ENCODE (2012)
• 80.4% of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type
• Promoter functionality can explain most of the variation in RNA expression
• SNPs associated with disease by GWAS are enriched within noncoding functional elements
>$250 million
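The GWAS finding is an enrichment claim: more disease-associated SNPs fall in noncoding functional elements than the share of the genome those elements occupy would predict. The minimal sketch below shows how such a claim can be quantified, with fold enrichment plus a one-sided binomial p-value. It assumes that, under the null, SNP positions are independent and fall in elements in proportion to the elements' genomic coverage; real analyses match SNPs for allele frequency and linkage disequilibrium. All numbers are hypothetical and are not ENCODE results.

```python
from math import comb


def enrichment(hits_in_elements, total_hits, element_fraction):
    """Fold enrichment of GWAS SNPs in annotated elements, plus a one-sided binomial
    p-value for seeing at least this many hits if SNPs fell in elements at random."""
    expected = total_hits * element_fraction
    fold = hits_in_elements / expected
    p_value = sum(comb(total_hits, k) * element_fraction ** k
                  * (1 - element_fraction) ** (total_hits - k)
                  for k in range(hits_in_elements, total_hits + 1))
    return fold, p_value


# Hypothetical: 60 of 200 trait-associated SNPs lie in elements covering 15% of the genome.
fold, p = enrichment(60, 200, 0.15)
print(f"fold enrichment {fold:.1f}, binomial p = {p:.2e}")
```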
Richly annotated reference genomes
• A key shared (open access) resource
  – for 21st-century biological research
• For effective exploitation
  – for genomics-enabled prediction
    • e.g. selective animal breeding
• Expensive shared resources
  – International collaborative consortia