GridQTL: High Performance QTL analysis via the Grid/Cloud

GridQTL
• BBSRC funded
• 5 years initially, then 3 years (£1.5M, then £750K)
• Part of "Integrative Biology" vision - "Allow prediction from gene sequence to consequence"
• Institute of Evolutionary Biology (IEB), Edinburgh University
• Roslin Institute, Edinburgh
• National e-Science Centre, Edinburgh
• EPCC, Edinburgh
• Information Services, Edinburgh University

QTLs
• Quantitative Trait Loci
• Positions along a chromosome that have an influence on a continuously varying physical trait.
• Traits (Phenotypes)
  • Weight, Height, Eye Colour, Hypertension, Cancer...
  • Influenced by many loci and environmental factors: "Multifactorial".
• NOT looking for single position effects
  • 70% of Cystic Fibrosis cases.
  • Huntington's Disease.

Genomic Data
• Look at structure of chromosome pair.
• Discover positions that differ from norm.
• Locate alleles
  • SNPs.
  • Deletions.
  • Insertions.

Phenotypic Data
• Keep record of the trait for each sample.
• Roslin Institute uses pigs.
  • Easy to create pedigrees.
  • Similar genome to humans.
  • Many studies in short time.

Statistical Process
• Genetic information is mixed during reproduction (meiosis).
• Positions close together on a chromosome tend to be inherited together.
• A statistical process that needs mathematical modelling.
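How strongly two positions travel together can be expressed through a mapping function. The sketch below uses Haldane's mapping function, a standard textbook model that assumes crossovers occur independently along the chromosome. The slides do not say which mapping function GridQTL uses, so this choice and the function names are assumptions made for illustration.

```python
import math


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Recombination fraction r between two loci d Morgans apart (Haldane)."""
    if distance_morgans < 0:
        raise ValueError("distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def haldane_map_distance(recombination_fraction: float) -> float:
    """Inverse of the function above: map distance in Morgans for a given r."""
    if not 0.0 <= recombination_fraction < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * recombination_fraction)


# Nearby loci rarely recombine; distant loci approach r = 0.5 (unlinked).
for centimorgans in (1, 10, 50, 200):
    r = haldane_recombination_fraction(centimorgans / 100.0)
    print(f"{centimorgans:>4} cM -> r = {r:.3f}")
```

A recombination fraction near 0 means the two positions are almost always inherited together, while a value near 0.5 means they behave as if on different chromosomes.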

QTLs - Calculation
• Genomic Data
  • Known markers on chromosomes or other regions.
  • Determine alleles (variants) of these markers.
• Phenotypic Data
  • Variation of trait data over pedigrees recorded.
• Pedigree Data
  • Build up pedigrees to model inheritance of chosen markers and their variants.
  • Which pedigree can best identify QTLs?
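To show how marker and trait data combine, here is a minimal sketch of a single-marker regression scan. It is a deliberate simplification: it ignores pedigree structure and positions between markers, and the slides do not specify the models used by QTLExpress or GridQTL. The marker names, genotype codes (0/1/2 copies of an allele) and trait values are invented toy data.

```python
import math


def marker_lod(genotypes, phenotypes):
    """LOD score from a least-squares regression of trait on genotype code."""
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching genotype and phenotype lists, n >= 3")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)  # model with no QTL
    if sxx == 0 or rss_null == 0:
        return 0.0  # marker or trait does not vary: no evidence either way
    rss_marker = rss_null - sxy ** 2 / sxx  # model with a marker effect
    if rss_marker <= 0:
        return math.inf  # perfect fit
    return (n / 2.0) * math.log10(rss_null / rss_marker)


# Toy data (invented): ten animals, two markers, one trait.
phenotypes = [10.1, 9.8, 12.2, 11.9, 14.0, 13.8, 10.3, 12.1, 14.2, 11.8]
markers = {
    "M1": [0, 0, 1, 1, 2, 2, 0, 1, 2, 1],
    "M2": [1, 0, 2, 1, 0, 2, 1, 0, 1, 2],
}

lods = {name: marker_lod(g, phenotypes) for name, g in markers.items()}
best = max(lods, key=lods.get)
for name, lod in lods.items():
    print(f"{name}: LOD = {lod:.2f}")
print("strongest association:", best)
```

A scan of this kind is repeated at many positions, and with permutation or bootstrap testing many times over, which is why running times grow quickly with data size.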

QTLExpress
• 2001
  • Web tool using Java servlets, evolved from Fortran applications.
  • Simple statistical models employed.
  • Data sets of size KBytes.
  • Running time: minutes on a 2 GHz Pentium.

GridQTL
• Ramp up in Data Size and Processing Time
  • Data sets MBytes
  • Processing times hours/days on 2 GHz Intel Pentium
• More users expected
• More advanced models.
  • Variance, principal, independent components analyses.
  • Bayesian statistics.
  • Markov Chain Monte Carlo (MCMC).
• So more computing resources
  • Clusters - UK National Grid Service, ECDF
  • HPC - investigate parallelism and optimisation of algorithms.
  • HECToR
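To give a feel for why MCMC methods raise the computing cost, the sketch below runs a random-walk Metropolis sampler, the simplest MCMC scheme, on a toy problem: estimating a single mean effect from noisy observations under a flat prior. The model, the simulated data and all names are assumptions for illustration, not the Bayesian QTL models used in GridQTL.

```python
import math
import random


def log_posterior(effect, observations, noise_sd=1.0):
    """Log posterior (up to a constant) of a mean effect under a flat prior."""
    return -sum((y - effect) ** 2 for y in observations) / (2.0 * noise_sd ** 2)


def random_walk_metropolis(observations, n_steps=20000, step_sd=0.3, seed=0):
    """Draw correlated samples of the effect with a Gaussian random-walk proposal."""
    rng = random.Random(seed)
    current = 0.0
    current_lp = log_posterior(current, observations)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_posterior(proposal, observations)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Simulated toy data: 50 observations around a true effect of 2.0.
data_rng = random.Random(1)
observations = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

samples = random_walk_metropolis(observations)[2000:]  # discard burn-in
posterior_mean = sum(samples) / len(samples)
sample_mean = sum(observations) / len(observations)
print(f"posterior mean {posterior_mean:.3f}, sample mean {sample_mean:.3f}")
```

Each step evaluates the likelihood over the whole data set, and realistic models need many parameters and many more steps, so run times of hours or days are plausible on a single processor.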

Complex QTL models
• Need more complex models that need more data so that:
• Effect of QTL interactions can be modelled.
  • Epistasis - how genes interact
• Effect of QTL on more than one trait.
  • Pleiotropy
• Managing data from DNA chips (many markers and traits at once).
  • eQTL
• Fine mapping of QTL loci.
  • Linkage Disequilibrium (LD)
  • Variance Component Analysis (VCA)
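Linkage disequilibrium, listed above for fine mapping, is commonly summarised for a pair of biallelic loci by the coefficient D and by r squared. The sketch below computes both from phased haplotypes using the standard definitions; the function name and the toy haplotypes are invented for illustration and do not come from GridQTL.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    haplotypes is a sequence of (allele_at_locus_1, allele_at_locus_2)
    pairs with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at locus 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # departure from independence
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, r_squared


# Alleles always occur together: complete LD.
linked = [(1, 1)] * 5 + [(0, 0)] * 5
# Every combination equally common: no LD.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print("linked:", linkage_disequilibrium(linked))
print("independent:", linkage_disequilibrium(independent))
```

High r squared between a marker and a causal variant is what lets dense marker data narrow down the position of a QTL.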

GridQTL
• Local machine – Tomcat web server
• Portal technologies – GridSphere
• Grid – NGS and ECDF
• Grid middleware (Globus)
  • Now qsub
• Digital certificates - authentication
  • Now ssh key pair

EPCC
• Subcontract programming work
  • General system programming
  • Queuing system for local and grid jobs
  • Portal work
  • Memory and parallel issues
  • Cloud work

Usage
• Released Autumn 2006
  • 50 users use portal in a month.
  • 40 analyses/day on local server.
  • 4 cpu hours/day on local server.
  • 50 analyses/day on Grid.
  • 40 cpu hours/day on Grid.
• 500 users and 70 citations summer 2012.

Demo

User Count

Analyses & CPU count

User Studies
• User Projects
  • Sheep – birth weight, milk & fleece quality
  • Cattle, Sheep, Pigs & Chickens – growth, quality
  • Horses – airway obstructions for racehorses
  • Fish – harvest traits
  • Crocodiles – scale quality
  • Eucalyptus Trees – wood quality
  • Mouse – obesity
  • Foxes – domesticity

CloudQTL
• Solution to long term sustainability of service.
• No infrastructure cost.
• Analyses guaranteed to complete in time.
• Pay as you go model.
• Google, Microsoft, Amazon offer routes.
  • Amazon preferred.
• EPCC route to ECDF