CSU IDRC Next Generation Sequencing Core
Genomic Sequencing Services

Semiconductor DNA Sequencing
Ion Proton, Ion Torrent: “Sequencing on a Chip”

Semiconductor Sequencing in a Nutshell
“It’s a computational pH meter”
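
In semiconductor sequencing, one nucleotide species at a time is flowed over the chip. Each base incorporated releases a hydrogen ion, and the resulting pH change is read as a voltage roughly proportional to the number of bases added in that flow, so a homopolymer run gives a proportionally larger signal. The sketch below turns such per-flow signals (a "flowgram") into a read. It assumes a hypothetical repeating flow order "TACG" and signals that are already normalized so that 1.0 equals one base. Real instruments use longer flow orders and correct for phasing and noise, so treat this as an illustration of the idea, not the vendor's base caller.

```python
from typing import Sequence

# Hypothetical flow order for illustration; real instruments use a longer,
# configurable flow order.
FLOW_ORDER = "TACG"


def call_bases(flow_signals: Sequence[float], flow_order: str = FLOW_ORDER) -> str:
    """Convert normalized per-flow signals into a base sequence.

    Signals are assumed normalized so 1.0 means one incorporated base
    (one unit of pH change), 2.0 a two-base homopolymer, and so on.
    Rounding gives the homopolymer length for that flow; 0 means the
    flowed nucleotide was not incorporated.
    """
    bases = []
    for i, signal in enumerate(flow_signals):
        nucleotide = flow_order[i % len(flow_order)]
        run_length = max(0, round(signal))  # clamp negative noise to zero
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Eight flows: T, A, C, G, T, A, C, G (made-up signal values)
signals = [1.02, 0.05, 2.1, 0.9, 0.0, 0.97, 0.1, 3.05]
print(call_bases(signals))  # TCCGAGGG
```

Rounding to the nearest integer is the simplest possible homopolymer caller; it also shows why long homopolymers are harder to call, since the difference between, say, 7 and 8 bases is a small relative change in signal.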

Metagenomics
• Environmental samples of communities of organisms:
  • water, soil samples
  • human & animal microbiomes
  • mine tailings, oil spills
  • deep sea, polar ice
  • etc.

Metagenomics Pipeline
• Torrent/Proton sequencers
• CSU Cray supercomputer; Oak Ridge Titan supercomputer
• MEGAN
• NCBI nucleotide databases
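
MEGAN bins reads taxonomically using a lowest-common-ancestor (LCA) rule. A read whose BLAST hits all fall within one taxon is assigned to that taxon. A read that hits several taxa is assigned to the deepest taxon containing all of them. The sketch below shows only that LCA step. It assumes a toy child-to-parent taxonomy with hypothetical, simplified node names (not NCBI taxonomy IDs) and ignores MEGAN's score and support filters, so it is not MEGAN's implementation.

```python
from typing import Dict, List, Optional

# Toy taxonomy as child -> parent pointers (hypothetical, heavily simplified;
# intermediate ranks are skipped and names are not NCBI taxonomy IDs).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Vertebrata": "root",
    "Actinopterygii": "Vertebrata",
    "Micropterus": "Actinopterygii",
    "Lepomis": "Actinopterygii",
    "Reptilia": "Vertebrata",
    "Alligator": "Reptilia",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lowest_common_ancestor(hit_taxa: List[str]) -> str:
    """Assign a read to the deepest taxon shared by all of its BLAST hits."""
    if not hit_taxa:
        raise ValueError("read has no hits")
    paths = [lineage(t) for t in hit_taxa]
    lca = "root"
    for level in zip(*paths):  # walk down all lineages in lockstep
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca


print(lowest_common_ancestor(["Micropterus", "Lepomis"]))    # Actinopterygii
print(lowest_common_ancestor(["Micropterus", "Alligator"]))  # Vertebrata
print(lowest_common_ancestor(["Alligator"]))                 # Alligator
```

The LCA rule is conservative: a read that matches fish and reptile sequences equally well is reported only at the vertebrate level rather than forced into one species.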

Metagenomics Tools
Ion Proton Sequencer
• In: sample DNA
• Out: 50 M DNA fragments
NCBI nucleotide database
• DNA fragments
• 15 M+ records
Do the math:
• 50 M × 15 M = 7.5 × 10^14 read-against-record comparisons (on the order of 10^14 queries)
mpiBLAST
• Highly parallelized BLAST algorithm
• NGS sample DNA
• Queries the NCBI DB
CSU Cray XT6m
• 2,016 CPU cores
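
The "do the math" estimate can be checked directly, along with how the work might be spread over the Cray. mpiBLAST parallelizes BLAST by segmenting the database into fragments that different workers search. The sketch below counts every read against every record, as the slide's estimate does, and then splits the 15 M records evenly, one fragment per core. The helper `fragment_sizes` is hypothetical and only mimics the idea of database segmentation. It is not mpiBLAST's own partitioning code. Because BLAST's seeding heuristics avoid full pairwise alignment, these numbers indicate scale, not runtime.

```python
from typing import List

READS = 50_000_000       # Ion Proton output (from the slide)
DB_RECORDS = 15_000_000  # NCBI nucleotide records ("15 M+", lower bound)
CRAY_CORES = 2_016       # CSU Cray XT6m

# Every read compared against every record, as in the slide's estimate.
comparisons = READS * DB_RECORDS
per_core = comparisons / CRAY_CORES

print(f"read-record pairs: {comparisons:.2e}")  # 7.50e+14
print(f"pairs per core:    {per_core:.2e}")     # 3.72e+11


def fragment_sizes(n_records: int, n_fragments: int) -> List[int]:
    """Split n_records as evenly as possible into n_fragments pieces
    (hypothetical helper illustrating database segmentation)."""
    base, extra = divmod(n_records, n_fragments)
    # The first `extra` fragments each take one leftover record.
    return [base + 1 if i < extra else base for i in range(n_fragments)]


sizes = fragment_sizes(DB_RECORDS, CRAY_CORES)
print(min(sizes), max(sizes))  # roughly 7,440 records per fragment
```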

Metagenomics
• Dr. Toni Piaggio, National Wildlife Research Center, Fort Collins
• Florida Everglades water samples (4)
• “What species are in the water?”
• CSU NextGen Sequencing Core: Ion Proton; 2 weeks
• CSU Cray: 1,000 cores, 24 hours, 4 runs; 1 week
• Results
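
The compute budget for this study can be tallied from the figures above. The sketch assumes that each of the 4 runs used all 1,000 cores for the full 24 hours and that the runs were executed one after another. Neither assumption is stated on the slide. It also assumes one run per water sample. Under those assumptions, the total machine time and the wall-clock time are:

```python
CORES = 1_000
HOURS_PER_RUN = 24  # assumed: each run used the full 24 hours
RUNS = 4            # assumed: one run per water sample

core_hours = CORES * HOURS_PER_RUN * RUNS
wall_hours_if_sequential = HOURS_PER_RUN * RUNS  # assumed: runs back to back

print(core_hours)                     # 96000
print(wall_hours_if_sequential / 24)  # 4.0 (days)
```

Four days of sequential compute fits within the quoted 1-week turnaround for the Cray stage.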

Metagenomics
• Rarefaction curves
  • Estimate species richness
  • Asymptotic?
  • Find rare species
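
A rarefaction curve plots the expected number of distinct species against the number of reads subsampled. If the curve levels off (asymptotic), additional sequencing is unlikely to reveal many new species. If it is still climbing, rare species probably remain undetected. The sketch below uses the standard (Hurlbert) rarefaction formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the read count of species i and N is the total count. The per-species counts are hypothetical. In practice they would come from the taxonomic binning step.

```python
from math import comb
from typing import List, Tuple


def expected_richness(counts: List[int], n: int) -> float:
    """Expected number of distinct species in a random subsample of n reads,
    drawn without replacement, given per-species read counts."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total read count")
    denom = comb(total, n)
    # comb(a, n) is 0 when n > a, i.e. species i cannot be missed at depth n.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts: List[int], step: int) -> List[Tuple[int, float]]:
    """(depth, expected richness) pairs from 0 up to the full sample."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(0, total + 1, step)]


# Hypothetical read counts for five species (two of them rare).
counts = [50, 30, 15, 4, 1]
for depth, richness in rarefaction_curve(counts, 20):
    print(depth, round(richness, 2))
```

At full depth the expected richness equals the observed number of species. The rare species (counts 4 and 1) are what keep the curve rising near the end, which is exactly the "find rare species" question.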

Computational Resources: Strong Scaling
CSU Cray XT6m Supercomputer
• 2,016 CPU cores
• mpiBLAST
• NCBI nucleotide DB
• Query 1% of sample DNA
Oak Ridge Titan Cray XK7 Supercomputer
• 300 K CPU cores; 50 M GPU cores
• mpiBLAST
• NCBI nucleotide DB
• Query 100% of sample DNA
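
Strong scaling means holding the problem fixed (the same queries against the same database) while adding cores, and then measuring how much faster it runs. Results are usually summarized as speedup, S(p) = T(p₀) / T(p), and parallel efficiency, S(p) · p₀ / p, relative to the smallest core count p₀ measured. Amdahl's law bounds the achievable speedup when some fraction of the work stays serial. The sketch below computes both. No measured timings from these machines are given here, and the serial fraction in the usage lines is a hypothetical value chosen for illustration.

```python
from typing import Dict, Tuple


def strong_scaling(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Given wall-clock times for a fixed problem at several core counts,
    return {cores: (speedup, efficiency)} relative to the smallest count."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    result = {}
    for cores, t in sorted(timings.items()):
        speedup = base_time / t
        efficiency = speedup * base_cores / cores
        result[cores] = (speedup, efficiency)
    return result


def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on strong-scaling speedup when `serial_fraction`
    of the work cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# Hypothetical serial fraction of 0.1%, for illustration only (not measured).
for p in (2_016, 300_000):
    print(p, round(amdahl_speedup(0.001, p), 1))
```

Even a 0.1% serial fraction caps the speedup below 1,000 regardless of core count, which is why embarrassingly parallel workloads such as independent BLAST queries are a good fit for machines of Titan's size.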

Summary: Big Data Issues
• Semiconductor sequencer data
• Large-scale database queries
• High-performance computing