BI 420 – Introduction to Bioinformatics: Sequencing Informatics. Gabor T. Marth, Department of Biology, Boston College, marth@bc.edu
The nuclear genome (chromosomes)
The genome sequence • the primary template on which to outline functional features of our genetic code (genes, regulatory elements, secondary structure, tertiary structure, etc.)
Completed genomes: ~3,000 Mb, >100 Mb, ~1 Mb
Main genome sequencing strategies: clone-based shotgun sequencing (Human Genome Project); whole-genome shotgun sequencing (Celera Genomics, Inc.)
Hierarchical genome sequencing: BAC library construction → clone mapping → shotgun subclone library construction → sequencing → sequence reconstruction (sequence assembly). Lander et al., Nature 2001
Clone mapping – “sequence ready” map
Hierarchical genome sequencing: BAC library construction → clone mapping → shotgun subclone library construction → sequencing/read processing → sequence reconstruction (sequence assembly). Lander et al., Nature 2001
Shotgun subclone library construction (figure labels: BAC primary clone, cloning vector, subclone insert, sequencing vector)
Sequencing
Robotic automation (Lander et al., Nature 2001)
Base calling: GGGCTCAGCTGTATCAGCCACGTGCCTACAACAATCTGCCCCT
Base calling (PHRED): base = A, Q = 40
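To make the quality value on this slide concrete: PHRED quality values are defined as Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The short sketch below converts between the two; the function names are illustrative and are not part of PHRED itself.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality value Q to the estimated error probability P."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to a Phred quality value Q."""
    return -10 * math.log10(p)

# The call on the slide: base = A, Q = 40
print(phred_to_error_prob(40))
```

Under this definition, Q = 40 corresponds to an estimated 1 in 10,000 chance that the called base is incorrect, and Q = 20 to 1 in 100.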
Vector clipping
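A toy sketch of the idea behind vector clipping: the start of a read often comes from the sequencing vector rather than from the subclone insert, and those bases must be removed before assembly. The example assumes the vector bases just upstream of the cloning site are known and that the read matches them exactly. Real tools use error-tolerant alignment, and the function name, parameters, and sequences here are hypothetical.

```python
def clip_vector_prefix(read, vector_end, min_match=8):
    """Remove leading vector-derived bases from a read.

    vector_end: vector sequence immediately upstream of the cloning site
                (an assumed input for this sketch).
    The read is assumed to begin inside that stretch, so its prefix equals
    a suffix of vector_end. Exact matching only; sequencing errors are ignored.
    """
    # Try the longest possible vector suffix first, then shorter ones.
    for k in range(min(len(read), len(vector_end)), min_match - 1, -1):
        if read.startswith(vector_end[-k:]):
            return read[k:]  # keep only the insert-derived bases
    return read  # no vector found; leave the read unchanged

# Hypothetical vector end and read (12 vector bases followed by insert)
vector_end = "GATTACAGGATCCTCTAGA"
read = "GGATCCTCTAGA" + "ACGTTTGCA"
clipped = clip_vector_prefix(read, vector_end)
```

The `min_match` threshold is a design choice for the sketch: requiring several matching bases avoids clipping reads whose first few bases resemble the vector by chance.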
Sequence assembly (PHRAP)
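The principle of sequence assembly, reconstructing a longer sequence from overlapping reads, can be illustrated with a deliberately simple greedy merge. This is not PHRAP's algorithm: it assumes error-free reads, all from the same strand, and exact suffix-prefix overlaps. The function names and the example reads are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the longest exact overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing left to merge; remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

# Three overlapping toy reads
reads = ["ATGGCGTA", "GCGTACCTG", "ACCTGAAT"]
contigs = greedy_assemble(reads)
```

Greedy merging of this kind is easily misled when the same sequence occurs more than once in the genome, because overlaps between reads from different copies look just as good as true overlaps.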
Repetitive DNA may confuse assembly
Sequence completion (finishing): gaps and regions of low sequence coverage and/or quality. Tools: CONSED, AUTOFINISH
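As a rough sketch of how finishing targets might be located, the code below computes per-base read depth along a contig and reports the stretches that fall below a chosen depth. It is not how CONSED or AUTOFINISH work internally, and it looks at coverage only, not base quality. The input format (read start and length in contig coordinates) and all names are assumptions made for the example.

```python
def low_coverage_regions(contig_len, placements, min_depth=1):
    """Return half-open (start, end) intervals where read depth < min_depth.

    placements: list of (start, length) pairs in 0-based contig coordinates.
    """
    depth = [0] * contig_len
    for start, length in placements:
        for pos in range(max(start, 0), min(start + length, contig_len)):
            depth[pos] += 1

    regions = []
    region_start = None
    for pos, d in enumerate(depth):
        if d < min_depth and region_start is None:
            region_start = pos                    # a low-coverage stretch begins
        elif d >= min_depth and region_start is not None:
            regions.append((region_start, pos))   # the stretch ends before pos
            region_start = None
    if region_start is not None:
        regions.append((region_start, contig_len))
    return regions

# Three hypothetical reads placed on a 20-base contig
placements = [(0, 8), (5, 6), (14, 6)]
gaps = low_coverage_regions(20, placements)               # no coverage at all
thin = low_coverage_regions(20, placements, min_depth=2)  # fewer than 2 reads
```

With `min_depth=1` the function reports true gaps; raising the threshold also flags regions covered by too few reads to be trusted.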
New sequencing technologies: from familiar ABI traces (100 x 1,000 bp) … to 454 pyrograms (100 thousand x 100 bp) … and Solexa reads (50 million x 20 bp)
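The figures above can be turned into total bases with simple arithmetic, assuming each entry is read as (number of reads) x (read length); that interpretation is an assumption, since the slide does not spell it out. The sketch also computes the expected fold coverage, total bases divided by genome size, using the ~3,000 Mb genome size quoted earlier in the deck.

```python
# (number of reads, read length in bp), as read from the slide
platforms = {
    "ABI traces": (100, 1_000),
    "454 pyrograms": (100_000, 100),
    "Solexa reads": (50_000_000, 20),
}

def total_bases(n_reads, read_len):
    """Total sequenced bases: number of reads times read length."""
    return n_reads * read_len

def fold_coverage(n_reads, read_len, genome_size):
    """Expected average depth: total sequenced bases / genome size."""
    return total_bases(n_reads, read_len) / genome_size

GENOME_SIZE = 3_000_000_000  # ~3,000 Mb

for name, (n_reads, read_len) in platforms.items():
    bases = total_bases(n_reads, read_len)
    cov = fold_coverage(n_reads, read_len, GENOME_SIZE)
    print(f"{name}: {bases:,} bp, about {cov:.5f}x of a 3,000 Mb genome")
```

On this reading, the totals are 100 kb, 10 Mb, and 1 Gb respectively: the reads get much shorter while the total output grows by several orders of magnitude.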