Maize Production Sequencing
lfulton@watson.wustl.edu

Maize Production Goals
• BAC End Sequencing of 220,000 Clones
• Fosmid End Sequencing of 500,000 Clones
• Shotgun of 16,000 BAC Clones

Maize BAC End Sequences
• 580,000 reads processed
• 567 average read length
• 60% success

Maize Fosmid End Sequences
• 850,000 processed
• 79% success
• 543 average read length
• Completed today

Library Construction Pipeline
• Receipt of sheared DNA from AGI
• Size selection of insert DNA
• Ligation into pSMART vector

• Constructed 17,034 Libraries as of August 31st

Average Fail Rate for Library Construction was less than 5%

Shotgun Criteria
• 3.5X coverage
• Clone size verification
• 50% paired ends
• BES agreement
• 25% of clones failed
  • 22% need more data
  • 3% BES disagreement
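As a minimal sketch of how the criteria above could be applied to one clone: the slide does not say how the checks were implemented, so the field names, the status labels, and the reading of 3.5X and 50% as minimum thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CloneStats:
    # All field names are hypothetical; the slides do not define a schema.
    coverage: float         # fold coverage of the shotgun data
    size_verified: bool     # clone size verification passed
    paired_fraction: float  # fraction of reads with a paired end (0..1)
    bes_agrees: bool        # BAC end sequences agree with the assembly


def shotgun_status(clone: CloneStats) -> str:
    """Classify a clone against the four shotgun criteria (assumed thresholds)."""
    if not clone.bes_agrees:
        return "bes_disagreement"
    # Assumption: any other unmet criterion is treated as "needs more data".
    if (clone.coverage < 3.5
            or clone.paired_fraction < 0.50
            or not clone.size_verified):
        return "needs_more_data"
    return "shotgun_done"
```

A clone at 4.0X with 60% paired ends, verified size, and agreeing BES would be reported as `shotgun_done` under these assumptions.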

Shotgun Complete for 12,211 Clones as of August 31st

Final Production Work
• 660 Clones Need Library Construction
• 2100 Clones In Production Pipeline
• Expected Completion Date December 2007

Sequence Improvement
Bob Fulton, Dick McCombie, Rod Wing

Sequence Improvement Pipeline
• Shotgun_done triggers the prefinishing pipeline
• Initial identification of “do finish” regions
• Manual sorting and use of autoedit (Gordon) to break apart misassembly
• Autofinish (Gordon) used to choose directed reactions for all gaps and regions of low quality in “do finish” regions
• Reassembly and 2nd iteration of prefinishing pipeline
• Final identification of “do finish” regions and handoff to finishing pipeline

Clone Improvement through the Prefinishing Pipeline

Assembly View - Entire Clone
[Figure labels: Coverage (green); Spanning Plasmids; End]

Assembly View - Do Finish Region
[Figure labels: EST sequence; GSS sequence; Do Finish; Repeat Tags]

Alignment with cDNA read pairs
Alignment with End Sequences

[Chart legend: Actual; Projected]

Maize GenBank Submissions
Joanne Nelson

Submission Landmarks
• HTGS_FULLTOP
• HTGS_PREFIN
• HTGS_ACTIVEFIN
• HTGS_IMPROVED

Improved Sequence
“Non-repetitive portions of the sequence have had sequence improvement (directed attempts) and have been labeled as ‘improved.’ Improved regions are double stranded, sequenced with an alternate chemistry or covered by high quality data (i.e. phred quality greater than or equal to 30 or approval by an experienced finisher), unless otherwise noted. Regions of low sequence complexity (such as dinucleotide repeats and small unit tandem repeats) in the improved regions have not been resolved to previously established finishing standards. BAC end sequence, cot and methyl filtered genome survey sequence and data from overlapping projects of strain B73 may have been included in this project. Where possible, contigs have been ordered and oriented based on read pairing. These regions are designated as scaffolds. Additional order and orientation will be provided upon completion of detailed analysis of the complete finished tiling path.”
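For readers unfamiliar with the phred scale mentioned in the statement above: a phred quality Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q30 means roughly one error in 1,000 bases. The sketch below illustrates only that arithmetic; the function names are hypothetical, and the per-base check is an assumed reading of "phred quality greater than or equal to 30", not the project's actual test.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def meets_improved_threshold(quals, threshold=30):
    """Return True if every base in a non-empty region has quality >= threshold.

    Assumption: the threshold is applied per base; the slide does not say
    how the quality criterion was evaluated over a region.
    """
    return bool(quals) and all(q >= threshold for q in quals)
```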

Improved Sequence

FEATURES             Location/Qualifiers
     source          1..184604
                     /organism="Zea mays"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:4577"
                     /chromosome="1"
                     /clone="CH201-132J17; ZMMBBc0132J17"
     misc_feature    1..69252
                     /note="scaffold_name: Scaffold 1"
     misc_feature    1..34245
                     /note="assembly_name: Contig 28 vector_side: SP6"
     misc_feature    32401..34245
                     /note="Improved sequence."
     unsure          34230..34245
                     /note="Non-repetitive but unresolved region"
     gap             34246..34345
                     /estimated_length=unknown
     misc_feature    34346..68071
                     /note="assembly_name: Contig 27"
     misc_feature    34346..36695
                     /note="Improved sequence."
     unsure          34346..34356
                     /note="Non-repetitive but unresolved region"
     misc_feature    38146..46795
                     /note="Improved sequence."
     gap             68072..68171
                     /estimated_length=unknown
     misc_feature    68172..69252
                     /note="assembly_name: Contig 14"
     gap             69253..69352
                     /estimated_length=unknown
     misc_feature    69353..132243
                     /note="scaffold_name: Scaffold 2"
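To show how the coordinates in the feature table above can be read programmatically, here is a minimal sketch that pulls feature keys and ranges out of a few lines copied from the slide and computes the span of each gap. The regular expression and function name are illustrative assumptions; it handles only simple `start..end` locations and is not a general GenBank parser.

```python
import re

# A subset of the feature lines from the slide (qualifier lines omitted).
FEATURES = """\
     misc_feature    1..34245
     gap             34246..34345
     misc_feature    34346..68071
     gap             68072..68171
     misc_feature    68172..69252
     gap             69253..69352
"""

# Matches "<key> <start>..<end>"; complex locations are not supported.
FEATURE_LINE = re.compile(r"^\s+(\w+)\s+(\d+)\.\.(\d+)\s*$")


def parse_features(text):
    """Return a list of (key, start, end) tuples for simple feature lines."""
    features = []
    for line in text.splitlines():
        match = FEATURE_LINE.match(line)
        if match:
            key, start, end = match.groups()
            features.append((key, int(start), int(end)))
    return features


features = parse_features(FEATURES)
# Coordinates are 1-based and inclusive, so length is end - start + 1.
gap_lengths = [end - start + 1 for key, start, end in features if key == "gap"]
print(gap_lengths)  # [100, 100, 100]
```

Each gap spans 100 positions even though the record gives `/estimated_length=unknown`, so the span is a placeholder and not a measured size.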

Submission Totals
HTGS_FULLTOP      3342
HTGS_PREFIN       2014
HTGS_ACTIVEFIN    4151
HTGS_IMPROVED     2660
TOTAL            12167
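As a quick consistency check on the totals above, the four category counts can be summed and compared with the reported total. The dictionary layout and variable names below are only an illustrative choice.

```python
# Counts copied from the Submission Totals slide.
submissions = {
    "HTGS_FULLTOP": 3342,
    "HTGS_PREFIN": 2014,
    "HTGS_ACTIVEFIN": 4151,
    "HTGS_IMPROVED": 2660,
}

total = sum(submissions.values())
print(total)  # 12167, matching the TOTAL on the slide

# Share of each category as a percentage of all submissions.
shares = {name: 100 * count / total for name, count in submissions.items()}
```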