Annotation of Mouse Draft Sequence by Comparative Analysis to Human Sequence
M. Bucan, T. Wiltshire, A. Lengeling, L. Tarantino, S. Kanes (CNB, University of Pennsylvania)
J. Crabtree, J. Schug, C. Overton, C. Stoeckert (CBIL, University of Pennsylvania)
J. Lehoczky, K. Dewar, B. Birren and the Whitehead/MIT Center for Genome Research
The Gabrg1-Gabra2-Gabrb1-Txk-Tec-Gsh2-Pdgfra-Kit-Kdr(Flk1)-Clock BAC contigs on Chr. 5
The Gabrg1-Gabra2-Gabrb1-Txk-Tec-Gsh2-Pdgfra-Kit-Kdr(Flk1)-Clock BAC contigs on Chr. 5: sequence available
Annotation Overview
• Order and orient draft sequence contigs
• Perform framework sequence annotation
  – repeat content, gene content
  – MARs, CpG islands, BAC ends
  – TF binding sites
• Find regions conserved with human
• Identify over/under-represented TF sites
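As an illustration of one framework annotation step, the sketch below flags candidate CpG islands with a sliding window. The slides do not say which method was used; this assumes the commonly cited Gardiner-Garden and Frommer thresholds (window of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). The function names `cpg_stats` and `find_cpg_windows` are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")          # observed CpG dinucleotides
    gc_fraction = (c + g) / n
    expected = c * g / n           # expected CpG count given base composition
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp


def find_cpg_windows(seq, window=200, step=50, min_gc=0.5, min_obs_exp=0.6):
    """Return (start, end) windows passing the assumed CpG island thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, obs_exp = cpg_stats(seq[start:start + window])
        if gc > min_gc and obs_exp > min_obs_exp:
            hits.append((start, start + window))
    return hits


if __name__ == "__main__":
    # Toy sequence (made up): a CpG-rich stretch followed by an AT-rich one.
    toy = "CG" * 100 + "AT" * 100
    print(find_cpg_windows(toy))
```

Overlapping windows that pass could then be merged into single island calls; that step is left out to keep the sketch short.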
Annotation of Tec-Txk draft sequence (65i8): Ordering and orienting pieces using known genes
Annotation of Tec-Txk draft sequence (65i8): Ordering and orienting pieces using BAC ends
Annotation of Kit draft sequence (232h18): Ordering and orienting using known genes & BAC ends
Annotation of Kit draft sequence (232h18): Ordering and orienting pieces using conserved regions
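The ordering and orienting idea can be sketched as follows, assuming each draft piece has already been matched to one or more anchors (exons of a known gene, BAC ends, or conserved human segments) with a position on a reference axis and a strand. This is a minimal illustration, not the procedure actually used for these BACs; `order_and_orient` and the input tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import median


def order_and_orient(hits):
    """Order and orient draft pieces from anchor hits.

    hits: iterable of (piece_id, ref_pos, strand), strand being '+' or '-'.
    Returns a list of (piece_id, orientation) sorted along the reference;
    orientation is '?' when the anchors on a piece disagree evenly.
    """
    by_piece = defaultdict(list)
    for piece_id, ref_pos, strand in hits:
        by_piece[piece_id].append((ref_pos, strand))

    placed = []
    for piece_id, anchors in by_piece.items():
        # The median anchor position is robust to a single spurious match.
        position = median(pos for pos, _ in anchors)
        plus = sum(1 for _, strand in anchors if strand == "+")
        minus = len(anchors) - plus
        if plus > minus:
            orientation = "+"
        elif minus > plus:
            orientation = "-"
        else:
            orientation = "?"
        placed.append((position, piece_id, orientation))

    placed.sort()
    return [(piece_id, orientation) for _, piece_id, orientation in placed]


if __name__ == "__main__":
    # Made-up anchor hits for three pieces.
    example = [
        ("piece3", 5000, "-"),
        ("piece1", 100, "+"),
        ("piece1", 300, "+"),
        ("piece2", 2000, "+"),
        ("piece2", 2200, "-"),
    ]
    print(order_and_orient(example))
```

Pieces with no anchor of any kind stay unplaced, which is one reason a deck like this reports the fraction of draft sequence that could be ordered rather than all of it.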
Annotation displayed in bioWidget “AnnotView”
Annotation of Kit draft sequence (232h18): Transcription Element Search System analysis
TESS Analysis
• Searched entire human and mouse syntenic sequences with all TESS matrices.
• Identified binding sites over/under-represented in the conserved regions.
• Conserved sites dispersed over 150 kb.
• Over-represented factors include AP2, Pax6, S8, Oct-1, E2A, E2F-DRTF, TAL1/E47, CdxA, Ubx, AbdB-r, Engrailed, Hairy, DFD
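One simple way to score over- or under-representation of a factor's sites in conserved regions is to compare the observed count with the count expected if sites were spread uniformly over the sequence. The sketch below does this with a fold ratio and a normal-approximation z-score. The slides do not state which statistic was used, so this is an assumption, and `site_enrichment` is a hypothetical helper, not part of TESS.

```python
import math


def site_enrichment(sites_in_conserved, sites_total, conserved_bp, total_bp):
    """Return (fold, z) for binding sites falling inside conserved regions.

    Null model (an assumption): each site lands in a conserved region with
    probability conserved_bp / total_bp, independently of the others.
    fold > 1 suggests over-representation, fold < 1 under-representation.
    """
    p = conserved_bp / total_bp
    expected = sites_total * p
    fold = sites_in_conserved / expected if expected else float("nan")
    sd = math.sqrt(sites_total * p * (1 - p))   # binomial standard deviation
    z = (sites_in_conserved - expected) / sd if sd else 0.0
    return fold, z


if __name__ == "__main__":
    # Made-up numbers: 100 predicted sites in 150 kb, 30 of them inside
    # 15 kb of conserved sequence.
    fold, z = site_enrichment(30, 100, 15_000, 150_000)
    print(f"fold={fold:.2f} z={z:.2f}")
```

Because many matrices are tested at once, any threshold on such a score would need a multiple-testing correction before a factor is called over-represented.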
Conclusions
• Order & orient up to 87% of draft sequence using genes, BAC ends, conserved regions.
• Discovery of 3 novel genes.
• TF binding site analysis alone is not informative (see TESS graphs!)
  – Third organism (chicken?)
  – Higher-order patterns (i.e., co-occurrence)
Future work
• Methods for identifying conserved regions
  – local versus global alignments
  – favor short/high ident. or long/low ident.?
• Extend TESS TF site analysis using draft human sequence.
• Automate annotation/analysis procedure; provide an on-line resource for BAC annotation.
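The local versus global question can be made concrete with a small scoring sketch. It computes a Needleman-Wunsch (global) or Smith-Waterman (local) score with a linear gap penalty. The scoring values and the name `align_score` are illustrative assumptions, not the parameters of any tool named in the slides.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score of a and b under a linear gap penalty.

    local=False: global (Needleman-Wunsch), both sequences aligned end to end.
    local=True:  local (Smith-Waterman), best-scoring pair of substrings.
    """
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)   # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    # Made-up sequences sharing one conserved block with unrelated flanks.
    mouse = "TTTTACGTACGTGGGG"
    human = "CCACGTACGTAA"
    print("local :", align_score(mouse, human, local=True))
    print("global:", align_score(mouse, human))
```

In this toy case the local score picks out the shared block, while the global score is dragged down by the unrelated flanks, which is the trade-off behind the short/high-identity versus long/low-identity question.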
bioWidget “AnnotView”
Annotation of Tec-Txk draft sequence (65i8): Starting material: 23 unordered pieces