ENCODE Data Available through The UCSC Genome Browser
Osvaldo Graña, CNIO Bioinformatics Unit. Materials prepared by Mary Mangan, Ph.D., and Warren C. Lathe, Ph.D. www.openhelix.com Version 3 1
ENCODE DCC at UCSC: Introduction; ENCODE Data Types; Find and Use ENCODE Data; ENCODE Downloads; Additional ENCODE Topics; Exercises. ENCODE at UCSC: http://encodeproject.org Copyright OpenHelix. No use or reproduction without express written consent. 2
ENCODE: www.genome.gov/10005107 ENCyclopedia of DNA Elements, NHGRI. Consortium of international researchers. UCSC is the Data Coordination Center. 3
ENCODE Background. Pilot phase, or phase I: www.genome.gov/26525202 Selected regions of the genome: 1%, 30 Mb. 4
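The 1% and 30 Mb figures on the pilot-phase slide are the same quantity expressed two ways. The short sketch below checks the arithmetic; the genome size of roughly 3 billion base pairs is a rounded assumption that does not appear on the slide, and `pilot_region_size` is a hypothetical helper name.

```python
# Rounded assumption (not from the slides): the haploid human genome
# is taken as about 3 billion base pairs.
GENOME_SIZE_BP = 3_000_000_000


def pilot_region_size(genome_bp: int, percent: int) -> int:
    """Return the number of base pairs covered by `percent` of the genome."""
    return genome_bp * percent // 100


pilot_bp = pilot_region_size(GENOME_SIZE_BP, 1)
pilot_mb = pilot_bp / 1_000_000  # megabases (Mb), not megabytes

print(f"1% of the genome is about {pilot_mb:.0f} Mb")
```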
ENCODE Discoveries. “Marker” papers: Nature and an issue of Genome Research. Changes to our conceptual framework for the genome. 5
ENCODE Pilot Data and Beyond. ENCODE portal: http://genome.ucsc.edu/ENCODE/ Pilot ENCODE browser: genome.ucsc.edu/ENCODE/pilot.html 6
ENCODE Next Phase: Production Phase. UCSC is the DCC for human and mouse data. The portal is available: genome.ucsc.edu/ENCODE/ New aspects of the Production Phase projects. 7
ENCODE Production Phase Focus: chromatin; transcriptome/genes; promoters/regulatory sites; DNase sites. ENCODE is now genome-wide. Specific cell types and new technologies being applied. Project focus topics selected, then supplemented. 8
ENCODE Data is Flowing! Data being submitted to UCSC DCC by data providers. “Wranglers” ensure metadata is present. Quality checks occur, then data is released for use. 9
ENCODE Data Types. ENCODE tracks identified with icon. Mapping data; Genes; Expression; Regulation; Variation. 11
Mapability Data. Tracks from Broad, Duke, Rosetta, and UMass; k-mer sizes shown: 36-mers, 20-35-mers, and 15-mers (UMass). Scale runs from not unique to more unique. Mapability for unique regions: the higher the peak, the more unique. Cleavage intensity for structural profiling. 12
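To make "the higher the peak, the more unique" concrete, here is a toy sketch of a k-mer uniqueness score. It is an illustration only, not the method used by any of the data providers named on the slide: it scores each start position as 1 divided by the number of times that k-mer occurs in the sequence, looks at the forward strand only, and uses a made-up sequence. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def kmer_uniqueness(seq: str, k: int) -> List[float]:
    """Score each k-mer start position as 1 / (occurrences of that k-mer).

    A score of 1.0 means the k-mer occurs once (unique); 0.5 means it
    occurs twice, and so on. Forward strand only, for simplicity.
    """
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[kmer] for kmer in kmers]


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


TOY_SEQ = "ACGTACGTTT"  # invented sequence, not real genomic data

scores_k4 = kmer_uniqueness(TOY_SEQ, 4)
scores_k2 = kmer_uniqueness(TOY_SEQ, 2)

print("k=4:", scores_k4)
print("mean uniqueness, k=2:", round(mean(scores_k2), 3))
print("mean uniqueness, k=4:", round(mean(scores_k4), 3))
```

In this toy case the shorter k-mers repeat more often, so their mean score is lower, which matches the slide's scale from 15-mers (less unique) to 36-mers (more unique).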
GENCODE: http://www.sanger.ac.uk/PostGenomics/encode/ GENCODE for assessment of protein-coding genes. 13
Expression Data: RNA Localization. http://en.wikipedia.org/wiki/MRNA RNA molecules, location in various cell types and fractions. 14
Expression Data: Presence of RNA or Exons. http://en.wikipedia.org/wiki/MRNA RNAs of various types. Special look for long mRNAs and exons. 15
Regulation Data (image from NIH). Structure: modifications, open vs. closed chromatin. 16
Regulation Data II. TATA bound to DNA. Transcription factor binding sites (TFBS). RNA-binding proteins. 17
Variation Data. Copy Number Variation (CNV) data. 18
Super-Tracks. New strategies to integrate and display data. Super-Tracks provide multiple data types to view. See Track Description page for details, options, and keys. 19
General Organization. Configuration choices, options, filters (click). Tracks identified with icon. Also available in Table Browser. Description pages have options, settings, filters, display keys, metadata, and references. Display key, techniques, references, contacts. 21
ENCODE Data Policy: genome.ucsc.edu/ENCODE/terms.html Non-scoop window. “Ft. Lauderdale agreement”. 22
Awareness of Embargo Dates: track description pages, Table Browser interface, download pages. 23
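Checking an embargo date before using a data set can be scripted once the date has been copied from one of the pages listed above. The sketch below is illustrative only: the `key=value; key=value` metadata layout, the `dateUnrestricted` key name, and the sample record are all assumptions made for the example, so confirm the actual field names and dates on the download page or track description page.

```python
from datetime import date
from typing import Dict, Optional


def parse_metadata(text: str) -> Dict[str, str]:
    """Parse 'key=value; key=value' text into a dict (assumed layout)."""
    pairs: Dict[str, str] = {}
    for item in text.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:  # skip fragments that contain no '='
            pairs[key.strip()] = value.strip()
    return pairs


def is_unrestricted(metadata: Dict[str, str], today: date,
                    key: str = "dateUnrestricted") -> Optional[bool]:
    """Return True if the restriction date has passed, False if it has not,
    and None if the metadata carries no such date."""
    raw = metadata.get(key)
    if raw is None:
        return None
    return date.fromisoformat(raw) <= today


# Invented sample record, for illustration only.
SAMPLE = "lab=ExampleLab; cell=K562; dateUnrestricted=2011-02-15"

meta = parse_metadata(SAMPLE)
print(is_unrestricted(meta, today=date(2011, 1, 1)))
print(is_unrestricted(meta, today=date(2011, 3, 1)))
```

Returning `None` when the date is missing keeps "no date found" distinct from "still restricted", so a missing field is not silently read as permission.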
ChIP-seq Data for TFBS. TP53; cell types + antibodies; stronger signals. Yale TFBS sample display near TP53 in “dense” visibility mode. ChIP-seq graphic adapted from: wikipedia.org/wiki/ChIP-on-chip 24
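Once peak calls for a ChIP-seq track have been downloaded, the stronger signals can be picked out with a few lines of code. The sketch below assumes the peaks are in the ten-column, tab-separated narrowPeak layout (chrom, start, end, name, score, strand, signalValue, pValue, qValue, peak offset); whether a given track ships in that format should be checked on its description page. The sample lines and coordinates are invented, and the function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float
    q_value: float
    summit_offset: int


def parse_narrowpeak(lines: Iterable[str]) -> List[Peak]:
    """Parse tab-separated narrowPeak lines, skipping blank and header lines."""
    peaks: List[Peak] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        f = line.split("\t")
        if len(f) != 10:
            raise ValueError(f"expected 10 fields, got {len(f)}: {line!r}")
        peaks.append(Peak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                          float(f[6]), float(f[7]), float(f[8]), int(f[9])))
    return peaks


def strong_peaks(peaks: Iterable[Peak], min_signal: float) -> List[Peak]:
    """Keep peaks at or above min_signal, strongest first."""
    kept = [p for p in peaks if p.signal_value >= min_signal]
    return sorted(kept, key=lambda p: p.signal_value, reverse=True)


# Invented sample lines, for illustration only.
SAMPLE = [
    "chr17\t7590500\t7590900\tpeak1\t850\t.\t42.5\t-1\t-1\t200",
    "chr17\t7600000\t7600300\tpeak2\t300\t.\t6.1\t-1\t-1\t120",
    "chr17\t7610000\t7610450\tpeak3\t600\t.\t18.0\t-1\t-1\t210",
]

for p in strong_peaks(parse_narrowpeak(SAMPLE), min_signal=10.0):
    print(p.chrom, p.start, p.end, p.name, p.signal_value)
```

The threshold of 10.0 is arbitrary; a sensible cutoff depends on the experiment and the peak caller.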
Description Page, Upper: display mode; peak; configure; download. See description page for more display options. Choose tracks and view styles. 25
Description Page, Lower. Display conventions explained. Methods and references. 26
Downloads and Release Log. Human; Mouse. Release log for a handy list of available data. Download is offered; FTP recommended. 28
New Features. encode-announce mailing list: https://lists.soe.ucsc.edu/mailman/listinfo/encode-announce UCSC Genome Browser discussion list: http://genome.ucsc.edu/contacts.html Mouse data. Proteomics data. Publications. Questions? UCSC mailing list, or ENCODE at NHGRI. 30
modENCODE: modencode.org New: February 2011 issue; Science, 24 December 2010, Vol. 330. A separate modENCODE: www.genome.gov/26524507 C. elegans and D. melanogaster. modENCODE DCC: www.modencode.org 31
Notice: The materials and slides offered are for non-commercial use only. Reproduction, distribution and/or use for commercial purposes is strictly prohibited. Copyright 2010, OpenHelix, LLC http://www.openhelix.com/ENCODE 33