Inferring Transcriptional Regulation Using Transcriptomics
Carsten O. Daub
September 1st, 2014
StratCan Summer School 2014, Vår Gård, Saltsjöbaden
Overview – Levels of Regulation
• Genome
  – SNP
  – DNA modifications (e.g. methylation)
  – Structural alterations (e.g. genomic rearrangements)
• Transcriptome
  – Promoter: transcription factors, enhancers/insulators
  – RNA splicing
  – miRNA
  – Posttranscriptional modifications (e.g. RNA editing)
  – 3D structure of the genome
• Protein
  – Translation
  – Posttranslational modifications
• Metabolites
Central Dogma of Molecular Biology
DNA → (Transcription) → RNA → (Translation) → Protein
Francis Crick, 1958
Non-coding RNA
What is the transcriptome?
• The ensemble of all expressed RNA
• Protein-coding genes
• Non-protein-coding genes
How is the Transcriptome regulated?
• Via Promoter
  – Transcription factors
  – Enhancers
  – Insulators
• RNA splicing
• miRNA
• Posttranscriptional modifications (e.g. RNA editing)
• 3D structure of the genome
Regulation via the Promoter
Transcription
• The principle: DNA is copied into RNA by the RNA polymerase (Pol)
[Figure: Pol on the DNA template, 5’ to 3’]
• Transcription initiation is more complex in eukaryotes than in prokaryotes
• In eukaryotes, several different factors are necessary for transcription from an RNA polymerase II promoter
http://en.wikipedia.org/wiki/Gene
• Initiation
  – Promoter clearance
  – Pol II stalling
• Elongation
• Termination
Figures from http://en.wikipedia.org/wiki/Transcription_(genetics)
Transcription Model
[Figure: Pol transcribes the DNA (5’ to 3’) into pre-mRNA (precursor); capping, splicing and polyadenylation then yield the mRNA with its poly(A) tail (AAAAAA)]
Transcription Factor (TF) Binding
• TFs bind to specific sites in the DNA
• Sets of TFs can function as cis-regulatory modules (CRM)
Nature Reviews Genetics 5, 276–287 (April 2004)
Specific TF Binding
• Transcription factors bind to specific DNA sequences
• Databases of TF binding sequence motifs
  – JASPAR, TRANSFAC
[Figure: IRF8 binding motif; IRF8 bound to DNA]
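One common way to use a binding motif from such a database is to turn it into a position weight matrix and score every window of a sequence. The sketch below illustrates that idea only: the count matrix is a made-up 4 bp toy motif (not the real IRF8 motif), and the pseudocount and uniform background are assumptions.

```python
import math

# Hypothetical 4-position count matrix (NOT the real IRF8 motif);
# each entry gives how often A, C, G, T was observed at that motif position.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm):
    """Score every window of the sequence; return (offset, score) pairs."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


pwm = log_odds_matrix(COUNTS)
hits = scan("TTAGAACC", pwm)
best_offset, best_score = max(hits, key=lambda hit: hit[1])
print(best_offset, round(best_score, 2))
```

Windows that score well above zero resemble the motif more than the background; a real analysis would also scan the reverse strand and choose a score threshold.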
Promoter Region
[Figure: promoter regions relative to the transcription start site (TSS)]
Distal promoter | [-10 k, -250]
Proximal promoter | [-250, -34]
Core promoter | [-34, -1]
Promoter Region
• Core promoter – the minimal portion of the promoter required to properly initiate transcription
  – Transcription Start Site (TSS)
  – Approximately -34
  – A binding site for RNA polymerase
  – General transcription factor binding sites
• Proximal promoter – the proximal sequence upstream of the gene that tends to contain primary regulatory elements
  – Approximately -250
  – Specific transcription factor binding sites
• Distal promoter – the distal sequence upstream of the gene that may contain additional regulatory elements, often with a weaker influence than the proximal promoter
  – Anything further upstream (but not an enhancer or other regulatory region whose influence is positional/orientation independent)
  – Specific transcription factor binding sites
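The approximate boundaries above can be turned into a small lookup. This is a minimal sketch using the slide's cut-offs (-34, -250, -10 k); assigning the shared boundary positions to the region closer to the TSS, and the function names, are assumptions made here for illustration.

```python
def offset_from_tss(position, tss, strand="+"):
    """Signed offset of a genomic coordinate relative to the TSS.

    Negative values are upstream of the TSS, taking the strand into account.
    """
    return position - tss if strand == "+" else tss - position


def promoter_region(offset):
    """Classify an offset relative to the TSS into a promoter region.

    Boundary positions (-34, -250) are assigned to the region closer
    to the TSS; this tie-breaking rule is an assumption.
    """
    if -34 <= offset <= -1:
        return "core"
    if -250 <= offset < -34:
        return "proximal"
    if -10_000 <= offset < -250:
        return "distal"
    return "outside"  # the TSS itself, downstream, or more than 10 kb upstream


# Example: a predicted binding site at coordinate 900 for a plus-strand TSS at 1000
site_offset = offset_from_tss(900, 1000, "+")
print(site_offset, promoter_region(site_offset))
```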
Transcription in eukaryotes
• In eukaryotes, several different factors are necessary for transcription from an RNA polymerase II promoter.
Name | Location | RNA transcribed
RNA Polymerase I | nucleolus | ribosomal RNA (rRNA)
RNA Polymerase II | nucleus | messenger RNA (mRNA) and most small nuclear RNAs (snRNAs)
RNA Polymerase III | nucleus | transfer RNA (tRNA) and other small RNAs
Identifying the TF regulators
• How much is a TF binding site used?
  – Observed expression of all genes
  – Predicted site count
• Motif Activity Response Analysis (MARA)
FANTOM4 – A Systems Approach
Monoblast-like THP-1 cells were stimulated by PMA to differentiate them into monocyte-like cells. Samples were collected at 10 time points during differentiation.
[Figure: time course from monoblast-like to monocyte-like cells after PMA stimulation, sampled at 0, 1, 2, 4, 6, 12, 24, 48, 72 and 96 hours. Labels: Replicates (RIKEN 1, RIKEN 3, RIKEN 5, RIKEN 6), Microarray check, Not good, Deep CAGE, TF qRT-PCR, Illumina (47 K probes), miRNA microarray, 10 time points]
Cap Analysis of Gene Expression (CAGE)
[Figure, based on [1]: CAGE library preparation, sequencing, and CAGE data digital processing into tag clusters (TC)]
[1] Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genetics 38, 626–35 (2006)
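The digital processing step that turns sequenced CAGE tags into tag clusters can be pictured as grouping nearby 5’ ends. The sketch below is only an illustration of that idea: the single-linkage rule and the 20 bp `max_gap` threshold are assumptions, not the published procedure, and the tag positions are invented.

```python
from collections import Counter


def tag_clusters(tag_starts, max_gap=20):
    """Group 5' end positions (one chromosome, one strand) into tag clusters.

    A position joins the current cluster when it lies at most max_gap bases
    after the cluster's last position. Returns (start, end, tag_count) tuples.
    """
    counts = Counter(tag_starts)
    clusters = []
    for position in sorted(counts):
        if clusters and position - clusters[-1][1] <= max_gap:
            start, _, total = clusters[-1]
            clusters[-1] = (start, position, total + counts[position])
        else:
            clusters.append((position, position, counts[position]))
    return clusters


# Hypothetical mapped 5' ends of CAGE tags
tags = [100, 100, 103, 110, 500, 505, 505, 505, 2000]
clusters = tag_clusters(tags)
for start, end, count in clusters:
    print(start, end, count)
```

The tag count per cluster then serves as the expression measure of the corresponding promoter.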
CAGE identifies the active set of promoters
[Figure: alternative promoter usage for PTPN6, with a HeLa promoter and a THP-1 promoter]
Slide modified from Alistair Forrest.
Kanamori-Katayama, Itoh, Kawaji et al. 2011 Genome Research. “Unamplified cap analysis of gene expression on a single-molecule sequencer”
Transcriptional Regulation
A. TFBS prediction: predicted binding sites of TF A in each CAGE promoter (Gene B, Gene C, Gene D)
B. Co-expression: number of CAGE tags in each promoter over the time course (0 h to 96 h); ×: average expression
[Figure: expression profiles of TF A and the promoters of genes B, C and D]
TFBS prediction and co-expression are combined into a total score:
TF A, promoter B | High
TF A, promoter C | High
TF A, promoter D | Low
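The co-expression part of this scoring can be sketched as a correlation between the tag counts of TF A and of each candidate target promoter across the time course. Everything concrete here is an assumption for illustration: the tag counts are invented, Pearson correlation stands in for whatever co-expression measure was used, and the rule "site predicted and r ≥ 0.8" is a hypothetical way of combining the two lines of evidence.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if one is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical CAGE tag counts at 0, 1, 2, 4, 6, 12, 24, 48, 72 and 96 h
tf_a = [5, 6, 9, 14, 20, 31, 40, 44, 47, 50]
promoters = {
    "B": {"has_site": True, "tags": [10, 12, 19, 27, 41, 60, 82, 90, 93, 99]},
    "C": {"has_site": True, "tags": [3, 3, 5, 8, 10, 16, 22, 23, 24, 27]},
    "D": {"has_site": True, "tags": [50, 48, 51, 49, 52, 50, 47, 51, 49, 50]},
}

labels = {}
for name, promoter in promoters.items():
    r = pearson(tf_a, promoter["tags"])
    # Hypothetical combination rule: predicted site AND strong co-expression
    labels[name] = "High" if promoter["has_site"] and r >= 0.8 else "Low"
    print(f"TF A -> promoter {name}: r = {r:.2f}, score = {labels[name]}")
```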
Motif Activity Response Analysis – MARA
[Figure: promoters in the genome with their predicted motifs (Promoter 1: m1, m2, m3; Promoter 2: …; Promoter X: m1, m4, m5) linked to the expression e_ps]
Reaction efficiency
• Number of possible binding sites
• Degree of conservation of the motif
• Chromatin status
Effective concentration
THP-1 cells are a monoblastic leukemia cell line which upon PMA treatment can differentiate into an adherent monocyte-like cell (CD14+, CSF1R+)
Suzuki, Forrest, van Nimwegen et al. Nature Genetics 2009, 41:5
Motif Activity Response Analysis
• How much is a binding site used?
  – Observed expression of all promoters over time
  – Predicted site count
Suzuki, Forrest, van Nimwegen et al. Nature Genetics 2009, 41:5
Nat. Genet. 2009 May; 41(5): 553–62.
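The question "how much is a binding site used" can be read as a linear model: the expression of promoter p in sample s is approximated by the predicted site counts N_pm weighted by a per-sample motif activity A_ms. The sketch below fits such activities by ordinary least squares in plain Python. It is a simplified illustration under that assumption, not the published implementation, which is more elaborate; the site counts and expression values are toy data.

```python
def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def matmul(a, b):
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def solve(a, b):
    """Solve a x = b (square a, matrix b) by Gauss-Jordan elimination
    with partial pivoting. Raises ZeroDivisionError if a is singular."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [[aug[i][j] / aug[i][i] for j in range(n, len(aug[i]))]
            for i in range(n)]


def motif_activities(site_counts, expression):
    """Least-squares activities A (motifs x samples) for expression ~ N A,
    obtained from the normal equations (N^T N) A = N^T E."""
    n_t = transpose(site_counts)
    return solve(matmul(n_t, site_counts), matmul(n_t, expression))


# Toy data: 4 promoters x 2 motifs (predicted site counts)
site_counts = [[1, 0], [0, 1], [2, 1], [1, 1]]
# Toy expression: 4 promoters x 3 time points, built from known activities
true_activities = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]
expression = matmul(site_counts, true_activities)

activities = motif_activities(site_counts, expression)
for motif, row in zip(("m1", "m2"), activities):
    print(motif, [round(value, 3) for value in row])
```

A rising activity over the time course suggests that the sites of that motif increasingly drive expression; with many motifs and noisy data, some form of regularisation is usually needed in addition to this plain fit.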
Enhancers
• Enhancers are sequence motifs
• They bind factors (proteins) that participate in the transcription initiation complex
• Enhancers can be many kb away from the TSS
• Insulators act in a similar way, but repress expression
• Is an enhancer a gene?
Enhancer RNA
• ENCODE reported (Nature, 489(7414), 101–108)
  – Enhancers identified by co-occurrence of H3K27ac and H3K4me1 ChIP-seq data, centred on p300 binding sites, in HeLa cells
• Enhancers make non-coding RNA. Nature 465, 173–174 (2010).
• Widespread transcription at neuronal activity-regulated enhancers. (Kim, T. K. et al. Nature 465, 182–187 (2010).)
Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., et al. (2012). Landscape of transcription in human cells. Nature, 489(7414), 101–108. doi:10.1038/nature11233
RNA splicing in cancer
http://en.wikipedia.org/wiki/RNA_splicing
Example: Melanoma Transcriptome
• Discovery of aberrations that contribute to carcinogenesis
• Characterize the spectrum of cancer-associated mRNA alterations through integration of transcriptomic and structural genomic data
  – 11 novel melanoma gene fusions produced by underlying genomic rearrangements
  – 12 novel readthrough transcripts
Genome Res. 2010 Apr; 20(4): 413–27
Melanoma Transcriptome: Gene Fusion Connecting genes located on different chromosomes!
Melanoma Transcriptome: Gene Read-through
• Gene fusions are ‘private’
  – No gene fusion was observed in more than one melanoma patient (10 samples total)
• Gene fusions in melanoma might not be the cancer-causing events but consequences
Chromosome Structure
Ref: http://www.sequentiabiotech.com/
http://en.wikipedia.org/wiki/Chromosome_conformation_capture
Mouse ES cells
Dixon, J. R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., et al. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. doi:10.1038/nature11082
• Remote ER-α chromatin binding sites are anchored at gene promoters through long-range chromatin interactions
• This suggests that ER-α functions by extensive chromatin looping to bring genes together for coordinated transcriptional regulation
Nature. 2009 Nov 5; 462(7269): 58–64
Polymerase II Stalling
[Figure: Pol II binding states: stalled, active, no binding]
• Pol II ChIP-chip in Drosophila embryos
• Stalled genes are highly enriched in developmental control genes
Nature Genetics 39, 1512–1516 (2007)
Transcriptional Regulation in Cancer
From observations to mechanisms • Observations => Biomarkers