© FIMM - Institute for Molecular Medicine Finland, www.fimm.fi

RNA-seq analysis
Dr. Tech. Daniel Nicorici
FIMM – Institute for Molecular Medicine Finland
CSC - June 2, 2010

Outline
› RNA sequencing overview
› Finding fusion genes
› Alternative splicing
› Conclusions

RNA-seq › high-throughput sequencing technology for sequencing RNAs (actually c. DNAs which contain the RNAs' content) › invaluable tool for study of diseases like cancer › allows researchers to obtain information like: § § § gene/transcript/exon expressions alternative splicing gene fusions post-transcriptional mutations single nucleotide variations … www. fimm. fi 4

RNA-seq – cont’d
› It greatly reduces the variability between experiments compared with other established measurement technologies such as microarrays, exon arrays, etc.
› Because the reads are short (the cDNA is fragmented before sequencing), the bioinformatics analysis is challenging, e.g.:
§ de novo assembly
§ aligning the sequenced reads
§ computing gene/transcript/exon expression levels

Reads in RNA-seq
[Fig. 1 – Adaptor and reads in RNA-seq (labels: 5’ end, 3’ end, adaptor; the sequenced part forms the short reads)]
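When a cDNA fragment is shorter than the read length, the read runs past the fragment into the adaptor of Fig. 1, and those bases have to be removed before alignment. Below is a minimal sketch of 3’ adaptor trimming, assuming exact matching (no sequencing errors). The adaptor sequence in the usage lines is a hypothetical example, not necessarily the one used in this work, and real trimmers also tolerate mismatches.

```python
def trim_adaptor(read: str, adaptor: str, min_overlap: int = 3) -> str:
    """Remove 3' adaptor sequence from a read using exact matching only."""
    # Case 1: the whole adaptor occurs in the read; cut from its first base.
    pos = read.find(adaptor)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends inside the adaptor; find the longest read suffix
    # that equals an adaptor prefix of at least `min_overlap` bases.
    shortest = max(min_overlap, 1)
    for overlap in range(min(len(adaptor) - 1, len(read)), shortest - 1, -1):
        if read.endswith(adaptor[:overlap]):
            return read[:-overlap]
    return read


# Usage with a hypothetical adaptor sequence.
ADAPTOR = "AGATCGGAAGAGC"
print(trim_adaptor("ACGTACGTAGATCGGAAGAGCTTT", ADAPTOR))  # ACGTACGT
print(trim_adaptor("ACGTACGTAGATC", ADAPTOR))             # ACGTACGT
print(trim_adaptor("ACGTACGT", ADAPTOR))                  # ACGTACGT
```

Searching for the full adaptor first and then for a partial adaptor at the 3’ end covers both reads that run through the whole adaptor and reads that end inside it; `min_overlap` avoids trimming on chance matches of one or two bases.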

Reads in RNA-seq – cont’d
[Fig. 2 – Reads’ mappings at chromosome and transcript level (Exons A, B, C and D shown on the chromosome and on the transcript)]
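Fig. 2 contrasts the two coordinate systems: a read that is contiguous on the transcript may be split across exons on the chromosome. The following is a minimal sketch of projecting a transcript-level alignment onto the genome, assuming a plus-strand transcript, 0-based half-open coordinates and hypothetical exon positions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def transcript_to_genome(exons: List[Interval], tx_start: int, length: int) -> List[Interval]:
    """Project the transcript interval [tx_start, tx_start + length) of a
    plus-strand transcript onto genomic blocks, splitting at exon boundaries."""
    read_start, read_end = tx_start, tx_start + length
    blocks: List[Interval] = []
    offset = 0  # transcript coordinate of the current exon's first base
    for g_start, g_end in sorted(exons):
        exon_len = g_end - g_start
        # Overlap between the read and this exon, in transcript coordinates.
        lo = max(read_start, offset)
        hi = min(read_end, offset + exon_len)
        if lo < hi:
            blocks.append((g_start + (lo - offset), g_start + (hi - offset)))
        offset += exon_len
    if sum(end - start for start, end in blocks) != length:
        raise ValueError("alignment extends beyond the transcript")
    return blocks


exons = [(100, 150), (300, 350), (500, 560), (800, 900)]  # hypothetical exons A-D
print(transcript_to_genome(exons, 0, 10))   # [(100, 110)]: inside exon A
print(transcript_to_genome(exons, 40, 20))  # [(140, 150), (300, 310)]: spans A and B
```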

Why RNA-seq?
[Fig. 3 – RNA-seq vs array technologies: RNA-seq (~1000€/sample) gives exon/transcript expressions, gene expressions, alternative splicing events, SNPs, fusion genes, etc.; the arrays shown are an exon array for alternative splicing (~700€/sample), a cDNA array (~600€/sample), a SNP array (~400€/sample) and an exon array for fusion genes (~700€/sample)]

General steps of RNA-seq analysis
1. Filtering of short reads
2. Aligning the reads against a reference
3. Computational analysis of the reads’ alignments:
   1. compute the gene/transcript/exon expressions
   2. find new/known alternative splicing events
   3. find new/known fusion genes
   4. find new/known SNPs
4. Visualization
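Step 1 is typically a per-read filter; the fusion-gene and alternative-splicing procedures later in the talk filter on quality and on “B’s”. Below is a minimal sketch, assuming FASTQ input with Phred+64 quality strings in which a 'B' marks unreliable bases (as in Illumina pipelines of that period). All thresholds are hypothetical, not the values used in this work.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ file with four lines per record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip()
        yield header, sequence, quality


def passes_filter(sequence: str, quality: str, offset: int = 64,
                  max_b_fraction: float = 0.1, max_n: int = 2,
                  min_mean_quality: float = 20.0) -> bool:
    """Keep a read with few 'B'-flagged bases, few 'N' calls and a high
    enough mean Phred quality (qualities decoded as ord(char) - offset)."""
    if not sequence or len(sequence) != len(quality):
        return False
    if quality.count("B") / len(quality) > max_b_fraction:
        return False
    if sequence.upper().count("N") > max_n:
        return False
    mean_quality = sum(ord(c) - offset for c in quality) / len(quality)
    return mean_quality >= min_mean_quality


fastq = io.StringIO(
    "@read1\nACGTACGTAC\n+\nhhhhhhhhhh\n"
    "@read2\nACGTACGTAC\n+\nhhhhhBBBBB\n"
    "@read3\nNNNACGTACG\n+\nhhhhhhhhhh\n"
)
kept = [h for h, s, q in read_fastq(fastq) if passes_filter(s, q)]
print(kept)  # ['@read1']
```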

Examples of RNA-seq visualization
[Fig. 4 – Visualization using MapView]

Examples of RNA-seq visualization – cont’d
[Fig. 5 – Coverage plot]

Examples of RNA-seq visualization – cont’d
[Fig. 6 – Coverage plots visualization: coverage plot for gene ERBB2 in breast cancer (normalized coverage 0.00 to 130.71) and in normal breast (normalized coverage 0.00 to 4.41)]
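Coverage plots such as Figs. 5 and 6 are built from per-base read depth. The slides do not state how the coverage was normalized. The sketch below assumes depth scaled per million mapped reads (one common choice) and takes reads as hypothetical (start, length) alignments on the plotted region.

```python
from typing import List, Tuple


def coverage(region_length: int, reads: List[Tuple[int, int]]) -> List[int]:
    """Per-base read depth over [0, region_length) from (start, length)
    alignments, using a difference array: +1 where a read starts, -1 after it ends."""
    diff = [0] * (region_length + 1)
    for start, length in reads:
        lo, hi = max(start, 0), min(start + length, region_length)
        if lo < hi:  # ignore reads that fall outside the region
            diff[lo] += 1
            diff[hi] -= 1
    depth, running = [], 0
    for i in range(region_length):
        running += diff[i]
        depth.append(running)
    return depth


def normalized_coverage(depth: List[int], total_mapped_reads: int) -> List[float]:
    """Depth per million mapped reads, so samples of different sequencing depth
    (e.g. breast cancer vs normal breast in Fig. 6) can share one scale."""
    scale = 1e6 / total_mapped_reads
    return [d * scale for d in depth]


depth = coverage(10, [(0, 4), (2, 4), (8, 5)])
print(depth)                                   # [1, 1, 2, 2, 1, 1, 0, 0, 1, 1]
print(normalized_coverage([1, 2], 2_000_000))  # [0.5, 1.0]
```

The difference array makes the cost linear in the region length plus the number of reads, instead of touching every base of every read.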

Examples of RNA-seq visualization – cont’d
[Fig. 7 – Visualization of reads’ mappings using the UCSC browser]

Examples of RNA-seq visualization – cont’d
[Fig. 8 – Visualization of coverages using the UCSC browser]

Examples of RNA-seq visualization – cont’d
[Fig. 9 – “Gel-like” visualization of coverages using the UCSC browser]

Examples of RNA-seq visualization – cont’d
[Fig. 10 – Histogram of distances between the paired-end reads]
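Fig. 10 summarizes how far apart the two mates of each pair map. Here is a minimal sketch that assumes both mates map on the same chromosome and have the same length `read_length`. It measures distance as the outer span of the pair (from the leftmost mate’s start to the end of the other mate). The slide does not say which definition it uses, and the positions below are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pair_distances(mate_starts: Iterable[Tuple[int, int]], read_length: int) -> List[int]:
    """Outer span of each pair: distance between the mates' start positions
    plus the length of the rightmost mate."""
    return [abs(b - a) + read_length for a, b in mate_starts]


def histogram(values: Iterable[int], bin_width: int) -> Dict[int, int]:
    """Count values per bin; keys are the lower edges of the bins."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


distances = pair_distances([(100, 250), (1000, 1180), (50, 20)], read_length=50)
print(distances)                  # [200, 230, 80]
print(histogram(distances, 100))  # {0: 1, 200: 2}
```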

Examples of RNA-seq visualization – cont’d
[Fig. 11 – Visualization of candidate fusion genes]

Finding fusion genes
Steps:
1. Reads filtering (quality, B’s, etc.)
2. Align all reads on the genome
3. Align against the transcriptome all the reads which
   § map uniquely on the genome, or
   § do not map on the genome
4. Find the candidate fusion genes by looking for paired-end reads which map simultaneously on two different transcripts from two different genes
5. Find the fusion junction (e.g. by generating exon-exon combinations and finding on which one the reads align)
6. Filtering of candidate fusion genes
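Step 4 can be sketched as counting, for every pair of genes, the read pairs whose mates land on transcripts of two different genes. This minimal sketch assumes steps 1-3 are done and that each mate has already been assigned a single transcript. The transcript and gene names are hypothetical. Ordering the gene names alphabetically discards which gene is the 5’ partner; step 5 would have to recover that.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def candidate_fusions(pair_hits: Iterable[Tuple[str, str, str]],
                      transcript_gene: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Count supporting read pairs for each gene pair.

    `pair_hits` holds (pair_id, transcript_of_mate1, transcript_of_mate2).
    """
    support: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for pair_id, tx1, tx2 in pair_hits:
        g1, g2 = transcript_gene[tx1], transcript_gene[tx2]
        if g1 != g2:  # mates on transcripts of two different genes
            key = (min(g1, g2), max(g1, g2))
            support[key].add(pair_id)  # a set, so each pair counts once
    return {genes: len(ids) for genes, ids in support.items()}


transcript_gene = {"BCR_t1": "BCR", "ABL1_t1": "ABL1", "ACTB_t1": "ACTB", "ACTB_t2": "ACTB"}
hits = [("p1", "BCR_t1", "ABL1_t1"), ("p2", "ABL1_t1", "BCR_t1"), ("p3", "ACTB_t1", "ACTB_t2")]
print(candidate_fusions(hits, transcript_gene))  # {('ABL1', 'BCR'): 2}
```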

Reads in RNA-seq – cont’d
[Fig. 2 (repeated) – Reads’ mappings at chromosome and transcript level (Exons A, B, C and D shown on the chromosome and on the transcript)]
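Step 5 of the fusion-gene procedure (generating exon-exon combinations and checking on which one the reads align) uses the same idea as Fig. 2: a read crossing a junction is contiguous only on the joined sequence. Below is a minimal sketch that assumes exact substring matching on plus-strand sequences, with hypothetical toy exon sequences and names. A real implementation would align the reads with an aligner that tolerates mismatches.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def junction_sequences(exons_5p: Dict[str, str], exons_3p: Dict[str, str],
                       flank: int) -> Dict[str, Tuple[str, int]]:
    """Join the last `flank` bases of every 5' partner exon to the first `flank`
    bases of every 3' partner exon; also return the junction position."""
    junctions = {}
    for (name_a, seq_a), (name_b, seq_b) in product(exons_5p.items(), exons_3p.items()):
        left, right = seq_a[-flank:], seq_b[:flank]
        junctions[f"{name_a}|{name_b}"] = (left + right, len(left))
    return junctions


def count_junction_reads(junctions: Dict[str, Tuple[str, int]], reads: List[str]) -> Counter:
    """Count reads that occur exactly in a junction sequence and cross its
    junction position (reads lying inside one exon are ignored)."""
    counts: Counter = Counter()
    for name, (seq, point) in junctions.items():
        for read in reads:
            start = seq.find(read)
            while start != -1:
                if start < point < start + len(read):
                    counts[name] += 1
                    break
                start = seq.find(read, start + 1)
    return counts


exons_5p = {"BCR_e13": "AAAACCCCGGGG", "BCR_e14": "TTTTGGGGAAAA"}  # hypothetical
exons_3p = {"ABL1_e2": "CATCATCATCAT"}                             # hypothetical
junctions = junction_sequences(exons_5p, exons_3p, flank=6)
print(count_junction_reads(junctions, ["GGGGCATC", "GGGGCA", "CATCAT"]))
# Counter({'BCR_e13|ABL1_e2': 2})
```

`flank` should be at least the read length minus one, so that every read crossing a junction fits inside the junction sequence.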

Finding fusion genes – cont’d
› RNA-seq data for the leukemia K562 cell line [1]
§ Philadelphia chromosome with the known BCR-ABL fusion gene
§ ~15 000 candidate fusion genes found
§ ~85% of the candidate fusion genes are known paralogs or have no protein product!!!
§ 15 candidate fusion genes remain after additional filtering, with the known BCR-ABL as the top candidate
› Filtering of candidate fusion genes is essential in order to reduce their large number (from tens of thousands to tens)!!!
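The K562 example filters out candidates that are known paralogs or have no protein product. Here is a minimal sketch of such a filter with an added minimum-support threshold. It assumes the paralog pairs and the protein-coding gene set come from an annotation database. The threshold and gene lists below are illustrative only; the additional criteria that led to the 15 candidates are not reproduced.

```python
from typing import Dict, FrozenSet, Set, Tuple

GenePair = Tuple[str, str]


def filter_candidates(candidates: Dict[GenePair, int],
                      paralog_pairs: Set[FrozenSet[str]],
                      protein_coding: Set[str],
                      min_pairs: int = 2) -> Dict[GenePair, int]:
    """Drop known paralog pairs, pairs involving a gene without a protein
    product, and weakly supported pairs; rank survivors by support."""
    kept = {
        pair: n for pair, n in candidates.items()
        if frozenset(pair) not in paralog_pairs
        and all(gene in protein_coding for gene in pair)
        and n >= min_pairs
    }
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


candidates = {("ABL1", "BCR"): 12, ("HBA1", "HBA2"): 30,
              ("ACTB", "NONCODING1"): 5, ("GENE1", "GENE2"): 1}
paralogs = {frozenset({"HBA1", "HBA2"})}
coding = {"ABL1", "BCR", "HBA1", "HBA2", "ACTB", "GENE1", "GENE2"}
print(filter_candidates(candidates, paralogs, coding))  # {('ABL1', 'BCR'): 12}
```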

Alternative splicing
› process by which a gene’s exons are pieced together in multiple ways, forming mRNA during RNA splicing
› there is a large body of evidence linking alternative splicing to diseases such as cancer
› Shannon’s entropy from information theory has been used previously for finding imbalances in transcript expression [2, 3]
› the Jensen-Shannon divergence has been used for quantifying relative changes in the expression of transcripts [4]
› MDL (minimum description length) [5] can also be used for measuring relative changes in the expression of transcripts
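Both information-theoretic measures above work on the relative expression (proportions) of a gene’s transcripts. The following is a minimal sketch of Shannon entropy and the Jensen-Shannon divergence in bits, applied to the counts of Table 1. How [2, 3, 4] apply, threshold or test these quantities is not reproduced here.

```python
import math
from typing import List, Sequence


def proportions(counts: Sequence[float]) -> List[float]:
    """Turn transcript expression values into relative proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon_entropy(p: Sequence[float]) -> float:
    """Entropy in bits; terms with zero probability contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """JS divergence in bits: entropy of the 50/50 mixture minus the mean
    entropy of the two distributions (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2


print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(jensen_shannon([1.0, 0.0], [0.0, 1.0]))      # 1.0
# Transcript proportions of gene "G" in samples "A" and "B" (Table 1).
a = proportions([3, 5, 4, 4, 2])
b = proportions([1, 7, 2, 6, 3])
print(jensen_shannon(a, b))
```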

Alternative splicing – cont’d
Steps:
1. Reads filtering (quality, B’s, etc.)
2. Align all reads on the genome
3. Align against the transcriptome all the reads which
   § map uniquely on the genome, or
   § do not map on the genome
4. Compute (normalized) transcript expressions (e.g. RPKM)
5. Repeat steps 1-4 for all samples
6. Find relative changes/imbalances between the transcript expressions of the same gene across the group of samples
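Step 4 mentions RPKM (reads per kilobase of transcript per million mapped reads), defined as RPKM = 10^9 * C / (N * L). Here C is the number of reads assigned to the transcript, N the total number of mapped reads in the sample, and L the transcript length in bases. This minimal sketch assumes reads have already been assigned to transcripts; distributing reads shared between transcripts is the hard part and is not shown. Transcript names and counts are hypothetical.

```python
from typing import Dict


def rpkm(reads: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped_reads)


def transcript_rpkms(counts: Dict[str, int], lengths: Dict[str, int],
                     total_mapped_reads: int) -> Dict[str, float]:
    """Step 4 for one sample: RPKM of every transcript of interest."""
    return {t: rpkm(c, lengths[t], total_mapped_reads) for t, c in counts.items()}


# Step 5 repeats this per sample (hypothetical counts for two transcripts of one gene).
lengths = {"G_t1": 2000, "G_t2": 1000}
sample_a = transcript_rpkms({"G_t1": 500, "G_t2": 100}, lengths, 10_000_000)
print(sample_a)  # {'G_t1': 25.0, 'G_t2': 10.0}
```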

Alternative splicing – cont’d
Table 1 – Example of a gene with its five transcripts

Transcript of gene “G”   Sample “A”   Sample “B”
Transcript 1             3            1
Transcript 2             5            7
Transcript 3             4            2
Transcript 4             4            6
Transcript 5             2            3

Alternative splicing – cont’d
› Computing the imbalance of transcript expression for the example in Table 1 using the MDL method [5]
› MDL’s advantage: the criterion for deciding between balanced and imbalanced is built in
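The slide’s own MDL computation does not survive in this text. As an illustration only, the sketch below compares two hypotheses for the counts in Table 1 using a crude two-part code: the negative log2-likelihood of the data plus about (1/2)·log2 n bits per free parameter (a BIC-style approximation chosen here as an assumption). Under "balanced", both samples share one transcript distribution; under "imbalanced", each sample has its own. This is not necessarily the MDL code used in this work.

```python
import math
from typing import List, Sequence, Tuple


def mle(counts: Sequence[int]) -> List[float]:
    """Maximum-likelihood transcript proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def code_length_bits(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Bits to encode which transcript each read came from, given `probs`."""
    return -sum(c * math.log2(p) for c, p in zip(counts, probs) if c > 0)


def description_lengths(a: Sequence[int], b: Sequence[int]) -> Tuple[float, float]:
    """(balanced, imbalanced) two-part code lengths in bits for two samples."""
    k = len(a) - 1                   # free parameters of one transcript distribution
    n = sum(a) + sum(b)              # total number of reads encoded
    param_cost = 0.5 * math.log2(n)  # approximate bits per parameter
    pooled = mle([x + y for x, y in zip(a, b)])
    balanced = code_length_bits(a, pooled) + code_length_bits(b, pooled) + k * param_cost
    imbalanced = (code_length_bits(a, mle(a)) + code_length_bits(b, mle(b))
                  + 2 * k * param_cost)
    return balanced, imbalanced


def is_imbalanced(a: Sequence[int], b: Sequence[int]) -> bool:
    """Built-in decision: choose the hypothesis with the shorter description."""
    balanced, imbalanced = description_lengths(a, b)
    return imbalanced < balanced


print(is_imbalanced([3, 5, 4, 4, 2], [1, 7, 2, 6, 3]))  # False (Table 1)
print(is_imbalanced([90, 5, 5], [5, 5, 90]))            # True
```

No significance threshold is needed: the imbalanced model wins only if its better fit saves more bits than its extra parameters cost. With this particular approximation, the 37 reads of Table 1 do not pay for the extra parameters; a different MDL code could decide differently.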

Alternative splicing – cont’d
› only the transcripts which are validated (e.g. there are reads which map only on the given transcript [3]) are used for finding the imbalances
› for example, ~3500 alternatively spliced genes are found when comparing a prostate cancer control sample with a treated sample
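The validation rule above (keep a transcript only if some reads map on it alone) can be sketched as follows. The sketch assumes each read of a gene has already been aligned to the set of that gene’s transcripts it is compatible with. Read and transcript IDs are hypothetical, and `min_unique_reads` is an illustrative parameter.

```python
from typing import Dict, Set


def validated_transcripts(read_hits: Dict[str, Set[str]],
                          min_unique_reads: int = 1) -> Set[str]:
    """Transcripts supported by at least `min_unique_reads` reads that align to
    that transcript and to no other transcript of the gene."""
    unique_counts: Dict[str, int] = {}
    for transcripts in read_hits.values():
        if len(transcripts) == 1:  # read is specific to one transcript
            (transcript,) = transcripts
            unique_counts[transcript] = unique_counts.get(transcript, 0) + 1
    return {t for t, n in unique_counts.items() if n >= min_unique_reads}


hits = {"r1": {"T1"}, "r2": {"T1", "T2"}, "r3": {"T3"}, "r4": {"T3"}, "r5": {"T2", "T3"}}
print(sorted(validated_transcripts(hits)))                      # ['T1', 'T3']
print(sorted(validated_transcripts(hits, min_unique_reads=2)))  # ['T3']
```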

Conclusions
› RNA-seq data analysis:
§ is computationally intensive (when compared to, for example, microarray analysis)
§ needs very good filtering criteria, based on biology and mathematics, in order to improve the quality of the results (i.e. a low number of false positives)
§ has no single established way of being done
§ relies on many tools, e.g. aligners, samtools, etc., which are still works in progress
› Visualization:
§ has multiple facets, i.e. read coverage, fusion genes, etc.
§ depends on the user profile:
   1. biologist/medical doctor
   2. bioinformatician

References
1. Berger M. et al., Integrative analysis of the melanoma transcriptome, Genome Research, Feb. 2010.
2. Ritchie W. et al., Entropy measures quantify global splicing disorders in cancer, PLoS Computational Biology, vol. 4, March 2008.
3. Gan Q. et al., Dynamic regulation of alternative splicing and chromatin structure in Drosophila gonads revealed by RNA-seq, Cell Research, May 2010.
4. Trapnell C. et al., Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation, Nature Biotechnology, vol. 28, May 2010.
5. Grünwald P., Minimum description length principle tutorial, in Advances in Minimum Description Length: Theory and Applications, P. Grünwald, I. J. Myung, and M. Pitt, Eds., pp. 22-79, MIT Press, Cambridge, 2005.

Acknowledgements
› Olli Kallioniemi
› Janna Saarela
› Henrik Edgren
› Astrid Murumägi
› Sara Kangaspeska
› Pekka Ellonen

Thank you!