Analysis of NGS raw data with Galaxy

- Slides: 66
Analysis of NGS raw data with Galaxy: cleaning, data control, alignment, polymorphism
Alexis Dereeper, François Sabot
CIBA courses – Brasil 2011
Aim of the tutorial classes:
1 - Galaxy vs command line
2 - Understand FASTQ files
3 - Cleaning of Illumina data (FASTQ)
4 - Perform an assembly
5 - Perform a mapping of Illumina reads on a reference sequence
6 - Cleaning of a multiple SAM file
1 - Galaxy
CIRAD server: http://gohelle.cirad.fr/galaxy/
Main server: http://main.g2.bx.psu.edu/
[Figure: TOOLS, DATA]
WEB APPLICATION
- “Click'n'Play” system
- Transparent for the user
MODULAR
- Numerous default bricks (already integrated)
- Customizable bricks can be added
MULTIPLE
- Based on a web server (Apache...)
- Runs on a single machine or on a cluster
BUT
- Simple support
- Much less powerful than the terminal
- Only for routine analysis
- Only for limited data
CONNECTION FOR THE TUTORIAL CLASSES: http://gohelle.cirad.fr/galaxy/
Connecting...
Add data...
Import data from Galaxy libraries
FASTQ file → TEXT file
STRUCTURE:
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffcffeffffdffffafcfffffdfefeddf^eececfffdfcbffb
@HWUSI-EAS454_0006:1:37:16314:3410#CTTGTA
AGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGTGGTGGCCG
+
`bTbbccccceeeeecccYeedded`ceec]dddde^a`deeeec`dddcbaadadYd`]]Jc_^bc^^
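The four-line structure above can be read with a few lines of Python. This is a minimal sketch, assuming every record occupies exactly four lines (name, sequence, `+` separator, quality); the names `FastqRecord` and `read_fastq` and the two short records are made up for illustration.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # line 1, without the leading '@'
    sequence: str  # line 2
    quality: str   # line 4, one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or an empty line)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Made-up two-record example, not taken from the course data.
example = io.StringIO(
    "@read_1\nACGTAC\n+\nffffdc\n"
    "@read_2\nTTGACA\n+\ncfeb`^\n"
)
records = list(read_fastq(example))
for record in records:
    print(record.name, record.sequence, record.quality)
```

In a well-formed file the sequence and quality lines of a record have the same length, which is a quick sanity check on any FASTQ input.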
SEQUENCE NAME (line 1): @HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
IUPAC SEQUENCE (line 2): CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
QUALITY IN ASCII (line 4): cfffcfeffdeefefffcffffcffeffffdffffafcfffffdfefeddf^eececfffdfcbffb
Quality encoding example: f → Quality = 38 (ASCII code 102 – 64)
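The conversion on this slide (ASCII code minus 64) can be applied to a whole quality string. A minimal sketch follows; the function name `decode_quality` is made up, and the offset of 64 is the one used on the slide. Sanger-style FASTQ files and Illumina pipelines from version 1.8 use an offset of 33 instead, so the offset should be checked for each dataset.

```python
from typing import List

ILLUMINA_OFFSET = 64  # offset used on the slide: 'f' is ASCII 102, and 102 - 64 = 38


def decode_quality(quality: str, offset: int = ILLUMINA_OFFSET) -> List[int]:
    """Convert each quality character into its integer quality score."""
    return [ord(character) - offset for character in quality]


# First ten characters of the quality line shown above.
scores = decode_quality("cfffcfeffd")
print(scores)  # [35, 38, 38, 38, 35, 38, 37, 38, 38, 36]
```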
WHAT IS QUALITY? The quality value Q is an integer mapping of p, the probability that the corresponding base call is incorrect.
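The slide does not spell out the mapping between Q and p. Assuming the standard Phred definition, Q = -10 · log10(p), the conversion in both directions can be sketched as follows (the function names are illustrative):

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with quality q is wrong (Phred definition)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> int:
    """Integer quality score for an error probability p, rounded to the nearest integer."""
    return round(-10 * math.log10(p))


for q in (10, 20, 30, 38):
    print(q, phred_to_error_probability(q))

print(error_probability_to_phred(0.001))  # 30
```

Under this definition, Q = 20 corresponds to one expected error in 100 base calls and Q = 30 to one in 1000.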
FASTQC: quality control
http://www.bioinformatics.bbsrc.ac.uk/projects/download.html#fastqc
Why do we need to clean? To remove remaining adapters/primers and low-quality sequences → Cutadapt
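To make the two cleaning operations concrete, here is a deliberately simplified sketch. It is not Cutadapt's algorithm: Cutadapt tolerates mismatches in the adapter and uses a more refined quality-trimming rule, whereas this version looks for an exact adapter match and trims the 3' end base by base. The adapter sequence, the thresholds and all function names are assumptions chosen for illustration.

```python
def trim_adapter(sequence, quality, adapter):
    """Cut the read at the first exact occurrence of the adapter."""
    position = sequence.find(adapter)
    if position == -1:
        return sequence, quality
    return sequence[:position], quality[:position]


def trim_low_quality_tail(sequence, quality, min_quality=20, offset=64):
    """Drop bases from the 3' end while their score is below min_quality."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


def clean_read(sequence, quality, adapter, min_quality=20, min_length=5, offset=64):
    """Return the cleaned (sequence, quality) pair, or None if the read becomes too short."""
    sequence, quality = trim_adapter(sequence, quality, adapter)
    sequence, quality = trim_low_quality_tail(sequence, quality, min_quality, offset)
    if len(sequence) < min_length:
        return None
    return sequence, quality


# Made-up read: 10 genomic bases followed by a hypothetical adapter.
adapter = "AGATCGGAAG"
read = "ACGTACGTTT" + adapter
qual = "ffffffffPP" + "ffffffffff"  # 'P' decodes to 16 with offset 64
print(clean_read(read, qual, adapter))  # ('ACGTACGT', 'ffffffff')
```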
[Screenshot; values shown: 20, 70, 7]
Your data are now ready to be analyzed...
Concatenate files
Untested Tools → NGS → Assembly → Assemble with MIRA
BLAST of putative contigs against reference
Separate sequences by original individuals RC1, RC2...
Use of regular expressions via Galaxy:
→ RC[13456789] & remove reads => keep RC2
→ RC[123456789]_ & remove reads => keep RC10
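The two patterns above can be checked outside Galaxy with Python's `re` module. This sketch assumes ten individuals (RC1 to RC10) and read names in which the individual tag is followed by an underscore; the read names themselves are made up.

```python
import re

# Hypothetical read names: individual tag, underscore, read number.
read_names = ["RC1_0001", "RC2_0002", "RC3_0003", "RC10_0004", "RC2_0005"]


def remove_matching(names, pattern):
    """Keep only the names that do NOT match the pattern (Galaxy 'remove reads')."""
    regex = re.compile(pattern)
    return [name for name in names if not regex.search(name)]


# RC[13456789] matches RC1, RC3 ... RC9 and also RC10 (it starts with RC1),
# so only RC2 reads survive.
keep_rc2 = remove_matching(read_names, r"RC[13456789]")
print(keep_rc2)   # ['RC2_0002', 'RC2_0005']

# RC[123456789]_ needs a single digit followed by '_', so RC10_ is not matched.
keep_rc10 = remove_matching(read_names, r"RC[123456789]_")
print(keep_rc10)  # ['RC10_0004']
```

The trailing underscore in the second pattern is what distinguishes RC1 from RC10; without it, RC10 reads would be removed together with RC1 reads.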
Mapping: map paired-end reads on a reference
1 - Compute positions for each read
2 - Associate positions of each member of the pair
3 - Select the most probable position respecting the conditions
4 - Write a SAM output file
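Steps 2 and 3 above can be illustrated with a toy model. This is not how BWA pairs reads internally: a real mapper also checks read orientation and estimates the insert-size distribution from the data. The sketch assumes each mate already has a list of candidate (position, score) hits, and that the "conditions" reduce to a fixed distance window; the function name, the window and the numbers are all made up.

```python
from itertools import product


def best_pair(hits_1, hits_2, min_insert=100, max_insert=500):
    """Pick the highest-scoring combination of hits whose distance fits the window.

    hits_1, hits_2: lists of (position, score) candidates for mate 1 and mate 2,
    where a higher score means a better alignment.
    Returns (position_1, position_2, total_score), or None if no pair fits.
    """
    best = None
    for (pos_1, score_1), (pos_2, score_2) in product(hits_1, hits_2):
        distance = abs(pos_2 - pos_1)
        if not (min_insert <= distance <= max_insert):
            continue  # the two mates are too close or too far apart
        total = score_1 + score_2
        if best is None or total > best[2]:
            best = (pos_1, pos_2, total)
    return best


# Made-up candidates: mate 1 has two possible positions, mate 2 has two as well.
hits_mate_1 = [(1000, 60), (52000, 58)]
hits_mate_2 = [(1250, 55), (90000, 59)]
print(best_pair(hits_mate_1, hits_mate_2))  # (1000, 1250, 115)
```

Here the hit at 90000 has the better individual score for mate 2, but only the combination (1000, 1250) respects the distance condition, so it is the one retained.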
- Reference: from history (Shared data/Formation/PreProcess/reference.fasta)
- Library: paired-end
- FASTQ files: from your history
- BWA settings to use: Commonly Used
- Unselect “Suppress the header in the output SAM file”
- Click
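For comparison with the command line (the first aim of these classes), the Galaxy form above corresponds roughly to three BWA commands: `bwa index`, `bwa aln` for each FASTQ file, and `bwa sampe` to pair the results. The sketch below only builds and prints the command lines and does not run them. The file names are placeholders, and the exact commands issued by the Galaxy wrapper on this server are an assumption.

```python
import shlex


def bwa_paired_end_commands(reference, fastq_1, fastq_2, output_sam):
    """Build the command lines for a paired-end BWA run (index, aln x2, sampe)."""
    sai_1, sai_2 = fastq_1 + ".sai", fastq_2 + ".sai"
    return [
        ["bwa", "index", reference],                        # index the reference
        ["bwa", "aln", "-f", sai_1, reference, fastq_1],    # positions for mate 1
        ["bwa", "aln", "-f", sai_2, reference, fastq_2],    # positions for mate 2
        ["bwa", "sampe", "-f", output_sam, reference,       # pair mates, write SAM
         sai_1, sai_2, fastq_1, fastq_2],
    ]


# Placeholder file names, not the course files.
commands = bwa_paired_end_commands("reference.fasta", "r1.fastq", "r2.fastq", "out.sam")
for command in commands:
    print(" ".join(shlex.quote(argument) for argument in command))
```

Each list could be passed to `subprocess.run(command, check=True)` on a machine where BWA is installed. For FASTQ files with Illumina 1.3+ (offset 64) qualities, `bwa aln` provides the `-I` flag; whether it is needed depends on how the files were prepared.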
SAM output file (Sequence Alignment/Map)
Sort the SAM file by coordinate
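What sorting by coordinate means can be shown on a few SAM lines. In a SAM alignment line, column 3 is the reference name (RNAME) and column 4 the leftmost mapping position (POS). The sketch below keeps header lines (starting with `@`) first and orders alignments by these two columns. It is an illustration, not the Galaxy tool: reference names are ordered alphabetically here, whereas dedicated tools follow the order of the references in the header, and the example lines are made up.

```python
def sort_sam_by_coordinate(lines):
    """Return header lines first, then alignments sorted by (RNAME, POS)."""
    header = [line for line in lines if line.startswith("@")]
    alignments = [line for line in lines if not line.startswith("@")]

    def coordinate(line):
        fields = line.split("\t")
        return (fields[2], int(fields[3]))  # column 3 = RNAME, column 4 = POS

    return header + sorted(alignments, key=coordinate)


# Made-up SAM content with the 11 mandatory columns.
sam_lines = [
    "@SQ\tSN:ref\tLN:5000",
    "read_b\t0\tref\t350\t60\t6M\t*\t0\t0\tACGTAC\tffffff",
    "read_a\t0\tref\t120\t60\t6M\t*\t0\t0\tTTGACA\tffffff",
    "read_c\t16\tref\t200\t60\t6M\t*\t0\t0\tGGCATA\tffffff",
]
sorted_lines = sort_sam_by_coordinate(sam_lines)
for line in sorted_lines:
    print(line.split("\t")[0])
```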
Creation of a workflow for automated analysis
Workflow: how to avoid running the whole process by hand