Analysis of NGS raw data with Galaxy: cleaning, data control, alignment, polymorphism
Alexis Dereeper, François Sabot
CIBA courses – Brasil 2011

Aim of the tutorial classes:
1 - Galaxy vs command line
2 - Understand FASTQ files
3 - Cleaning of Illumina data (FASTQ)
4 - Perform an assembly
5 - Perform a mapping of Illumina reads on a reference sequence
6 - Cleaning of a multiple SAM file

1 - Galaxy
CIRAD server: http://gohelle.cirad.fr/galaxy/
Main server: http://main.g2.bx.psu.edu/

TOOLS DATA

WEB APPLICATION
- “Click'n'Play” system
- transparent for user
MODULAR
- Numerous default bricks (already integrated)
- Adding of customizable bricks
MULTIPLE
- Based on a web server (Apache...)
- On a single machine, or a cluster...
BUT
- Simple support
- Much less powerful than terminal
- Only for routine analysis
- Only for limited data

CONNECTION FOR THE TUTORIAL CLASSES: http://gohelle.cirad.fr/galaxy/

Connecting...

Add data...

Import data from Galaxy libraries

FASTQ file → TEXT file
STRUCTURE:
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffcffeffffdffffafcfffffdfefeddf^eececfffdfcbffb
@HWUSI-EAS454_0006:1:37:16314:3410#CTTGTA
AGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGTGGTGGCCG
+
`bTbbccccceeeeecccYeedded`ceec]dddde^a`deeeec`dddcbaadadYd`]]Jc_^bc^^
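
The four-line structure above can be read with a few lines of Python. This is only a minimal sketch: the `FastqRecord` type, the `read_fastq` function and the two short reads are hypothetical names and data made up for illustration, and the sketch assumes that each record occupies exactly four lines (no wrapped sequences).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str       # line 1, without the leading "@"
    sequence: str   # line 2
    quality: str    # line 4


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per group of four lines."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up reads, used only to exercise the parser.
example = io.StringIO(
    "@read_1\nACGT\n+\nffff\n"
    "@read_2\nTTGA\n+\nfeed\n"
)
records = list(read_fastq(example))
print(records[0].name, records[0].sequence, records[0].quality)
```

Reading in groups of four lines, rather than searching for "@", matters because "@" is also a legal quality character.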

SEQUENCE NAME (line 1, starting with @)
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffcffeffffdffffafcfffffdfefeddf^eececfffdfcbffb

IUPAC SEQUENCE (line 2)
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffcffeffffdffffafcfffffdfefeddf^eececfffdfcbffb

QUALITY IN ASCII (line 4)
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffcffeffffdffffafcfffffdfefeddf^eececfffdfcbffb

@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffcffeffffdffffafcfffffdfefeddf^eececfffdfcbffb
f → Quality = 38 (ASCII code of f is 102; 102 – 64 = 38)
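
The same subtraction can be applied to every character of a quality line. A minimal sketch, assuming the offset of 64 used on the slide (many other FASTQ files use an offset of 33 instead, so the offset is left as a parameter); `decode_qualities` is a hypothetical helper name.

```python
from typing import List


def decode_qualities(quality_line: str, offset: int = 64) -> List[int]:
    """Convert each ASCII character to its integer quality value."""
    return [ord(character) - offset for character in quality_line]


# First nine characters of the quality line shown on the slide.
scores = decode_qualities("cfffcfeff")
print(scores)  # [35, 38, 38, 38, 35, 38, 37, 38, 38]
```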

WHAT IS QUALITY?
The quality value Q is an integer mapping of p (i.e., the probability that the corresponding base call is incorrect).
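
The slide text does not give the mapping itself. Assuming the standard Phred definition, Q = -10 × log10(p), the conversion in both directions could be sketched as follows (the early Solexa pipeline used a slightly different formula, which is not covered here). The function names are hypothetical.

```python
import math


def error_probability(q: float) -> float:
    """Probability that the base call is wrong, for a Phred quality q."""
    return 10 ** (-q / 10)


def phred_quality(p: float) -> int:
    """Phred quality, rounded to an integer, for an error probability p."""
    return round(-10 * math.log10(p))


print(error_probability(20))   # about 1 error in 100 base calls
print(error_probability(38))   # about 1.6 errors in 10,000 base calls
print(phred_quality(0.001))
```

Under this definition a quality of 38 corresponds to an error probability of roughly 0.00016.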

FASTQC: quality control
http://www.bioinformatics.bbsrc.ac.uk/projects/download.html#fastqc

Why do we need to clean?
To remove remaining adapters/primers and low-quality sequences → CutAdapt
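
To make the idea of cleaning concrete, here is a deliberately simplified sketch. It is not CutAdapt's algorithm: it only looks for an exact copy of the adapter and then drops low-quality bases from the 3' end, whereas CutAdapt also tolerates mismatches and partial adapter matches. The function name, the adapter sequence, the threshold of 20 and the offset of 64 are assumptions chosen for illustration.

```python
from typing import Tuple


def trim_read(sequence: str, quality: str, adapter: str,
              min_quality: int = 20, offset: int = 64) -> Tuple[str, str]:
    """Remove an exact adapter match, then trim low-quality 3' bases."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    # 1. Cut the read at the first exact occurrence of the adapter, if any.
    cut = sequence.find(adapter)
    if cut != -1:
        sequence, quality = sequence[:cut], quality[:cut]
    # 2. Drop trailing bases whose quality is below the threshold.
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


# Made-up read: 10 bases of insert followed by a 10-base adapter.
seq, qual = trim_read("ACGTACGTTTAGATCGGAAG", "ffffffffBBffffffffff",
                      adapter="AGATCGGAAG")
print(seq, qual)  # ACGTACGT ffffffff
```

In the example the adapter is removed first, and the two bases with quality "B" (66 – 64 = 2) are then trimmed from the end.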

20 70 7

Your data are now ready to be analyzed...

Concatenate files

Untested Tools → NGS → Assembly → Assemble with MIRA

BLAST of putative contigs against reference

Separate sequences by original individuals RC1, RC2...
Use of regular expression via Galaxy:
→ RC[13456789] & remove reads => keep RC2
→ RC[123456789]_ & remove reads => keep RC10
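
Why these two expressions isolate RC2 and RC10 can be checked outside Galaxy with Python's `re` module. This is a sketch under one assumption: that read names contain the individual tag followed by an underscore (for example `RC2_read`). The names and the `remove_matching` helper are hypothetical.

```python
import re
from typing import List


def remove_matching(names: List[str], pattern: str) -> List[str]:
    """Keep only the names in which the pattern is NOT found."""
    regex = re.compile(pattern)
    return [name for name in names if not regex.search(name)]


# Made-up read names for individuals RC1 to RC10.
names = [f"RC{i}_read" for i in range(1, 11)]

# RC[13456789] also matches the "RC1" at the start of "RC10", so RC10 is removed too.
keep_rc2 = remove_matching(names, r"RC[13456789]")

# RC[123456789]_ needs an underscore right after one digit, so "RC10_" is not matched.
keep_rc10 = remove_matching(names, r"RC[123456789]_")

print(keep_rc2)   # ['RC2_read']
print(keep_rc10)  # ['RC10_read']
```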

Mapping: map paired-end reads on a reference
1 - Compute positions for each read
2 - Associate positions of each member of the pair
3 - Select the most probable position respecting the conditions
4 - Write a SAM output file
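
Steps 2 and 3 can be illustrated with a toy example. This is not how BWA works internally; it is a sketch that assumes the "conditions" are opposite strands and a distance between the two mates within a given range, and that the best pair is the one with the highest combined score. The `Hit` type, the `best_pair` function, the range of 100 to 500 and all values are hypothetical.

```python
from itertools import product
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    position: int   # candidate start position on the reference
    strand: str     # "+" or "-"
    score: int      # higher means a better alignment


def best_pair(hits_1: List[Hit], hits_2: List[Hit],
              min_distance: int = 100,
              max_distance: int = 500) -> Optional[Tuple[Hit, Hit]]:
    """Pick the highest-scoring combination that satisfies the conditions."""
    best = None
    best_score = None
    for hit_1, hit_2 in product(hits_1, hits_2):
        if hit_1.strand == hit_2.strand:
            continue  # mates are expected on opposite strands
        distance = abs(hit_2.position - hit_1.position)
        if not min_distance <= distance <= max_distance:
            continue  # mates are too close or too far apart
        total = hit_1.score + hit_2.score
        if best_score is None or total > best_score:
            best, best_score = (hit_1, hit_2), total
    return best


# Step 1 (made-up result): two candidate positions for each mate.
hits_read_1 = [Hit(1000, "+", 60), Hit(5000, "+", 55)]
hits_read_2 = [Hit(1300, "-", 58), Hit(9000, "-", 60)]
print(best_pair(hits_read_1, hits_read_2))
```

Only the combination at positions 1000 and 1300 satisfies both conditions, so it is selected even though another candidate for the second mate has a higher individual score.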

Reference From History: Shared data/Formation/PreProcess/reference.fasta
Library: Paired-end
FASTQ files: From your history
BWA setting to use: Commonly Used
Unselect “Suppress the header in the output SAM file”
Click

SAM output file (Sequence Alignment/Map)
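
A SAM file is tab-separated text: header lines start with "@", and each alignment line has eleven mandatory fields. A minimal parsing sketch follows; the `SamAlignment` type and the example line are hypothetical, and optional tags after the eleventh field are ignored.

```python
from typing import NamedTuple


class SamAlignment(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # read quality string


def parse_alignment(line: str) -> SamAlignment:
    """Parse the eleven mandatory fields of one alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("an alignment line needs at least 11 fields")
    return SamAlignment(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )


# Made-up alignment line for a properly paired, mapped read.
aln = parse_alignment("read_1\t99\tref\t101\t60\t4M\t=\t301\t204\tACGT\tffff")
print(aln.rname, aln.pos, aln.cigar)
print("paired:", bool(aln.flag & 0x1), "unmapped:", bool(aln.flag & 0x4))
```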

Sort the SAM file by coordinate
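
What sorting by coordinate means can be sketched in a few lines: header lines stay on top, and alignment lines are ordered by reference name and then by position. This is an illustration only, with a hypothetical function name and made-up lines; dedicated tools order references as listed in the header rather than alphabetically and also handle unmapped reads, which this sketch does not.

```python
from typing import List, Tuple


def sort_sam_by_coordinate(lines: List[str]) -> List[str]:
    """Return header lines first, then alignments sorted by (reference, position)."""
    header = [line for line in lines if line.startswith("@")]
    alignments = [line for line in lines if line and not line.startswith("@")]

    def coordinate(line: str) -> Tuple[str, int]:
        fields = line.split("\t")
        return fields[2], int(fields[3])  # RNAME, POS

    return header + sorted(alignments, key=coordinate)


# Made-up SAM content with alignments out of order.
sam_lines = [
    "@SQ\tSN:ref\tLN:1000",
    "r2\t0\tref\t300\t60\t4M\t*\t0\t0\tACGT\tffff",
    "r1\t0\tref\t100\t60\t4M\t*\t0\t0\tACGT\tffff",
]
sorted_lines = sort_sam_by_coordinate(sam_lines)
print([line.split("\t")[0] for line in sorted_lines])  # ['@SQ', 'r1', 'r2']
```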

Creating a workflow for automated analysis

Workflow: how to avoid running the whole process by hand
