Analysis of NGS raw data with Galaxy: cleaning, data control, alignment, polymorphism
Alexis Dereeper, François Sabot
CIBA courses – Brasil 2011
Aim of the tutorial classes:
1 - Galaxy vs command line
2 - Understand FASTQ files
3 - Cleaning of Illumina data (FASTQ)
4 - Perform an assembly
5 - Perform a mapping of Illumina reads on a reference sequence
6 - Cleaning of a multiple SAM file
1 - Galaxy
CIRAD server: http://gohelle.cirad.fr/galaxy/
Main server: http://main.g2.bx.psu.edu/
TOOLS DATA
WEB APPLICATION
- "Click'n'Play" system
- Transparent for the user
MODULAR
- Numerous default bricks (already integrated)
- Customizable bricks can be added
MULTIPLE
- Based on a web server (Apache...)
- On a single machine, or a cluster...
BUT
- Simple support
- Much less powerful than the terminal
- Only for routine analysis
- Only for limited data
CONNECTION FOR THE TUTORIAL CLASSES: http://gohelle.cirad.fr/galaxy/
Connecting...
Add data...
Import data from Galaxy libraries
FASTQ file → TEXT file
STRUCTURE:
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffcffeffffdffffafcfffffdfefeddf^eececfffdfcbffb
@HWUSI-EAS454_0006:1:37:16314:3410#CTTGTA
AGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGTGGTGGCCG
+
`bTbbccccceeeeecccYeedded`ceec]dddde^a`deeeec`dddcbaadadYd`]]Jc_^bc^^
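As a rough illustration of the four-line structure, the following Python sketch reads FASTQ text record by record. The function name `read_fastq` and the tiny in-memory example are assumptions made for this sketch and are not part of the course material. Real files may need extra care, for instance with wrapped sequence lines.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or an empty line)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], sequence, quality


# Hypothetical two-record example (not taken from the course data).
example = io.StringIO(
    "@read_1\nACGT\n+\nffff\n"
    "@read_2\nGGCA\n+\ncfeb\n"
)
records = list(read_fastq(example))
for name, sequence, quality in records:
    print(name, sequence, quality)
```

The same function works on a real file by passing an open file handle instead of the `io.StringIO` object.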
SEQUENCE NAME:
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
IUPAC SEQUENCE:
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
Quality in ASCII:
cfffcfeffdeefefffcffffcffeffffdffffafcfffffdfefeddf^eececfffdfcbffb
f → Quality = 38 (102 – 64)
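The arithmetic "102 – 64" is the ASCII code of `f` minus an offset of 64. A minimal Python sketch of that decoding follows, assuming the Phred+64 encoding implied by the slide (Illumina 1.3+ pipelines). Sanger-style and more recent Illumina files use an offset of 33 instead, so the offset is kept as a parameter. The function name `decode_quality` is hypothetical.

```python
PHRED_OFFSET = 64  # assumption: Phred+64 encoding, as in the slide's "102 - 64"


def decode_quality(quality_string: str, offset: int = PHRED_OFFSET) -> list:
    """Convert an ASCII quality string into a list of integer quality values."""
    return [ord(char) - offset for char in quality_string]


# First ten quality characters of the record shown in the slides.
scores = decode_quality("cfffcfeffd")
print(scores)
```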
WHAT IS QUALITY?
The quality value Q is an integer mapping of p (i.e., the probability that the corresponding base call is incorrect).
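The formula itself is not reproduced in this transcript. Assuming the usual Phred definition, Q = -10 log10(p), the mapping in both directions can be sketched in Python as follows. The function names are hypothetical.

```python
import math


def error_probability(quality: int) -> float:
    """Phred definition: p = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


def phred_quality(probability: float) -> int:
    """Inverse mapping: Q = -10 * log10(p), rounded to the nearest integer."""
    return round(-10 * math.log10(probability))


for q in (10, 20, 30, 38):
    print(q, error_probability(q))
```

Under this definition, Q = 20 corresponds to one expected error in 100 base calls and Q = 30 to one in 1000.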
FASTQC: quality control
http://www.bioinformatics.bbsrc.ac.uk/projects/download.html#fastqc
Why do we need to clean?
To remove remaining adapters/primers and low-quality sequences → CutAdapt
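To make the idea concrete, here is a deliberately simplified Python sketch of 3' adapter removal followed by 3' quality trimming. It is not CutAdapt's actual algorithm, which also tolerates mismatches and partial adapter matches. The adapter sequence, the thresholds, and the function name `clean_read` are all hypothetical, and qualities are assumed to be Phred+64 encoded.

```python
from typing import Optional, Tuple


def clean_read(
    sequence: str,
    quality: str,
    adapter: str,
    min_quality: int = 20,   # hypothetical threshold
    min_length: int = 10,    # hypothetical threshold
    offset: int = 64,        # assumption: Phred+64 encoding
) -> Optional[Tuple[str, str]]:
    """Return the cleaned (sequence, quality), or None if the read is discarded."""
    # 1. Cut the read at the first exact occurrence of the adapter, if any.
    position = sequence.find(adapter)
    if position != -1:
        sequence, quality = sequence[:position], quality[:position]

    # 2. Remove trailing bases whose quality is below the threshold.
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1
    sequence, quality = sequence[:end], quality[:end]

    # 3. Discard reads that became too short.
    if len(sequence) < min_length:
        return None
    return sequence, quality


# Hypothetical read: 12 good bases, 2 poor bases, then an adapter.
read = "ACGTACGTACGTGG" + "AGATCGGAAG"
qual = "f" * 12 + "BB" + "f" * 10
cleaned = clean_read(read, qual, adapter="AGATCGGAAG")
print(cleaned)
```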
Your data are now ready to be analyzed...
Concatenate files
Untested Tools → NGS → Assembly → Assemble with MIRA
BLAST of putative contigs against reference
Separate sequences by original individuals RC1, RC2...
Use of regular expressions via Galaxy:
→ RC[13456789] & remove reads => keep RC2
→ RC[123456789]_ & remove reads => keep RC10
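The two regular expressions can be tried outside Galaxy with a few lines of Python. This is only a sketch: the read names and their `RCn_` prefix layout are assumptions, and `remove_matching` is a hypothetical helper that mimics "remove the reads that match".

```python
import re

# Hypothetical read names; the "RCn_" prefix layout is an assumption.
read_names = ["RC1_read7", "RC2_read3", "RC3_read9", "RC10_read1", "RC2_read8"]


def remove_matching(names, pattern):
    """Keep only the names that do NOT match the regular expression."""
    regex = re.compile(pattern)
    return [name for name in names if not regex.search(name)]


keep_rc2 = remove_matching(read_names, r"RC[13456789]")
keep_rc10 = remove_matching(read_names, r"RC[123456789]_")
print(keep_rc2)
print(keep_rc10)
```

The first pattern also removes RC10, because "RC1" matches whatever follows it. The trailing underscore in the second pattern is what lets RC10 survive while RC1 to RC9 are removed.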
Mapping: map paired-end reads on a reference
1 - Compute positions for each read
2 - Associate positions of each member of the pair
3 - Select the most probable position respecting the conditions
4 - Write a SAM output file
Reference: From History: Shared data/Formation/PreProcess/reference.fasta
Library: Paired-end
FASTQ files: From your history
BWA settings to use: Commonly Used
Unselect "Suppress the header in the output SAM file"
Click
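For comparison with the command line (the first aim of these classes), the Galaxy form corresponds roughly to the classic `bwa index`, `bwa aln`, `bwa sampe` sequence. The Python sketch below only builds the command lines and does not run them. Every file name is a placeholder, and whether the Galaxy wrapper runs exactly these commands is an assumption.

```python
import shlex


def bwa_paired_end_commands(reference, reads_1, reads_2, output_sam):
    """Return shell command strings for a paired-end BWA (aln/sampe) mapping."""
    sai_1, sai_2 = reads_1 + ".sai", reads_2 + ".sai"
    q = shlex.quote
    return [
        # Index the reference sequence.
        f"bwa index {q(reference)}",
        # Compute positions for each read of the pair separately.
        f"bwa aln {q(reference)} {q(reads_1)} > {q(sai_1)}",
        f"bwa aln {q(reference)} {q(reads_2)} > {q(sai_2)}",
        # Pair the two alignments and write the SAM file.
        f"bwa sampe {q(reference)} {q(sai_1)} {q(sai_2)} "
        f"{q(reads_1)} {q(reads_2)} > {q(output_sam)}",
    ]


# Placeholder file names, not the ones used in the course.
commands = bwa_paired_end_commands(
    "reference.fasta", "reads_1.fastq", "reads_2.fastq", "mapping.sam"
)
for command in commands:
    print(command)
```

For reads with Phred+64 qualities, `bwa aln` has an `-I` option that declares the Illumina 1.3+ encoding. It is left out here to keep the sketch minimal.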
SAM output file (Sequence Alignment/Map)
Sort the SAM file by coordinate
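What "sorting by coordinate" means can be sketched in plain Python: header lines (starting with `@`) stay on top, and alignment lines are ordered by reference name (column 3, RNAME) and then position (column 4, POS). This is a simplified illustration with hypothetical data, not the implementation used by Galaxy. Dedicated tools order references as listed in the header rather than alphabetically and handle unmapped reads specially.

```python
def sort_sam_by_coordinate(lines):
    """Sort SAM alignment lines by reference name, then by position.

    Header lines (starting with "@") are kept first, in their original order.
    """
    header = [line for line in lines if line.startswith("@")]
    alignments = [line for line in lines if line and not line.startswith("@")]

    def coordinate(line):
        fields = line.split("\t")
        # Mandatory SAM columns: field 3 is RNAME, field 4 is POS (1-based).
        return fields[2], int(fields[3])

    return header + sorted(alignments, key=coordinate)


# Hypothetical minimal SAM content (11 mandatory columns per alignment).
sam_lines = [
    "@SQ\tSN:ref\tLN:1000",
    "read_b\t0\tref\t250\t60\t4M\t*\t0\t0\tACGT\tffff",
    "read_a\t0\tref\t12\t60\t4M\t*\t0\t0\tGGCA\tffff",
]
sorted_lines = sort_sam_by_coordinate(sam_lines)
for line in sorted_lines:
    print(line.split("\t")[0])
```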
Creation of a workflow for automated analysis
Workflow: how to avoid running the whole process by hand