ReSequencing Video 1 Dowell Short Read Class

ReSequencing Video 1
Dowell Short Read Class
Phillip Richmond

Outline
  • The plan for video 1
  • Organize and copy data to your own working directory
  • Map reads back to a reference genome
  • Run a variant caller
  • Visualize variants

Plan
  • The first round of variant calling involves cutting the Sigma1278b yeast genome into reads, mapping them back to the S288c reference genome, and then finding all SNP differences between the two genomes
  • The data for video 1 is synthetic
  • The reads are already produced for you in FASTQ format, as 1x50 bp (single-end, 50 bp) reads

Getting started
  • Organization is KEY!!
  • The resequencing tutorial requires this organization:
  • Make a new directory in your home directory called ReSequencing
  • Inside ReSequencing, make these subdirectories:
      – GENOME
      – FASTQ
      – SAM
      – VCF
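
The layout above can be created in one step. This is a minimal sketch for a POSIX shell; the BASE variable is introduced here for illustration and is not part of the original slides.

```shell
# BASE is a hypothetical variable; it defaults to ReSequencing in your home directory.
BASE="${BASE:-$HOME/ReSequencing}"

# -p creates missing parent directories and does not fail if a directory already exists.
mkdir -p "$BASE/GENOME" "$BASE/FASTQ" "$BASE/SAM" "$BASE/VCF"

# List the result to confirm the four subdirectories are present.
ls "$BASE"
```

Running it a second time is harmless, so it is safe to rerun if you are unsure whether the directories already exist.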

Copying the data
  • Now we want to copy the data from:
      – /data/DowellShortRead/ReSequencing/
  • Copy the FASTQ file (Sigmav7_50mers.fastq) from the FASTQ directory to your own FASTQ directory
  • Copy SGDv4.fasta from GENOME/ to your own GENOME/ directory
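
The two copies might look like the sketch below. SRC is the shared directory named on the slide; DEST and the copy_if_present helper are assumptions added for illustration. The helper checks that the source file exists first, so a mistyped path gives a clear message instead of a half-finished setup.

```shell
SRC="${SRC:-/data/DowellShortRead/ReSequencing}"
DEST="${DEST:-$HOME/ReSequencing}"

# Hypothetical helper: copy file $1 into directory $2, or report the missing path.
copy_if_present() {
    if [ -f "$1" ]; then
        mkdir -p "$2" && cp "$1" "$2/"
    else
        echo "not found: $1" >&2
        return 1
    fi
}

# '|| true' lets the script continue so that both paths get checked.
copy_if_present "$SRC/FASTQ/Sigmav7_50mers.fastq" "$DEST/FASTQ" || true
copy_if_present "$SRC/GENOME/SGDv4.fasta" "$DEST/GENOME" || true
```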

Index the genome
  • Command: /opt/bowtie2-2.0.2/bowtie2-build <in.fasta> <out_index>
  • My command: /opt/bowtie2-2.0.2/bowtie2-build /data/DowellShortRead/Phil/GENOME/SGDv4.fasta /data/DowellShortRead/Phil/GENOME/SGDv4_bowtie2_Index

Map the reads back to the genome
  • These reads need “read groups” for GATK to accept them in the variant-calling step.
  • It’s best to add these when we map, using the bowtie2 options --rg-id and --rg:
  • Example: --rg-id Sigmav7vsS288c_bowtie2 --rg SM:Sigmav7vsS288c_bowtie2
  • Full command: /opt/bowtie2-2.0.2/bowtie2 --rg-id Sigmav7vsS288c_bowtie2 --rg SM:Sigmav7vsS288c_bowtie2 /data/DowellShortRead/Phil/GENOME/SGDv4_bowtie2_Index /data/DowellShortRead/Phil/FASTQ/Sigmav7_50mers.fastq -S /data/DowellShortRead/Phil/SAM/Sigmav7_vs_S288c_bowtie2.sam 2> /data/DowellShortRead/Phil/SAM/Sigmav7_vs_S288c_bowtie2.stderr
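
One quick way to confirm that the read group reached the output is to look for an @RG line in the SAM header. This is a minimal sketch; has_read_group is a hypothetical helper, and the SAM path is the one from the slide above, so adjust it to your own directory.

```shell
# Hypothetical helper: succeed if the file contains a read-group (@RG) header line.
has_read_group() {
    grep -q '^@RG' "$1"
}

SAM="${SAM:-/data/DowellShortRead/Phil/SAM/Sigmav7_vs_S288c_bowtie2.sam}"

if [ -f "$SAM" ] && has_read_group "$SAM"; then
    echo "read group found"
else
    echo "no @RG line found (or file missing): $SAM"
fi
```

If the check fails, the usual culprit is a mistyped option, for example a single dash in front of rg.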

Convert your file format using Samtools
  • samtools view -bS <in.sam> -o <out.bam>
  • samtools sort <in.bam> <out.sorted>
  • samtools index <in.sorted.bam>
  • My commands:
      samtools view -bS /data/DowellShortRead/Phil/SAM/Sigmav7_vs_S288c_bowtie2.sam -o /data/DowellShortRead/Phil/SAM/Sigmav7_vs_S288c_bowtie2.bam
      samtools sort /data/DowellShortRead/Phil/SAM/Sigmav7_vs_S288c_bowtie2.bam /data/DowellShortRead/Phil/SAM/Sigmav7_vs_S288c_bowtie2.sorted
      samtools index /data/DowellShortRead/Phil/SAM/Sigmav7_vs_S288c_bowtie2.sorted.bam
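
Each step's output name is the previous name with a different suffix, and the older `samtools sort` form used in this class appends `.bam` to the prefix you give it. The sketch below derives the names from the SAM path with shell parameter expansion and only prints the commands (a dry run), so they can be checked before anything is executed. The variable names are illustrative. Newer samtools releases use `samtools sort -o <out.bam> <in.bam>` instead, so check which version is installed.

```shell
SAM="/data/DowellShortRead/Phil/SAM/Sigmav7_vs_S288c_bowtie2.sam"

# Strip the .sam suffix, then build the other names from the shared stem.
STEM="${SAM%.sam}"
BAM="$STEM.bam"
SORTED_PREFIX="$STEM.sorted"      # the older 'samtools sort' appends .bam to this prefix
SORTED_BAM="$SORTED_PREFIX.bam"

# Dry run: print the commands instead of executing them.
echo "samtools view -bS $SAM -o $BAM"
echo "samtools sort $BAM $SORTED_PREFIX"
echo "samtools index $SORTED_BAM"
```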

Call variants using GATK UnifiedGenotyper
  • The GATK package is a Java executable, or a .jar file.
  • To run the package you type: java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar
  • Then you must select a -T, or a program within the package to run, which in our case is UnifiedGenotyper:
      java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -T UnifiedGenotyper

Call variants using GATK UnifiedGenotyper
  • java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -I <in.sorted.bam> -R <in.fasta> -o <out.vcf>
  • My command:
      java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -R /data/DowellShortRead/Phil/GENOME/SGDv4.fasta -I /data/DowellShortRead/Phil/SAM/Sigmav7_vs_S288c_bowtie2.sorted.bam -o /data/DowellShortRead/Phil/VCF/Sigmav7_vs_S288c_bowtie2_gatk.vcf
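
Before opening the result in a viewer, a quick sanity check is to count the variant records in the VCF, that is, every line that is not a `#` header line. This is a sketch only; count_variants is a hypothetical helper, and the path is the one from the command above. Because -glm BOTH calls both SNPs and indels, the count covers both.

```shell
# Hypothetical helper: count lines that do not start with '#' (the variant records).
count_variants() {
    grep -vc '^#' "$1"
}

VCF="/data/DowellShortRead/Phil/VCF/Sigmav7_vs_S288c_bowtie2_gatk.vcf"

if [ -f "$VCF" ]; then
    echo "variant records: $(count_variants "$VCF")"
fi
```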

View your VCF in IGV
  • GATK automatically indexes your VCF files, so now we can visualize both the reads and the SNPs in IGV

Problem Set 1
  • Repeat this workflow with two of the other strains available in /data/DowellShortRead/ReSequencing/FASTQ/
  • Merge the resulting VCF files using GATK’s CombineVariants
      – If you get stuck here come find me for help, or go to http://www.broadinstitute.org/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_CombineVariants.html
  • View the merged set in IGV
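
As a starting point for the merge step, the sketch below assembles a CombineVariants command and prints it without running it. Everything specific here is an assumption: the strain names are placeholders, the VCF file names simply follow the naming pattern used earlier, and merged.vcf is an arbitrary output name. The -R, --variant and -o options are the ones this GATK 2.x walker is expected to take, so check them against the documentation linked above before running the printed command.

```shell
# Hypothetical strain names; replace them with strains whose FASTQ files actually exist.
STRAINS="strainA strainB"
REF="$HOME/ReSequencing/GENOME/SGDv4.fasta"
VCF_DIR="$HOME/ReSequencing/VCF"

# Build one --variant argument per strain.
VARIANT_ARGS=""
for s in $STRAINS; do
    VARIANT_ARGS="$VARIANT_ARGS --variant $VCF_DIR/${s}_vs_S288c_bowtie2_gatk.vcf"
done

# Dry run: print the merge command instead of executing it.
MERGE_CMD="java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -T CombineVariants -R $REF$VARIANT_ARGS -o $VCF_DIR/merged.vcf"
echo "$MERGE_CMD"
```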