SOAP3-dp Workflow



SOAP3-dp workflow for paired-end alignment

Step 1: Use SOAP3 (2-mismatch) to align the paired-end reads.
  Paired-end reads -> SOAP3 (2-mismatch) -> paired alignments, e.g. chr6, +4,059, -4,369; ...

Step 2: For reads with one end mapped but the other not, use Default-DP to align the unmapped ends.
  The alignments of the mapped end (e.g. chr9, +49,538; ...) give a mapped region, which determines a candidate region for the unmapped end.
  DP aligns the unmapped end within the candidate region, giving paired alignments, e.g. chr9, +49,538, -49,829; ...

Step 3: For reads with both ends unaligned, use SOAP3 (1-mismatch) to align the seeds and then use Deep-DP to align both ends.
  Seeds -> SOAP3 (1-mismatch) -> seed alignments of the first end (e.g. chr18, +349,683; ...) and seed alignments of the second end (e.g. chr18, -349,998; ...).
  Pair up the seed alignments, giving paired seed alignments, e.g. chr18, +349,683, -349,998; ...
  Deep-DP: use DP to align both ends in the candidate region marked by the paired seeds (+ at 349,683, - at 349,998 on chr18), giving paired alignments, e.g. chr18, +349,664, -349,923; ...


Step 1: SOAP3

SOAP3 (2-mismatch) sorts each read pair into one of five cases:

  • Both ends can be mapped and paired properly: report the alignments.
  • Only one end can be mapped, with not too many hits (i.e. <= 30): store the read ID (of the aligned end) and hits to ARRAY A.
  • Only one end can be mapped, with too many hits (i.e. > 30): store the read ID (of the aligned end) and hits to ARRAY B.
  • Neither end can be mapped: store the read ID (of the first read of the pair) and hits to ARRAY C.
  • Both ends can be mapped but not paired properly: store the read ID and hits to ARRAY A or B (described on the next slide).

A read pair is paired properly if:
1. Both ends are mapped within the insert size (i.e. a range of distances between the two ends, supplied by the user).
2. The ends are in proper orientation (for Illumina reads, the end aligned on the left is on the forward strand, while the end aligned on the right is on the reverse strand).
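
The Step 1 routing above can be sketched in Python. This is an illustration under stated assumptions, not SOAP3-dp's actual code: the `Hit` tuple, the function names, and measuring the insert size as the distance between the two start positions are hypothetical simplifications. Only the threshold of 30 hits and the two pairing conditions come from the slide.

```python
from collections import namedtuple

# Hypothetical hit record: chromosome, leftmost position, strand ("+" or "-").
Hit = namedtuple("Hit", "chrom pos strand")

MAX_HITS = 30  # "not too many hits" threshold from the slide


def properly_paired(h1, h2, min_insert, max_insert):
    """Check the two pairing conditions for Illumina-style reads."""
    if h1.chrom != h2.chrom:
        return False
    left, right = sorted((h1, h2), key=lambda h: h.pos)
    # Left end must be on the forward strand, right end on the reverse strand.
    if left.strand != "+" or right.strand != "-":
        return False
    # Assumption: insert size is approximated by the start-to-start distance.
    return min_insert <= right.pos - left.pos <= max_insert


def classify_pair(hits1, hits2, min_insert, max_insert, max_hits=MAX_HITS):
    """Return (destination, payload) for one read pair."""
    if hits1 and hits2:
        pairs = [(a, b) for a in hits1 for b in hits2
                 if properly_paired(a, b, min_insert, max_insert)]
        if pairs:
            return "report", pairs
        return "unpaired", None  # both mapped, not paired properly
    if not hits1 and not hits2:
        return "C", None
    mapped = hits1 or hits2  # exactly one end has hits
    return ("A" if len(mapped) <= max_hits else "B"), mapped
```

With the coordinates from the overview slide (chr6, +4,059 and -4,369) and a hypothetical insert range of 200 to 500, `classify_pair` returns `"report"`; swapping the strands makes the pair fall into the "not paired properly" case.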


Step 1: SOAP3 -- Both ends can be mapped but not paired properly

Read 1 and read 2 are both mapped, but not paired properly.

Let x = # of all valid hits of read 1.
Let y = # of all valid hits of read 2.
If x > 30, retain only the best hits of read 1 and reset x = # of best hits of read 1.
If y > 30, retain only the best hits of read 2 and reset y = # of best hits of read 2.

Case              read 1   read 2   Stored to
a) x, y <= 30     YES      NO       ARRAY A
                  NO       YES      ARRAY A
b) x <= 30 < y    YES      NO       ARRAY A
c) y <= 30 < x    NO       YES      ARRAY A
d) 30 < x < y     YES      NO       ARRAY B
e) 30 < y <= x    NO       YES      ARRAY B

Store the read ID and hits of the end marked YES to ARRAY A or B.
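
The case table can be written as a small decision function. This is a sketch only: the slide does not define "best hits", so the code assumes they are the hits with the fewest mismatches, and it reads case a) as keeping each end in turn. The `(position, mismatches)` hit layout and all names are hypothetical.

```python
MAX_HITS = 30


def best_hits(hits):
    """Assumption: 'best hits' are those with the fewest mismatches."""
    fewest = min(mm for _, mm in hits)
    return [h for h in hits if h[1] == fewest]


def route_unpaired(hits1, hits2, max_hits=MAX_HITS):
    """Pick the array and the end(s) whose hits are kept (the YES column).

    hits1 and hits2 are non-empty lists of (position, mismatches) tuples.
    """
    if len(hits1) > max_hits:
        hits1 = best_hits(hits1)
    if len(hits2) > max_hits:
        hits2 = best_hits(hits2)
    x, y = len(hits1), len(hits2)

    if x <= max_hits and y <= max_hits:   # case a
        return "A", ["read1", "read2"]
    if x <= max_hits:                     # case b: x <= 30 < y
        return "A", ["read1"]
    if y <= max_hits:                     # case c: y <= 30 < x
        return "A", ["read2"]
    if x < y:                             # case d: 30 < x < y
        return "B", ["read1"]
    return "B", ["read2"]                 # case e: 30 < y <= x
```

In every case the end with fewer hits is the one kept, which keeps the number of candidate regions passed on to DP small.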


Step 2 and step 3: default DP and new default DP

Array A -> Default DP
  • Both ends can be mapped and paired properly: report the alignments.
  • Otherwise: store the read ID of the first read of the pair to ARRAY C.

Array B -> New default DP
  • Both ends can be mapped and paired properly: report the alignments.
  • Otherwise: store the read ID of the first read of the pair to ARRAY C.


Detailed picture of Default DP and New Default DP

Default-DP: for reads with one end mapped but the other not, AND not too many hits, use Default-DP to align the unmapped ends.
  The alignments of the mapped end (e.g. chr9, +49538; ...) give a mapped region, which determines a candidate region for the unmapped end.
  DP aligns the unmapped end within the candidate region, giving paired alignments, e.g. chr9, +49538, -49829; ...

New-Default-DP: for reads with one end mapped but the other not, AND too many hits, use New-Default-DP to align the unmapped ends.
  Seeds of the unmapped end -> SOAP3 (1-mismatch) -> seed alignments of the unmapped end, e.g. chr18, -349998; ...
  Pair up the seed alignments with the alignments of the other end (e.g. chr18, +349683; ...), giving e.g. chr18, +349683, -349998; ...
  DP aligns the unmapped end in the candidate region between the mapped region of one end (+ at 349683) and the paired seed (- at 349998), giving paired alignments, e.g. chr18, +349683, -349923; ...
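
For Default-DP, the slide does not say how the candidate region is derived from the mapped end. One plausible reading, sketched below, places the mate wherever the user-supplied insert-size range allows. The function name, the half-open coordinates, and the convention that the insert size spans the whole fragment are assumptions for illustration, as are the read length and insert range in the usage note.

```python
def candidate_region(anchor_pos, anchor_strand, anchor_len, mate_len,
                     min_insert, max_insert):
    """Return a half-open interval (start, end) in which the mate may lie.

    Assumption: insert size is the full fragment span, from the leftmost
    base of the forward-strand end to the rightmost base of the
    reverse-strand end.
    """
    if anchor_strand == "+":
        # Anchor is the left end; the mate lies downstream on the reverse strand.
        start = anchor_pos + min_insert - mate_len
        end = anchor_pos + max_insert
    else:
        # Anchor is the right end; the mate lies upstream on the forward strand.
        anchor_end = anchor_pos + anchor_len
        start = anchor_end - max_insert
        end = anchor_end - min_insert + mate_len
    return max(0, start), max(0, end)
```

With the mapped end at +49538, 100-base reads, and a hypothetical insert range of 300 to 500, the region is (49738, 50038), which contains the mate alignment at 49829 shown on the slide.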


Step 4: 2-level Deep DP

Input: ARRAY C

ROUND 1 SEEDING for both ends
  Seed length: 26
  Sample rate: 1/13
  Max # of hits allowed: 100

  • If there exist pairs of hits within the insert size: perform DP for those pairs of hits.
  • If (1) there exists a seed with too many hits AND (2) there are no pairs of hits within the insert size: go to ROUND 2.

ROUND 2 SEEDING for both ends
  Seed length: 30
  Sample rate: 1/15
  Max # of hits allowed: 1000

  • If there exist pairs of hits within the insert size: perform DP for those pairs of hits.

After DP:
  Case 1: Valid paired alignments found: report the alignments.
  Case 2: No valid paired alignment found: store the read ID of both ends to ARRAY D.
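
The two seeding rounds can be illustrated with a short seed extractor. The slide does not define "sample rate", so this sketch assumes 1/13 means one seed starts every 13 bases (and 1/15 every 15 bases). The `ROUNDS` table and `extract_seeds` are hypothetical names, not part of SOAP3-dp.

```python
# Parameters from the slide; "step" encodes the assumed meaning of sample rate.
ROUNDS = (
    {"seed_len": 26, "step": 13, "max_hits": 100},   # round 1
    {"seed_len": 30, "step": 15, "max_hits": 1000},  # round 2
)


def extract_seeds(read, seed_len, step):
    """Return (offset, seed) pairs for every full-length seed in the read."""
    last_start = len(read) - seed_len
    return [(s, read[s:s + seed_len]) for s in range(0, last_start + 1, step)]
```

Under that assumption, a 100-base read yields six overlapping round-1 seeds (offsets 0, 13, 26, 39, 52, 65) and five round-2 seeds (offsets 0, 15, 30, 45, 60). A read shorter than the seed length yields none.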


Step 5: Single DP

Array D -> Single DP
  • The end can be mapped: report the alignments.
  • Otherwise: report that the end cannot be aligned.


Detailed picture of Single DP

  Seeds -> SOAP3 (1-mismatch) -> seed alignments, e.g. chr18, +349,683; ...
  Single-DP: use DP to align the read in the candidate region located by the seed alignment (+ at 349,683 on chr18).
  Report the alignments, e.g. chr18, +349,664; ...
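
The "use DP to align" step can be illustrated with a generic dynamic-programming alignment that fits a whole read inside a candidate region. This is a textbook sketch with a linear gap penalty and made-up scores, run on the CPU. The slides do not specify SOAP3-dp's scoring scheme or its GPU implementation.

```python
def dp_align(read, region, match=1, mismatch=-2, gap=-3):
    """Align the whole read somewhere inside region.

    Returns (best_score, end), where end is the exclusive index in region
    at which the best alignment finishes. The scores are illustrative.
    """
    m = len(region)
    # Row 0 is all zeros: the read may start anywhere in the region for free.
    prev = [0] * (m + 1)
    for i in range(1, len(read) + 1):
        # Column 0: the first i read bases are aligned against nothing.
        cur = [prev[0] + gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if read[i - 1] == region[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # match or mismatch
                         prev[j] + gap,       # read base against a gap
                         cur[j - 1] + gap)    # region base against a gap
        prev = cur
    # The read may also end anywhere in the region for free.
    end = max(range(m + 1), key=lambda j: prev[j])
    return prev[end], end
```

For example, `dp_align("ACGT", "TTACGTTT")` returns `(4, 6)`: four matches, ending just after index 5 of the region, so the read starts at offset 2.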


Paired-end alignment (overall workflow)

1. Load 6M reads (3M pairs).
2. SOAP3 (2-mismatch)
3. New default DP
4. Default DP
5. 2-level deep DP
6. Single DP
7. More reads to process? If yes, repeat with the next 6M reads, which a new CPU thread is created to load. If no, END.

Note: New default DP needs the 2BWT index in the GPU, while default DP does not. New default DP therefore runs before default DP, because after SOAP3 the 2BWT index is already in the GPU.
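
The batch loop above can be sketched with a background loader thread. The code assumes the loader runs while the current batch is being processed, which the slide suggests but does not spell out. `run_pipeline`, `load_batch`, and the `process_stage` callback are hypothetical stand-ins for the real stages.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

BATCH_SIZE = 6_000_000  # 6M reads (3M pairs) per batch, as on the slide

# Stage order from the slide: new default DP runs before default DP.
STAGES = ("SOAP3 (2-mismatch)", "new default DP", "default DP",
          "2-level deep DP", "single DP")


def load_batch(reads, size):
    """Pull up to `size` reads from an iterator."""
    return list(islice(reads, size))


def run_pipeline(reads, process_stage, batch_size=BATCH_SIZE):
    """Run every stage on each batch while the next batch loads."""
    reads = iter(reads)
    batches = 0
    with ThreadPoolExecutor(max_workers=1) as loader:
        batch = load_batch(reads, batch_size)
        while batch:
            # Start loading the next batch in a separate CPU thread.
            pending = loader.submit(load_batch, reads, batch_size)
            for stage in STAGES:
                process_stage(stage, batch)
            batch = pending.result()
            batches += 1
    return batches
```

Calling `run_pipeline(range(10), callback, batch_size=4)` processes three batches of 4, 4, and 2 reads and invokes the callback once per stage per batch.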


SOAP3 Architecture

Memory-resident data structures: 2BWT + SA

Execution, repeated for each batch:
  • Device (GPU): process 1M reads for round 1 and round 2 alignments.
  • Host (CPU): process round 3 alignment and report results.


DP with seeding

Memory-resident data structures
  • Device (GPU): 2BWT / DP tables
  • Host (CPU): 2BWT + SA

Execution
1. Copy the 2BWT index to the GPU and extract the seeds of the reads in Array C.
2. SOAP3 (1-mismatch): process 1M seeds for round 1 and round 2 alignments, then process round 3 alignment. Repeat for each batch of 1M seeds.
3. Pair up the seed alignments, clear the 2BWT index in the GPU, and create DP tables in the GPU.
4. Perform DP between the reads and the candidate regions.


Default DP

Memory-resident data structures
  • Host (CPU): 2BWT + SA
  • Device (GPU): DP tables

Execution
1. Create DP tables in the GPU.
2. Perform DP between the reads and the candidate regions.


Single-end alignment (overall workflow)

1. Load 6M single-end reads.
2. SOAP3 (2-mismatch)
3. Single DP
4. More reads to process? If yes, repeat with the next 6M reads, which a new CPU thread is created to load. If no, END.


Paired-end alignment (for read length > 150)

1. Load 6M reads (3M pairs).
2. 2-level deep DP
3. Single DP
4. More reads to process? If yes, repeat with the next 6M reads, which a new CPU thread is created to load. If no, END.