Rearrangements and Duplications in Tumor Genomes

Tumor Genomes
Mutation and selection; compromised genome stability.
• Chromosomal aberrations
  – Structural: translocations, inversions, fissions, fusions.
  – Copy number changes: gain and loss of chromosome arms, segmental duplications/deletions.

Rearrangements in Tumors
Change gene structure, create novel fusion genes.
• Gleevec (Novartis 2001) targets the BCR-ABL fusion.

Rearrangements in Tumors
Alter gene regulation.
• Burkitt lymphoma translocation (IMAGE CREDIT: Gregory Schuler, NCBI, NIH, Bethesda, MD, USA).
• Regulatory fusion in prostate cancer (Tomlins et al. Science Oct. 2005).

Complex Tumor Genomes
1) What are the detailed architectures of tumor genomes?
2) Which genes are affected?
3) What processes produce these architectures?
4) Can we create custom treatments for tumors based on their mutational spectrum? (e.g., Gleevec)

Common Alterations across Tumors
• Mutations activate/repress circuits.
• Multiple points of attack.
• “Master genes”: e.g., p53, Myc.
• Others probably tissue/tumor specific.
[Figure: circuit diagram marking activation and repression, duplicated genes and deleted genes.]

Human Cancer Genome Project etc.
• What tumors to sequence?
• What to sequence from each tumor?
  1. Whole genome: all alterations.
  2. Specific genes: point mutations.
  3. Hybrid approach: structural rearrangements.

End Sequence Profiling (ESP)
C. Collins and S. Volik (UCSF Cancer Center)
1) Pieces of tumor genome: clones (100-250 kb).
2) Sequence ends of clones (500 bp).
3) Map end sequences to human genome.
Each clone corresponds to a pair of end sequences (ES pair) (x, y). Retain clones that correspond to a unique ES pair.
[Figure: a clone from tumor DNA whose end sequences map to positions x and y in human DNA.]

Valid ES pairs:
• l ≤ y − x ≤ L, where l (L) is the minimum (maximum) size of a clone.
• Convergent orientation.

Invalid ES pairs:
• Putative rearrangement in tumor.
• ES directions point toward breakpoints.
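A minimal Python sketch of the validity test, under several assumptions: a single chromosome; each ES end given as (position, strand), with "+" meaning the read points toward increasing coordinates; and l = 100 kb, L = 250 kb taken from the clone sizes above. The function name `classify` is hypothetical.

```python
def classify(pair, l_min=100_000, l_max=250_000):
    """Classify an ES pair ((pos, strand), (pos, strand)) on one chromosome.

    Valid pairs are convergent (left end '+', right end '-') and span
    l_min <= y - x <= l_max; anything else hints at a rearrangement.
    """
    (x, x_strand), (y, y_strand) = sorted(pair)  # leftmost end first
    if (x_strand, y_strand) != ("+", "-"):
        return "invalid: orientation"
    if not l_min <= y - x <= l_max:
        return "invalid: span"
    return "valid"


print(classify(((1_000_000, "+"), (1_150_000, "-"))))  # valid
print(classify(((1_000_000, "+"), (1_600_000, "-"))))  # invalid: span
print(classify(((1_000_000, "+"), (1_150_000, "+"))))  # invalid: orientation
```

An over-long span suggests material missing from the tumor, such as a deletion. Same-strand ends suggest that one end lies in an inverted segment.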

Outline
What does ESP reveal about tumor genomes?
1. Identify locations of rearrangements.
2. Reconstruct genome architecture and sequence of rearrangements.
3. Combine with other genome data (CGH).

ESP Data (Jan. 2006)
Samples: breast cancer cell lines BT474, MCF7, SKBR3; tumors: brain, breast 1, breast 2, ovary, prostate; normal.
Clones: 5267, 5031, 9580, 7623, 19831, 9612, 4246, 1756, 9267
ES pairs: 3923, 3448, 7994, 5588, 6785, 3222, 12073, 1300, 7300
• Coverage of human genome: ≈ 0.34 for MCF7, BT474.

1. Rearrangement breakpoints
MCF7 breast cancer:
• Known cancer genes (e.g., ZNF217, BCAS3/4, STAT3).
• Novel candidates near breakpoints.
• Small-scale scrambling of genome more extensive than expected.

Structural Polymorphisms
• Human genetic variation involves more than nucleotide substitutions.
• Short indels/inversions are present (Iafrate et al. 2004, Sebat et al. 2004, Tuzun et al. 2005, McCarroll et al. 2006, Conrad et al. 2006, etc.).
• ≈ 3% (53/1570) of invalid ES pairs are explained by known structural variants.
[Figure: 1.6 Mb inversion polymorphism with breakpoints s and t; reference human A B C, human variant A -B C.]

2. Tumor Genome Architecture
1) What are the detailed architectures of tumor genomes?
2) What sequence of rearrangements produces these architectures?

ESP Genome Reconstruction Problem
[Figure: an unknown sequence of rearrangements transforms the human genome A B C D E (known) into the tumor genome A -C -D B E (unknown). The locations (x, y) of the ES pairs in the human genome are known.]
Map ES pairs to human genome. Reconstruct tumor genome.

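ESP Genome Reconstruction: Comparative Genomics
[Figure: tumor genome A -C -D B E aligned against human genome A B C D E. Each tumor adjacency is spanned by an ES pair: A|-C by (x1, y1), -C|-D by (x4, y4), -D|B by (x2, y2), B|E by (x3, y3). In the human genome, x1 and x2 lie at the A/B boundary, x3 and x4 at B/C, y1 and y2 at C/D, and y3 and y4 at D/E.]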

ESP Plot: 2D Representation of ESP Data
• Each point is an ES pair (x, y), here (x1, y1), …, (x4, y4).
• Can we reconstruct the tumor genome from the positions of the ES pairs?
[Figure: both axes show the human genome segments A B C D E.]
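To make the plot concrete, here is a hedged Python sketch of the forward direction. Given a tumor genome written as signed human segments, it lists the idealized human coordinates (x, y) of the ES pair spanning each tumor adjacency. The segment coordinates in `SEGMENTS` and the helper names are made up for illustration. Real ES pairs sit up to a clone length away from these ideal points.

```python
# Hypothetical human coordinates (kb) for segments A..E.
SEGMENTS = {"A": (0, 10), "B": (10, 20), "C": (20, 30), "D": (30, 40), "E": (40, 50)}


def entry_exit(token):
    """Human coordinates where the tumor path enters and leaves a signed segment."""
    lo, hi = SEGMENTS[token.lstrip("-")]
    return (hi, lo) if token.startswith("-") else (lo, hi)


def junction_points(tumor):
    """Idealized ES pair (x, y), x <= y, for each adjacency in the tumor genome."""
    points = []
    for left, right in zip(tumor, tumor[1:]):
        _, leave = entry_exit(left)
        enter, _ = entry_exit(right)
        points.append(tuple(sorted((leave, enter))))
    return points


print(junction_points(["A", "-C", "-D", "B", "E"]))
# [(10, 30), (20, 40), (10, 30), (20, 40)]
print(junction_points(["A", "B", "C", "D", "E"]))
# [(10, 10), (20, 20), (30, 30), (40, 40)]
```

Adjacencies already present in the human genome fall on the diagonal. The four rearranged adjacencies of A -C -D B E collapse onto two off-diagonal points. This matches the pairing of (x1, y1) with (x2, y2) and of (x3, y3) with (x4, y4) in the figure.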

ESP Plot → Tumor Genome
[Figure: the ES pairs in the plot, read against the human segments A B C D E, give the reconstructed tumor genome A -C -D B E.]

Real data are noisy and incomplete!
Valid ES pairs:
• satisfy length/direction constraints l ≤ y − x ≤ L.
Invalid ES pairs:
• indicate rearrangements, or
• experimental errors.

Computational Approach
1. Use known genome rearrangement mechanisms.
[Figure: an inversion with breakpoints s and t turns human A B C into tumor A -B C; a translocation with breakpoints s and t exchanges segments between two chromosomes.]
2. Find the simplest explanation for the ESP data, given these mechanisms.
3. Motivation: genome rearrangement studies in phylogeny.

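ESP Sorting Problem
• G = [0, M], unichromosomal genome.
• Reversal:
$$\rho_{s,t}(x) = \begin{cases} x, & \text{if } x < s \text{ or } x > t,\\ t - (x - s), & \text{otherwise.} \end{cases}$$
[Figure: reversal $\rho_{s,t}$ turns A B C (genome G) into A -B C (genome $G' = \rho(G)$), moving the ES pairs (x1, y1), (x2, y2).]
Given: ES pairs (x1, y1), …, (xn, yn).
Find: Minimum number of reversals $\rho_{s_1,t_1}, \ldots, \rho_{s_m,t_m}$ such that, if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_m,t_m}$, then $(\rho(x_1), \rho(y_1)), \ldots, (\rho(x_n), \rho(y_n))$ are valid ES pairs.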
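The reversal $\rho_{s,t}$ can be sketched in Python as follows. Beyond the slide, this assumes that an ES end is a (position, strand) tuple and that a reflected end also flips strand. The breakpoints and positions are invented. A tumor A -B C with B = [1 Mb, 3 Mb] yields two invalid, same-strand ES pairs in human coordinates, and a single reversal makes both valid.

```python
def reverse_point(pos, strand, s, t):
    """rho_{s,t}: reflect a position inside [s, t]; its read direction flips too."""
    if s <= pos <= t:
        return t - (pos - s), ("-" if strand == "+" else "+")
    return pos, strand


def apply_reversal(pair, s, t):
    """Apply rho_{s,t} to both ends of an ES pair, keeping the leftmost end first."""
    (x, xs), (y, ys) = pair
    return tuple(sorted((reverse_point(x, xs, s, t), reverse_point(y, ys, s, t))))


def is_valid(pair, l_min=100_000, l_max=250_000):
    """Convergent orientation and l_min <= y - x <= l_max."""
    (x, xs), (y, ys) = pair
    return xs == "+" and ys == "-" and l_min <= y - x <= l_max


# Invented example: inversion of B = [s, t] in the tumor.
s, t = 1_000_000, 3_000_000
pairs = [((950_000, "+"), (2_900_000, "+")),    # spans junction A | -B
         ((1_080_000, "-"), (3_070_000, "-"))]  # spans junction -B | C
print([is_valid(p) for p in pairs])             # [False, False]
fixed = [apply_reversal(p, s, t) for p in pairs]
print(fixed)  # [((950000, '+'), (1100000, '-')), ((2920000, '+'), (3070000, '-'))]
print([is_valid(p) for p in fixed])             # [True, True]
```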

[Figure: a sequence of reversals with breakpoints s, t transforms A B C into A -B -C and moves the ES pairs (x1, y1), (x2, y2), (x3, y3); afterwards all ES pairs are valid.]

Filtering Experimental Noise
1) Pieces of tumor genome: clones (100-250 kb).
2) Sequence ends of clones (500 bp).
3) Map end sequences to human genome.
• A rearrangement produces a cluster of invalid pairs.
• A chimeric clone produces an isolated invalid pair.
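One hedged way to separate clusters from isolated invalid pairs is single-linkage grouping. Two invalid pairs are linked when both their x and their y coordinates differ by at most d. The sketch below uses a union-find structure. The threshold d = 250 kb (the maximum clone size) and the name `cluster_invalid_pairs` are assumptions, not the published procedure, and the data points are invented.

```python
def cluster_invalid_pairs(points, d=250_000):
    """Single-linkage grouping of invalid ES pairs (x, y).

    Returns (clusters, isolated): groups of size >= 2, and lone pairs.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if abs(xi - xj) <= d and abs(yi - yj) <= d:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    clusters = [g for g in groups.values() if len(g) > 1]
    isolated = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, isolated


invalid = [(1_000_000, 3_000_000), (1_100_000, 2_950_000),
           (1_050_000, 3_100_000), (7_000_000, 9_000_000)]
clusters, isolated = cluster_invalid_pairs(invalid)
print(len(clusters), [len(c) for c in clusters])  # 1 [3]
print(isolated)                                   # [(7000000, 9000000)]
```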

Sparse Data Assumptions
1. Each cluster results from a single inversion.
2. Each clone contains at most one breakpoint.
[Figure: ES pairs (x1, y1), (x2, y2), (x3, y3) in the human and tumor genomes.]

ESP Genome Reconstruction: Discrete Approximation
1) Remove isolated invalid pairs (x, y).
2) Define segments from clusters.
3) ES orientations define links between segment ends.
[Figure: human genome with clustered ES pairs (x1, y1), (x2, y2), (x3, y3) and breakpoints s, t.]

ESP Graph
Edges:
1. Human genome segments.
2. ES pairs.
Paths in the graph are tumor genome architectures.
Tumor genome (1 -3 -4 2 5) = signed permutation of (1 2 3 4 5).
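A path through the ESP graph can be read off as a signed permutation. This is a hypothetical Python sketch with the following conventions:
• Segment ends are written (k, "t") for tail and (k, "h") for head.
• Each ES-pair link joins two ends.
• The walk starts at the unlinked tail of segment 1.

The links below encode the tumor genome (1 -3 -4 2 5).

```python
def path_to_permutation(links, first=1):
    """Read a signed permutation off an ESP-graph path.

    Segment ends are (k, "t") / (k, "h"); each link joins two ends.
    The path is assumed to start at the unlinked tail of segment `first`.
    """
    partner = {}
    for a, b in links:
        partner[a], partner[b] = b, a
    perm, seen = [], set()
    seg, entered = first, "t"
    while True:
        if seg in seen:
            raise ValueError("links revisit a segment")
        seen.add(seg)
        # Entering at the tail means the segment is read forward (+k).
        perm.append(seg if entered == "t" else -seg)
        leave = (seg, "h" if entered == "t" else "t")
        if leave not in partner:  # telomere: the path ends here
            return perm
        seg, entered = partner[leave]


links = [((1, "h"), (3, "h")), ((3, "t"), (4, "h")),
         ((4, "t"), (2, "t")), ((2, "h"), (5, "t"))]
print(path_to_permutation(links))  # [1, -3, -4, 2, 5]
```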

Sorting permutations by reversals (Sankoff et al. 1990)
$\pi = \pi_1 \pi_2 \cdots \pi_n$, a signed permutation.
Reversal $\rho(i, j)$ [inversion]:
$$\pi_1 \cdots \pi_{i-1}\; \pi_i \cdots \pi_j\; \pi_{j+1} \cdots \pi_n \;\longmapsto\; \pi_1 \cdots \pi_{i-1}\; {-\pi_j} \cdots {-\pi_i}\; \pi_{j+1} \cdots \pi_n$$
Problem: Given $\pi$, find a sequence of reversals $\rho_1, \ldots, \rho_t$ such that
$$\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = (1, 2, \ldots, n)$$
and t is minimal.
Solution: Analysis of breakpoint graph ← ESP graph.
Polynomial time algorithms:
• O(n^4): Hannenhalli and Pevzner, 1995.
• O(n^2): Kaplan, Shamir, Tarjan, 1997.
• O(n) [distance t]: Bader, Moret, and Yan, 2001.
• O(n^3): Bergeron, 2001.

Sorting Permutations
1 -3 -4 2 5 → 1 -3 -2 4 5 → 1 2 3 4 5
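The two sorting steps can be checked with a few lines of Python. `reverse` is a hypothetical helper implementing the reversal $\rho(i, j)$ with 1-based, inclusive indices. It reverses the block and flips every sign in it.

```python
def reverse(perm, i, j):
    """Reversal rho(i, j) on a signed permutation (1-based, inclusive)."""
    return perm[:i - 1] + [-x for x in reversed(perm[i - 1:j])] + perm[j:]


pi = [1, -3, -4, 2, 5]
step1 = reverse(pi, 3, 4)     # reverse "-4 2"
step2 = reverse(step1, 2, 3)  # reverse "-3 -2"
print(step1)  # [1, -3, -2, 4, 5]
print(step2)  # [1, 2, 3, 4, 5]
```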

Breakpoint Graph
π: start 1 -3 -4 2 5 end
After one reversal: start 1 -3 -2 4 5 end
Identity i: start 1 2 3 4 5 end
• Black edges: adjacent elements of π.
• Gray edges: adjacent elements of i = 1 2 3 4 5.
• Key parameter: black-gray cycles.
Theorem: The minimum number of reversals $d(\pi)$ needed to transform π into the identity permutation i satisfies
$$d(\pi) \ge n + 1 - c(\pi),$$
where $c(\pi)$ is the number of black-gray cycles.
ESP Graph → Tumor Permutation and Breakpoint Graph
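The bound can be computed for the running example with a short Python sketch. It uses the standard unsigned encoding: +k becomes 2k-1, 2k, and -k becomes 2k, 2k-1. The sequence is framed by 0 (start) and 2n+1 (end). The function names are hypothetical. For π = 1 -3 -4 2 5 the sketch finds c(π) = 4, so d(π) ≥ 5 + 1 - 4 = 2. The two-reversal sequence on the Sorting Permutations slide attains this bound.

```python
def to_unsigned(perm):
    """Encode +k as (2k-1, 2k) and -k as (2k, 2k-1), framed by 0 and 2n+1."""
    seq = [0]
    for k in perm:
        a, b = 2 * abs(k) - 1, 2 * abs(k)
        seq += [a, b] if k > 0 else [b, a]
    return seq + [2 * len(perm) + 1]


def count_cycles(perm):
    """Number of alternating black-gray cycles in the breakpoint graph."""
    seq = to_unsigned(perm)
    pos = {v: i for i, v in enumerate(seq)}

    def black(v):  # positions 2m and 2m+1 of seq: adjacent in pi
        return seq[pos[v] ^ 1]

    def gray(v):   # values 2m and 2m+1: adjacent in the identity
        return v ^ 1

    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:  # alternate black, gray until back at start
            w = black(v)
            seen.update((v, w))
            v = gray(w)
    return cycles


def lower_bound(perm):
    return len(perm) + 1 - count_cycles(perm)


print(count_cycles([1, -3, -4, 2, 5]))  # 4
print(lower_bound([1, -3, -4, 2, 5]))   # 2
print(lower_bound([1, -3, -2, 4, 5]))   # 1
print(lower_bound([1, 2, 3, 4, 5]))     # 0
```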

MCF7 Breast Cancer Cell Line
• Low-resolution chromosome painting suggests complex architecture.
• Many translocations, inversions.

ESP Data from MCF7 tumor genome
Each point (x, y) is an ES pair.
• 6239 ES pairs (June 2003)
  – 5856 valid (black)
  – 383 invalid: 256 isolated (red), 127 form 30 clusters (blue)
[Figure: ES pairs plotted by coordinate in human genome.]

MCF7 Genome
[Figure: human chromosomes rearranged into MCF7 chromosomes.]
Sequence of 5 inversions and 15 translocations.
Raphael, Volik, Collins, Pevzner. Bioinformatics 2003.

3. Combining ESP with other genome data
Array Comparative Genomic Hybridization (aCGH)

CGH Analysis
• Divide genome into segments of equal copy number.
[Figure: copy number profile vs. genome coordinate, before and after segmentation.]
• Numerous segmentation methods (e.g., clustering, Hidden Markov Models, Bayesian approaches).
• No information about:
  – Structural rearrangements (inversions, translocations).
  – Locations of duplicated material in tumor genome.
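For intuition only, the simplest conceivable segmentation rounds each probe's copy-number estimate and merges runs of equal values. This hypothetical Python sketch is not one of the clustering, HMM, or Bayesian methods named on the slide. A single noisy probe would create a spurious segment here, which is what those methods guard against. The profile values are invented.

```python
def naive_segments(estimates):
    """Round each probe's copy-number estimate, then merge equal neighbors.

    Returns (first_probe, last_probe, copy_number) triples.
    """
    segments = []
    for i, value in enumerate(estimates):
        cn = round(value)
        if segments and segments[-1][2] == cn:
            start, _, _ = segments[-1]
            segments[-1] = (start, i, cn)  # extend the current segment
        else:
            segments.append((i, i, cn))    # copy number changed: new segment
    return segments


profile = [2.1, 1.9, 2.2, 3.1, 2.8, 5.2, 4.9, 5.1, 2.0]
print(naive_segments(profile))
# [(0, 2, 2), (3, 4, 3), (5, 7, 5), (8, 8, 2)]
```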

CGH Segmentation
[Figure: segmented copy number profile (levels 2, 3, 5) vs. genome coordinate.]
How are the copies of segments linked in the tumor genome?
• ES pairs link segments.

ESP + CGH
[Figure: segmented copy number profile (levels 2, 3, 5) vs. genome coordinate, with CGH breakpoints and ESP breakpoints marked.]
• ES pairs lie near segment boundaries.

ESP and CGH Breakpoints
[Figure: Venn diagrams of ESP breakpoints and CGH breakpoints.]
• MCF7: 730 and 256 breakpoints, 39 shared (P = 1.2 × 10^-4); 12/39 clusters.
• BT474: 426 and 244 breakpoints, 33 shared (P = 5.4 × 10^-7); 8/33 clusters.

Microdeletion in BT474
[Figure: copy number (levels 0, 2, 3) around an ES pair spanning ≈ 600 kb, whereas a valid ES pair spans < 250 kb.]
• “Interesting” genes in this region.

Combining ESP and CGH
[Figure: segmented copy number profile (levels 2, 3, 5) vs. genome coordinate.]
ES pairs link segments. Copy number balances at each segment boundary: 5 = 2 + 3.

Combining ESP and CGH
[Figure: copy number profile with a range per CGH segment: 3 ≤ f(e) ≤ 5, 1 ≤ f(e) ≤ 4, 1 ≤ f(e) ≤ 3.]
• CGH copy number is not exact.
• Which genome architecture is “most consistent” with the ESP and CGH data?

Combining ESP and CGH
[Figure: copy number profile (levels 2, 3, 5) with ranges 3 ≤ f(e) ≤ 5, 1 ≤ f(e) ≤ 3, 1 ≤ f(e) ≤ 4.]
Build graph:
1. Edge for each CGH segment.
2. Edge for each ES pair consistent with segments.
3. Range of copy number values for each CGH edge.

Network Flow Problem
• Minimum Cost Circulation with Capacity Constraints (as in Sequencing by Hybridization and Sequence Assembly).
• CGH edge: l(e) and u(e) from CGH. ESP edge: l(e) = 1, u(e) = 1. Additional edges to a source/sink.
$$\min \sum_e \gamma(e)\, f(e)$$
subject to
$$l(e) \le f(e) \le u(e) \quad \forall e,$$
$$\sum_{(u,v)} f((u,v)) = \sum_{(v,w)} f((v,w)) \quad \forall v \qquad \text{(flow in = flow out at each vertex)},$$
with costs
$$\gamma(e) = \begin{cases} 0, & e \text{ an ESP or CGH edge},\\ 1, & e \text{ incident to source/sink.} \end{cases}$$
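On a toy instance the circulation problem can be solved by exhaustive search, which makes the constraints easy to inspect. The Python sketch below is exponential and only illustrative; real instances need a polynomial min-cost circulation algorithm. The graph is invented:
• one CGH edge a→b with 3 ≤ f(e) ≤ 5;
• one ESP edge b→a with f(e) = 1;
• two cost-1 edges through a source/sink vertex `src`, which absorb any flow the ES pairs cannot carry.

```python
from itertools import product


def min_cost_circulation_bruteforce(edges):
    """edges: (tail, head, lower, upper, cost). Returns (cost, flows) or None."""
    nodes = {v for tail, head, *_ in edges for v in (tail, head)}
    best = None
    for flows in product(*(range(lo, hi + 1) for _, _, lo, hi, _ in edges)):
        balance = dict.fromkeys(nodes, 0)
        for (tail, head, *_), f in zip(edges, flows):
            balance[tail] -= f   # flow out of tail
            balance[head] += f   # flow into head
        if any(balance.values()):
            continue             # flow in != flow out somewhere
        cost = sum(e[4] * f for e, f in zip(edges, flows))
        if best is None or cost < best[0]:
            best = (cost, flows)
    return best


edges = [("a", "b", 3, 5, 0),      # CGH edge: copy number range
         ("b", "a", 1, 1, 0),      # ESP edge: exactly one copy
         ("src", "a", 0, 5, 1),    # source/sink edges cost 1
         ("b", "src", 0, 5, 1)]
print(min_cost_circulation_bruteforce(edges))  # (4, (3, 1, 2, 2))
```

In the optimum, two of the three copies on the CGH edge pass through the source/sink, meaning that no ES pair explains them.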

Network Flow Results
• Unsatisfied flow (routed through the source/sink) marks putative locations of missing ESP data.
• Prioritize further sequencing.
• Targeted ESP by screening library with CGH probes.

Network Flow Results
• Identify amplified translocations:
  – 14 in MCF7
  – 5 in BT474
• Flow values → edge multiplicities; an Eulerian cycle in the combined graph gives the tumor genome architecture.
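Once the flow values are fixed as edge multiplicities, the genome can be read off as an Eulerian cycle. Below is a hedged Python sketch using Hierholzer's algorithm on a small, made-up directed multigraph. It assumes, without checking, that every vertex has equal in- and out-multiplicity and that the edges are connected.

```python
from collections import Counter, defaultdict


def eulerian_cycle(multiplicity, start):
    """Hierholzer's algorithm; multiplicity maps (u, v) -> number of copies."""
    adj = defaultdict(list)
    for (u, v), k in multiplicity.items():
        adj[u].extend([v] * k)  # one list entry per copy of the edge
    stack, circuit = [start], []
    while stack:
        v = stack[-1]
        if adj[v]:
            stack.append(adj[v].pop())  # follow an unused edge
        else:
            circuit.append(stack.pop())  # dead end: emit vertex
    return circuit[::-1]


multiplicity = {("a", "b"): 2, ("b", "a"): 1, ("b", "c"): 1, ("c", "a"): 1}
cycle = eulerian_cycle(multiplicity, "a")
print(cycle)  # ['a', 'b', 'c', 'a', 'b', 'a']
print(Counter(zip(cycle, cycle[1:])) == Counter(multiplicity))  # True
```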
