Tumor Genomes
• Mutation and selection
• Compromised genome stability
• Chromosomal aberrations
  – Structural: translocations, inversions, fissions, fusions.
  – Copy number changes: gain and loss of chromosome arms, segmental duplications/deletions.
Rearrangements in Tumors
• Change gene structure, create novel fusion genes.
• Gleevec (Novartis, 2001) targets the BCR-ABL fusion.
End Sequence Profiling (ESP)
C. Collins and S. Volik (UCSF Cancer Center)
1) Pieces of tumor genome: clones (100-250 kb).
2) Sequence ends of clones (500 bp).
3) Map end sequences to human genome.
• Each clone corresponds to a pair of end sequences (ES pair) (x, y).
• Retain clones that correspond to a unique ES pair.
Valid ES pairs
• l ≤ y – x ≤ L, where l and L are the minimum and maximum clone sizes.
• Convergent orientation.
Invalid ES pairs
• Putative rearrangement in tumor.
• ES directions point toward breakpoints (a, b): l ≤ |x – a| + |y – b| ≤ L.
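The validity rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the "+"/"-" strand encoding of orientation, and the default size limits (taken from the 100-250 kb clone range) are all assumptions made for the example.

```python
def is_valid_es_pair(x, y, x_strand, y_strand, l=100_000, L=250_000):
    """Return True if an ES pair meets the length and orientation constraints.

    Assumption: convergent orientation is encoded as '+' for the left end
    and '-' for the right end (the two ends point toward each other).
    """
    if x > y:  # order the ends so that x is the leftmost coordinate
        x, y = y, x
        x_strand, y_strand = y_strand, x_strand
    convergent = (x_strand == "+" and y_strand == "-")
    return l <= y - x <= L and convergent


def consistent_with_breakpoint(x, y, a, b, l=100_000, L=250_000):
    """Check l <= |x - a| + |y - b| <= L for a putative breakpoint (a, b)."""
    return l <= abs(x - a) + abs(y - b) <= L
```

A pair mapping 150 kb apart in convergent orientation passes the first check. A pair mapping 4 Mb apart fails it, but may still be consistent with a breakpoint (a, b) if the two distances to the breakpoint sum to a plausible clone size.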
ESP Genome Reconstruction Problem
• Human genome (known): A B C D E
• Tumor genome (unknown): A -C -D B E, produced by an unknown sequence of rearrangements.
• Map ES pairs to human genome. Locations of ES pairs in the human genome are known.
• Reconstruct tumor genome.
ESP Genome Reconstruction: Comparative Genomics
[Figure: dot plot comparing the tumor genome (A -C -D B E) with the human genome (A B C D E). ES pairs (x1, y1), (x2, y2), (x3, y3), (x4, y4) are marked on the tumor genome and at their mapped positions in the human genome.]
ESP Plot
2D Representation of ESP Data
• Each point is an ES pair.
• Can we reconstruct the tumor genome from the positions of the ES pairs?
[Figure: ES pairs (x1, y1), …, (x4, y4) plotted as points, with human genome segments A to E on both axes.]
ESP Plot → Tumor Genome
[Figure: the ESP plot annotated with the reconstructed tumor genome A -C -D B E.]
Real data noisy and incomplete!
Valid ES pairs
• Satisfy length/direction constraints: l ≤ y – x ≤ L.
Invalid ES pairs
• Indicate rearrangements.
• Or result from experimental errors.
Computational Approach
1. Use known genome rearrangement mechanisms: inversion and translocation.
   [Figure: an inversion and a translocation with breakpoints s and t, shown on human segments A, B, C, D and the resulting tumor segments.]
2. Find simplest explanation for ESP data, given these mechanisms.
3. Motivation: genome rearrangement studies in evolution/phylogeny.
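ESP Sorting Problem
• G = [0, M], unichromosomal genome.
• Inversion (reversal) $\rho_{s,t}$ transforms G into $G' = \rho_{s,t} G$ (e.g. A B C becomes A -B C):
  $$\rho_{s,t}(x) = \begin{cases} x, & \text{if } x < s \text{ or } x > t, \\ t - (x - s), & \text{otherwise.} \end{cases}$$
Given: ES pairs $(x_1, y_1), \ldots, (x_n, y_n)$.
Find: minimum number of reversals $\rho_{s_1,t_1}, \ldots, \rho_{s_k,t_k}$ such that if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_k,t_k}$, then $(\rho x_1, \rho y_1), \ldots, (\rho x_n, \rho y_n)$ are valid ES pairs.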
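To make the reversal map concrete, here is a small Python sketch. It only evaluates a given list of reversals against a set of ES pairs; it does not search for the minimum number of reversals. The function names are hypothetical, reversals are applied in list order (an assumption about the composition order), and the validity check uses only the length constraint, ignoring orientation.

```python
def reversal(s, t, x):
    """Apply the reversal of the interval [s, t] to coordinate x."""
    return x if (x < s or x > t) else t - (x - s)


def apply_reversals(reversals, x):
    """Apply a list of (s, t) reversals to x, in list order (assumed)."""
    for s, t in reversals:
        x = reversal(s, t, x)
    return x


def all_valid(pairs, reversals, l=100_000, L=250_000):
    """True if every transformed ES pair satisfies the length constraint."""
    for x, y in pairs:
        rx, ry = apply_reversals(reversals, x), apply_reversals(reversals, y)
        if not l <= abs(ry - rx) <= L:
            return False
    return True


# Hypothetical example: one ES pair spanning the left breakpoint of an
# inversion of [1 Mb, 2 Mb]. It is invalid in human coordinates but
# becomes valid once the reversal is applied.
pairs = [(950_000, 1_900_000)]
print(all_valid(pairs, []))                        # False
print(all_valid(pairs, [(1_000_000, 2_000_000)]))  # True
```

In the example, the second end maps to 2,000,000 − (1,900,000 − 1,000,000) = 1,100,000, so the transformed pair is 150 kb apart.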
Filtering Experimental Noise
1) Pieces of tumor genome: clones (100-250 kb).
2) Sequence ends of clones (500 bp).
3) Map end sequences to human genome.
• Rearrangement: produces a cluster of invalid pairs.
• Chimeric clone: produces an isolated invalid pair.
Sparse Data Assumptions
1. Each cluster results from a single inversion.
2. Each clone contains at most one breakpoint.
[Figure: ES pairs (x1, y1), (x2, y2), (x3, y3) shown in the human and tumor genomes for each assumption.]
ESP Genome Reconstruction: Discrete Approximation
1) Remove isolated invalid pairs (x, y).
2) Define segments from clusters.
3) ES orientations define links between segment ends.
[Figure: ESP plot with human genome on both axes, showing clusters (x1, y1), (x2, y2), (x3, y3) and segment boundaries s, t.]
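Step 1 above can be illustrated with a simple clustering sketch. The slides do not state how clusters are defined, so the rule used here is an assumption: two invalid pairs belong to the same cluster when both of their coordinates lie within the maximum clone size L of each other, and clusters are the connected groups under that rule. The function name is hypothetical.

```python
def cluster_invalid_pairs(pairs, L=250_000):
    """Split invalid ES pairs into clusters and isolated pairs.

    Assumed rule: pairs (x1, y1) and (x2, y2) are neighbours when
    |x1 - x2| <= L and |y1 - y2| <= L. Clusters are connected components.
    """
    n = len(pairs)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            (x1, y1), (x2, y2) = pairs[i], pairs[j]
            if abs(x1 - x2) <= L and abs(y1 - y2) <= L:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(pairs[i])

    clusters = [g for g in groups.values() if len(g) > 1]
    isolated = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, isolated
```

The isolated pairs returned here are the ones step 1 would discard as likely chimeric clones. The quadratic pairwise comparison is fine for a few hundred invalid pairs, though a sorted sweep would scale better.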
ESP Graph
Edges:
1. Human genome segments
2. ES pairs
Paths in graph are tumor genome architectures.
• Human genome: (1 2 3 4 5)
• Tumor genome: (1 -3 -4 2 5)
• A minimal sequence of translocations and inversions relates the two.
Breakpoint Graph
ESP Graph → Tumor Permutation and Breakpoint Graph
• Example: (1 -3 -4 2 5) → (1 -3 -2 4 5) → (1 2 3 4 5), each framed by start and end.
• Black edges: adjacent elements of $\pi$.
• Gray edges: adjacent elements of $i$ = 1 2 3 4 5.
• Key parameter: black-gray cycles.
Theorem: the minimum number of reversals $d(\pi)$ needed to transform $\pi$ into the identity permutation $i$ satisfies
$$d(\pi) \geq n + 1 - c(\pi),$$
where $c(\pi)$ is the number of gray-black cycles.
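The cycle count in the theorem can be computed directly. The sketch below uses the standard construction for signed permutations, which is an assumption about how the slide's graph is built: each element +x becomes the vertex pair (2x−1, 2x), each −x becomes (2x, 2x−1), and the sequence is framed by 0 and 2n+1. The function names are hypothetical.

```python
def breakpoint_cycles(perm):
    """Count alternating black-gray cycles for a signed permutation."""
    n = len(perm)
    seq = [0]
    for x in perm:
        a, b = 2 * abs(x) - 1, 2 * abs(x)
        seq += [a, b] if x > 0 else [b, a]
    seq.append(2 * n + 1)

    # Black edges join vertices adjacent in the permutation.
    black = {}
    for i in range(0, len(seq), 2):
        u, v = seq[i], seq[i + 1]
        black[u], black[v] = v, u

    # Gray edges join vertices adjacent in the identity: 2j and 2j + 1.
    gray = {}
    for j in range(0, 2 * n + 2, 2):
        gray[j], gray[j + 1] = j + 1, j

    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:  # follow a black edge, then a gray edge
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray[w]
    return cycles


def reversal_lower_bound(perm):
    """Lower bound n + 1 - c on the reversal distance."""
    return len(perm) + 1 - breakpoint_cycles(perm)


print(reversal_lower_bound([1, -3, -4, 2, 5]))  # 2
```

For the tumor permutation (1 -3 -4 2 5) the graph has 4 cycles, giving a bound of 6 − 4 = 2, which matches the two reversals shown in the example above.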
MCF7 Breast Cancer Cell Line
• Low-resolution chromosome painting suggests complex architecture.
• Many translocations, inversions.
MCF7 Genome
• Human chromosomes → MCF7 chromosomes via a sequence of 5 inversions and 15 translocations.
Raphael, et al. Bioinformatics 2003.
3. Rearrangement/duplication mechanisms • Does ESP suggest mechanisms that scramble tumor genomes?
Another look at MCF7
• 11240 ES pairs
• 10453 valid (black)
• 737 invalid
  – 489 isolated (red)
  – 248 form 70 clusters (blue)
• 33/70 clusters
• Total length: 31 Mb
Structure of Duplications in Tumors?
• Duplicated segments may co-localize (Guan et al. Nat. Gen. 1994).
[Figure: duplicated segments in the human genome and their locations in the tumor genome.]
• Mechanisms not well understood.
Analyzing Duplications
[Figure: human genome A B C D E and a tumor genome carrying a duplication, with ES pairs u, v, w. The structure of the duplication is initially ambiguous (? ?).]
• Co-duplication.
• Additional ES pair resolves duplication.
Duples and Boundary Elements
[Figure: the same duplication configuration in the human and tumor genomes, with ES pairs u, v, w.]
Call this configuration a duple with boundary elements v and w.
Duplications in ESP graph
[Figure: a duplication on human genome A B C D E shown as a duple in the ESP graph, with ES pairs u, v, w.]
• Boundary elements v, w are vertices in ESP graph.
• Path between boundary elements resolves duple.
Duplication Complications
[Figure: duples with unresolved structure (? ?) involving ES pairs u, v, w.]
These configurations are frequent in MCF7 data.
Resolving Duplications as Paths
[Figure: ESP graph on human segments A B C D E with ES pairs u, v, w.]
• Path between boundary elements resolves duple.
• There may be multiple paths between duple boundary elements.
Many Paths in MCF7!
Tumor Amplisomes
(Maurer, et al. 1987; Wahl, 1989…)
Other terms:
• Episome
• Amplicon
• Double-minute
Duplication by Amplisome Gives single model for all duplications
Amplisome Reconstruction Problem
• Assume
  1. Tumor genome sequence is known.
  2. Insertions are independent, i.e. no insertions within insertions.
• Approach
  1. Identify duplicated sequences A1, …, Am.
  2. Amplisome is shortest common superstring of A1, …, Am.
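As an illustration of the superstring step, here is a greedy overlap-merging sketch in Python. Finding a shortest common superstring is NP-hard in general, so this greedy heuristic is a stand-in that may return a longer string than the optimum; it is not claimed to be the method used in the talk. The function names and the toy segment strings are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Greedy approximation to the shortest common superstring."""
    unique = set(strings)
    # Drop strings already contained in another string.
    segs = sorted(s for s in unique
                  if not any(s != t and s in t for t in unique))
    while len(segs) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(segs):
            for j, b in enumerate(segs):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair with the largest overlap.
        merged = segs[best_i] + segs[best_j][best_k:]
        segs = [s for idx, s in enumerate(segs)
                if idx not in (best_i, best_j)] + [merged]
    return segs[0] if segs else ""


# Toy example: each letter stands for one genomic segment.
print(greedy_superstring(["ABC", "BCD", "CDE"]))  # ABCDE
```

Every input string is a substring of the result by construction, which is the property the amplisome model needs: each duplicated sequence A1, …, Am appears somewhere in the reconstructed amplisome.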
ESP Amplisome Reconstruction Problem
• Assume
  1. Insertions are independent, i.e. no insertions within insertions.
• Approach
  1. Identify duples with boundary elements (v1, w1), …, (vm, wm).
  2. Amplisome is shortest path in ESP graph containing subpaths v1…w1, v2…w2, …, vm…wm.
Reconstructed MCF7 amplisome
• Chromosomes 1, 3, 17, 20
• 33 clusters, total length: 31 Mb
• Amplisome model explains 24/33 invalid clusters.
Raphael and Pevzner. Bioinformatics 2004.
Duplicated Translocation
• Clone size for ES pair (x1, y1) with breakpoint (a, b): (a – x1) + (b – y1).
• Breakpoint (a, b) in one clone suggests sizes (a – xi) + (b – yi) for other clones in cluster.
• Cluster of 20 ES pairs.
• One clone sequenced.
• Experimental sizes agreed with inferred sizes.
• All clones share same breakpoint.
• Duplication of region occurs after translocation.
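The size inference above is a one-line formula, sketched here in Python. The function name and the coordinates in the example are hypothetical, and the sketch assumes, as the formula on the slide does, that each xi lies before a and each yi before b.

```python
def inferred_clone_sizes(a, b, es_pairs):
    """Inferred size (a - x) + (b - y) for each ES pair (x, y) in a cluster,
    given a shared breakpoint (a, b)."""
    return [(a - x) + (b - y) for x, y in es_pairs]


# Hypothetical cluster of two ES pairs sharing breakpoint (a, b).
sizes = inferred_clone_sizes(
    a=1_000_000, b=5_200_000,
    es_pairs=[(920_000, 5_130_000), (850_000, 5_150_000)],
)
print(sizes)  # [150000, 200000]
```

Comparing these inferred sizes with experimentally measured clone sizes is the consistency check described above: agreement supports the claim that all clones in the cluster share the same breakpoint.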
Clone Sequencing
• Draft sequencing of 29 clones.
• 50 rearrangement breakpoints.
• Some clones have complex internal organization.
[Figure: two sequenced clones (117 kb and 118 kb) with segments labeled by chromosome: 1, 20, 3, 3, 20, 20, 17, 3.]
(Joint work with Jan-Fang Cheng, LBNL)
4. Combining ESP with other genome data
Array Comparative Genomic Hybridization (aCGH)
Joint work with Z. Yakhini, D. Lipson (Agilent and Technion)
CGH Analysis
• Divide genome into segments of equal copy number.
[Figure: copy number profile plotted against genome coordinate, with its segmentation.]
• Numerous segmentation methods (e.g. clustering, Hidden Markov Model, Bayesian, etc.).
• No information about:
  – Structural rearrangements (inversions, translocations).
  – Locations of duplicated material in tumor genome.
• ESP supplies this missing information.
CGH Segmentation
[Figure: copy number (5, 3, 2) plotted against genome coordinate.]
• How are the copies of segments linked?
• ES pairs link segments in the tumor genome.
ESP + CGH
[Figure: copy number (5, 3, 2) against genome coordinate, with a CGH breakpoint and an ESP breakpoint marked.]
• ES near segment boundaries.
ESP and CGH Breakpoints
MCF7
• ESP breakpoints: 730
• CGH breakpoints: 256
• Overlap: 39 (P = 1.2 × 10^-4)
• 12/39 clusters
BT474
• ESP breakpoints: 426
• CGH breakpoints: 244
• Overlap: 33 (P = 5.4 × 10^-7)
• 8/33 clusters
Microdeletion in BT474
[Figure: copy number (3, 2, 0) against genome coordinate, with an ES pair marked.]
• Valid ES pairs span < 250 kb; the marked region is ≈ 600 kb.
• “Interesting” genes in this region.
Combining ESP and CGH
[Figure: copy number (5, 3, 2) against genome coordinate.]
• ES pairs link segments.
• Copy number balance at each segment boundary: 5 = 2 + 3.
• CGH copy number not exact: each segment gets a range, e.g. 3 ≤ f(e) ≤ 5, 1 ≤ f(e) ≤ 4, 1 ≤ f(e) ≤ 3.
• What genome architecture is “most consistent” with ESP and CGH data?
Build graph
1. Edge for each CGH segment.
2. Edge for each ES pair consistent with segments.
3. Range of copy number values for each CGH edge.
Network Flow Problem
• Minimum Cost Circulation with Capacity Constraints (also used in Sequencing by Hybridization, Sequence Assembly).
• Flow constraint on each edge: $l(e) \leq f(e) \leq u(e) \quad \forall e$.
  – CGH edge: l(e) and u(e) from CGH.
  – ESP edge: l(e) = 1, u(e) = 1.
• Flow in = flow out at each vertex: $\sum_{(u,v)} f((u,v)) = \sum_{(v,w)} f((v,w)) \quad \forall v$.
• Add a source/sink vertex and solve
  $$\min \sum_{e} \alpha(e) f(e)$$
  subject to the two constraints above, with costs
  $$\alpha(e) = \begin{cases} 0, & e \text{ an ESP or CGH edge}, \\ 1, & e \text{ incident to source/sink.} \end{cases}$$
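The constraints of this circulation problem can be made concrete with a small checker. The Python sketch below does not solve the minimum cost circulation; it only verifies that a candidate flow respects the edge bounds and flow conservation, and reports its cost. The function name, the edge tuple layout, the toy graph, and the bound of 10 on source/sink edges are all assumptions for illustration.

```python
from collections import defaultdict


def check_circulation(edges, flow):
    """Check a candidate circulation and compute its cost.

    edges: list of (u, v, lower, upper, cost) tuples.
    flow:  list of flow values, one per edge, in the same order.
    Returns (feasible, total_cost).
    """
    balance = defaultdict(int)
    total_cost = 0
    feasible = True
    for (u, v, lower, upper, cost), f in zip(edges, flow):
        if not lower <= f <= upper:   # capacity constraint l(e) <= f(e) <= u(e)
            feasible = False
        balance[u] -= f               # flow leaving u
        balance[v] += f               # flow entering v
        total_cost += cost * f
    if any(b != 0 for b in balance.values()):  # flow in = flow out
        feasible = False
    return feasible, total_cost


# Toy graph: "s" is the combined source/sink vertex (cost 1 per unit flow).
edges = [
    ("s", "a", 0, 10, 1),  # source edge
    ("a", "b", 3, 5, 0),   # CGH segment, copy number range 3..5
    ("b", "c", 1, 3, 0),   # CGH segment, copy number range 1..3
    ("b", "d", 1, 1, 0),   # ESP edge, l(e) = u(e) = 1
    ("d", "e", 1, 4, 0),   # CGH segment, copy number range 1..4
    ("c", "s", 0, 10, 1),  # sink edge
    ("e", "s", 0, 10, 1),  # sink edge
]
flow = [3, 3, 2, 1, 1, 2, 1]
print(check_circulation(edges, flow))  # (True, 6)
```

Because only source/sink edges carry cost, the total cost counts the flow that enters or leaves the graph through the source/sink vertex rather than through ESP or CGH edges. In practice one would hand the same edge list to a min-cost-flow solver instead of checking a flow by hand.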
Network Flow Results
• Unsatisfied flow marks putative locations of missing ESP data.
• Prioritize further sequencing.
• Targeted ESP by screening library with CGH probes.
• Identify amplified translocations
  – 14 in MCF7
  – 5 in BT474
• Flow values → edge weights.
• Paths of high-weight edges: amplicon structures.