Structure and Motion Jean-Claude Latombe Computer Science Department Stanford University NSF-ITR Meeting on November 14, 2002

Stanford’s Participants
PIs: L. Guibas, J. C. Latombe, M. Levitt
Research Associate: P. Koehl
Postdocs: F. Schwarzer, A. Zomorodian
Graduate students: S. Apaydin (EE), S. Ieong (CS), R. Kolodny (CS), I. Lotan (CS), A. Nguyen (Sc. Comp.), D. Russel (CS), R. Singh (CS), C. Varma (CS)
Undergraduate students: J. Greenberg (CS), E. Berger (CS)
Collaborating faculty:
§ A. Brunger (Molecular & Cellular Physiology)
§ D. Brutlag (Biochemistry)
§ D. Donoho (Statistics)
§ J. Milgram (Math)
§ V. Pande (Chemistry)

Problems Addressed
Biological functions derive from the structures (shapes) achieved by molecules through motions
- Determination, classification, and prediction of 3D protein structures
- Modeling of molecular energy and simulation of folding and binding motion

What’s New for Computer Science?
- Massive amount of experimental data
- Importance of similarities
- Multiple representations of structure
- Continuous energy functions
- Many objects forming deformable chains
- Many degrees of freedom
- Ensemble properties of pathways

Massive amount of experimental data
Abstract/simplify data sets into compact data structures
E.g.: Electron density map → Medial axis

Importance of similarities
Segmentation/matching/scoring techniques
E.g.: Libraries of protein fragments [Kolodny, Koehl, Guibas, Levitt, JMB (2002)]
[Figure: data set → clustered data → small library]

1tim: Approximations
[Figure panels: real protein and two approximations]
- Complexity 2.26 (50 fragments of length 7): 2.7805 Å cRMS
- Complexity 10 (100 fragments of length 5): 0.9146 Å cRMS

Alignment of Structural Motifs [Singh and Saha; Kolodny and Linial]
Problem:
§ Determine if two structures share common motifs:
  • 2 (labelled) structures in R^3
  • A = {a_1, a_2, …, a_n}, B = {b_1, b_2, …, b_m}
  Find subsequences s_a and s_b s.t. the substructures {a_{s_a(1)}, a_{s_a(2)}, …, a_{s_a(l)}} and {b_{s_b(1)}, b_{s_b(2)}, …, b_{s_b(l)}} are similar
§ Twofold problem: alignment and correspondence
§ Score, Approximation, Complexity

[R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.]
Iterative Closest Point (Besl-McKay) for alignment
Score: RMSD

Trypsin active site against 42 trypsin-like proteins [Singh and Saha]

Multiple representations of structure
ProShape software [Koehl, Levitt (Stanford), Edelsbrunner (Duke)]

Statistical potentials for proteins based on alpha complex [Guibas, Koehl, Zomorodian]
§ Decoys generated using “physical” potentials
§ Select best decoys using distance information

Continuous energy functions
Many objects in deformable chains
Many pairs of objects, but relatively few are close enough to interact
During motion simulation:
- detect steric clashes (self-collisions)
- find pairs of atoms closer than cutoff
Data structures that capture proximity, but undergo small or rare changes
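
The "find pairs of atoms closer than cutoff" step can be illustrated with the uniform-grid approach that the later ChainTrees slides use as a comparison baseline. This is only a minimal sketch, not the hierarchy-based method of the talk: the function name `close_pairs` and the choice of a cell side equal to the cutoff are assumptions made for the example.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor

def close_pairs(atoms, cutoff):
    """Return the set of index pairs (i, j), i < j, whose distance is below cutoff."""
    # Hash every atom into a cubic cell whose side equals the cutoff, so two
    # atoms closer than the cutoff always lie in the same or adjacent cells.
    cells = defaultdict(list)
    for i, p in enumerate(atoms):
        cells[tuple(floor(c / cutoff) for c in p)].append(i)

    pairs = set()
    offsets = list(product((-1, 0, 1), repeat=3))
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            # .get avoids creating new cells while iterating over the dict.
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and dist(atoms[i], atoms[j]) < cutoff:
                        pairs.add((i, j))
    return pairs

atoms = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (5.5, 5, 5), (-0.5, 0, 0)]
pairs = close_pairs(atoms, cutoff=1.2)
```

The grid has to be rebuilt (or patched) after every move, which is the cost the adaptive hierarchies on the following slides try to avoid.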

Other application domains:
§ Modular reconfigurable robots
§ Reconstructive surgery

§ Fixed bounding-volume hierarchies don’t work

§ Instead, exploit what doesn’t change: chain topology
Adaptive BV hierarchies
[Guibas, Nguyen, Russel, Zhang] [Lotan, Schwarzer, Halperin, Latombe] (SoCG ’02)

Wrapped bounding sphere hierarchies [Guibas, Nguyen, Russel, Zhang] (SoCG 2002)
• WBSH undergoes a small number of changes
• Self-collision: O(n log n) in R^2, O(n^(2-2/d)) in R^d, d ≥ 3

ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG ’02)
Assumption: Few degrees of freedom change at each motion step (e.g., Monte Carlo simulation)
§ Find all pairs of atoms closer than a given cutoff
§ Find which energy terms can be reused

ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG ’02)
Updating:
Finding interacting pairs: (in practice, sublinear)

ChainTrees: Application to MC simulation (comparison to grid method)
[Figure: two panels, m = 1 and m = 5, each with data series labeled (68), (144), (374), (755)]

Future work: ChainTrees
§ Run new series of experiments with more complex energy field: EEF1 [Lazaridis & Karplus] (with Pande)
§ Use library of fragments (with Koehl)
Open problem: How to find good moves to make when the conformation is compact and random moves are rejected with high probability?

Future Work: Spanner for deformable chain [Agarwal, Gao, Duke; Nguyen, Zhang, Stanford]
[Figure: 3HVT]
Capture proximity information with a sparse spanner

Many degrees of freedom
Tools to explore high-dimensional conformation space:
- Sampling strategies
- Nearest neighbors

Sampling structures by combining fragments [Kolodny, Levitt]
Library of protein fragments: a, b, c, d
Discrete set of candidate structures, e.g., bbc, cab
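
To make the "discrete set of candidate structures" concrete, the sketch below enumerates every string of fragment labels of a given length. It only illustrates the combinatorics: the function name `candidate_structures` and the fixed length of three fragments are assumptions, and the geometric step of concatenating the fragments into 3D coordinates is not shown.

```python
from itertools import product

def candidate_structures(library, n_fragments):
    """Yield every string of n_fragments labels drawn from the library."""
    for combo in product(library, repeat=n_fragments):
        yield "".join(combo)

library = ["a", "b", "c", "d"]   # fragment labels from the slide
candidates = list(candidate_structures(library, 3))
# A library of s fragments gives s ** n_fragments candidates (4 ** 3 here).
```

The count grows exponentially with the number of fragments, which is why the library size matters so much for sampling.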

Nearest neighbors in high-dimensional space [Lotan and Schwarzer]
Find k nearest neighbors of a given protein conformation in a set of n conformations (cRMS, dRMS)
Idea: Cut backbone into m equal subsequences
[Figure: backbone with points labeled a_0 through a_m]
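
A minimal sketch of the idea, under stated assumptions: each of the m equal subsequences is replaced by the centroid of its atoms (the "averaged" representation mentioned on the next slide), and conformations are compared with dRMS, taken here as the root-mean-square difference between corresponding intra-structure distances. The function names `averaged` and `drms` are hypothetical, and the authors' exact definitions may differ.

```python
from itertools import combinations
from math import dist, sqrt

def averaged(conf, m):
    """Replace each of m equal-length runs of atoms by its centroid."""
    if len(conf) % m:
        raise ValueError("backbone length must be a multiple of m")
    size = len(conf) // m
    runs = (conf[k * size:(k + 1) * size] for k in range(m))
    return [tuple(sum(axis) / size for axis in zip(*run)) for run in runs]

def drms(a, b):
    """RMS difference between corresponding intra-structure distances of a and b."""
    pairs = list(combinations(range(len(a)), 2))
    total = sum((dist(a[i], a[j]) - dist(b[i], b[j])) ** 2 for i, j in pairs)
    return sqrt(total / len(pairs))

straight = [(float(i), 0.0, 0.0) for i in range(6)]
stretched = [(2.0 * i, 0.0, 0.0) for i in range(6)]
reduced = averaged(straight, 3)                          # 6 atoms -> 3 centroids
d = drms(averaged(straight, 3), averaged(stretched, 3))  # distance on the reduced form
```

Because dRMS depends only on internal distances, it needs no superposition step, and the reduced representation shrinks the number of distances from O(n^2) to O(m^2) per comparison.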

Nearest neighbors in high-dimensional space [Lotan and Schwarzer]
100,000 decoys of 1CTF (Park-Levitt set)
Computation of 100 NN of each conformation:
  Full rep., dRMS (brute force)        ~84 h
  Ave. rep., dRMS (brute force)        ~4.8 h
  SVD red. rep., dRMS (brute force)    41 min
  SVD red. rep., dRMS (kd-tree)        19 min
~80% of computed NNs are true NNs
kd-tree software from ANN library (U. Maryland)

Ensemble properties of pathways Stochastic nature of molecular motion requires characterizing average properties of many pathways

Example #1: Probability of Folding pfold
HIV integrase [Du et al. ’98]
[Figure labels: Unfolded set, Folded set, pfold, 1 - pfold]
“We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive.”
Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998).

Example #2: Ligand-Protein Interaction [Sept, Elcock and McCammon ’99]
10K to 30K independent simulations

Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe] (RECOMB ’02, ECCB ’02)
Idea: Capture the stochastic nature of molecular motion by a network of randomly selected conformations and by assigning probabilities to edges
[Figure: edge from v_i to v_j labeled P_ij]
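
The slide does not say how the edge probabilities are chosen. One plausible choice, assumed here purely for illustration, is a Metropolis-style rule: a move to a higher-energy neighbor is accepted with probability exp(-ΔE/kT), each neighbor is proposed with equal probability, and the leftover probability becomes a self-transition. The function name, the `energy` and `neighbors` inputs, and the `kT` parameter are all hypothetical.

```python
from math import exp

def transition_probabilities(energy, neighbors, kT=1.0):
    """Return P[i][j] for a roadmap given node energies and adjacency lists."""
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            dE = energy[j] - energy[i]
            # Assumed Metropolis acceptance: downhill moves are always accepted.
            accept = exp(-dE / kT) if dE > 0 else 1.0
            row[j] = accept / len(nbrs)
        # Whatever is not spent on moves stays on the node (self-transition P_ii).
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P

energy = {0: 0.0, 1: 1.0, 2: 0.0}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
P = transition_probabilities(energy, neighbors)
```

Each row of `P` sums to one, so the roadmap can be treated as a Markov chain over the sampled conformations.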

Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe] (RECOMB ’02, ECCB ’02)
U: Unfolded set; F: Folded set
§ One linear equation per node
§ Solution gives pfold for all nodes
§ No explicit simulation run
§ All pathways are taken into account
§ Sparse linear system
Let f_i = pfold(i)
After one step: f_i = P_ii f_i + P_ij f_j + P_ik f_k + P_il f_l + P_im f_m
[Figure: node i with self-transition P_ii and edges P_ij, P_ik, P_il, P_im to neighbors j, k, l, m; two of the f values are annotated "= 1"]
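
The one-step equation gives one linear equation per node, with f fixed to 1 on the folded set and 0 on the unfolded set. The sketch below solves such a system by Gauss-Seidel sweeps over a sparse dict-of-dicts; the solver choice, the function name `pfold`, and the toy five-node chain are assumptions for illustration, not the method used in the cited papers. It assumes every node appears as a key of `P` and that no free node has a self-transition probability of 1.

```python
def pfold(P, folded, unfolded, tol=1e-12, max_sweeps=100_000):
    """Solve f_i = sum_j P[i][j] * f_j with f = 1 on folded and f = 0 on unfolded."""
    f = {i: 1.0 if i in folded else 0.0 for i in P}
    free = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(max_sweeps):
        change = 0.0
        for i in free:
            stay = P[i].get(i, 0.0)
            inflow = sum(p * f[j] for j, p in P[i].items() if j != i)
            # Move the P_ii f_i term to the left-hand side before updating.
            new = inflow / (1.0 - stay)
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f

# Toy roadmap: a chain 0-1-2-3-4 with node 0 unfolded and node 4 folded.
P = {0: {0: 1.0}, 4: {4: 1.0}}
for i in (1, 2, 3):
    P[i] = {i - 1: 0.25, i: 0.5, i + 1: 0.25}
f = pfold(P, folded={4}, unfolded={0})
```

On this symmetric chain the solution is linear in the node index, which makes it a convenient sanity check before trying a real roadmap.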

Probabilistic Roadmap
• 1ROP (repressor of primer)
• 2 α helices
• 6 DOF
Correlation with MC Approach

Probabilistic Roadmap: Computation Times (1ROP)
Monte Carlo: 49 conformations
- Over 11 days of computer time
- Over 10^6 energy computations
Roadmap: 5000 conformations
- 1 - 1.5 hours of computer time
- ~15,000 energy computations
~4 orders of magnitude speedup!
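
The "~4 orders of magnitude" figure can be checked from the numbers on the slide by comparing computer time per conformation. This is a rough worked calculation: it takes "over 11 days" as exactly 11 days and uses the upper end of the 1 to 1.5 hour range, so it gives a lower bound on the speedup.

```python
from math import log10

mc_seconds = 11 * 24 * 3600       # "over 11 days", taken as exactly 11 days
mc_conformations = 49
roadmap_seconds = 1.5 * 3600      # upper end of the 1 to 1.5 hour range
roadmap_conformations = 5000

mc_per_conf = mc_seconds / mc_conformations                 # about 19,400 s each
roadmap_per_conf = roadmap_seconds / roadmap_conformations  # about 1.08 s each
speedup = mc_per_conf / roadmap_per_conf                    # roughly 18,000x
orders = log10(speedup)
```

Per conformation, the roadmap is roughly 18,000 times faster under these assumptions, which is a little over four orders of magnitude.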

Future work: Probabilistic Roadmap
§ Non-uniform sampling strategies
§ Encoding molecular dynamics into probabilistic roadmaps (with V. Pande)
§ Quantitative experiments with ligand-protein binding (with V. Pande)

Bio-X – Clark Center

The following slides relate to non-research issues. I do not plan to present them. Jack and Leo may want to use the contents of some of them for their own presentations.

Education
• Tutorial on Delaunay, Alpha-Shape and Pockets (Koehl)
• A biocomputing Notebook (Koehl)
• Biocomputation lectures in pre-existing classes:
  – CS 326 – motion planning: molecular motion, probabilistic roadmaps, self-collision detection (Latombe)
  – CS 468 – intro to computational topology: finding pockets and tunnels in molecules, computing surface areas and volumes and their derivatives (Zomorodian)
• New class on Algorithmic Biology (Batzoglou, Guibas, Latombe)
• Graduate Curriculum Committee, Bio-Engineering Dept., Stanford (Latombe)

Trained Students (1/2)
Ph.D. students:
- Serkan Apaydin, EE
- An Nguyen, Scientific Computing
- Carlos Guestrin, CS (Daphne Koller’s group)
- Itay Lotan, CS
- Rachel Kolodny, CS
- Daniel Russel, CS
- Samuel Ieong, CS
Most graduate students have a principal advisor in CS and a secondary one in a bio-related department (Levitt, Brutlag, Pande)

Trained Students (2/2)
Graduated Master students:
- Rohit Singh, finding motifs in proteins, best Stanford CS master’s thesis, June ’02 [current position: bioinformatics company in San Diego]
- Chris Varma, study of ligand-protein interaction with probabilistic roadmaps, June ’02 [current position: Ph.D. student, Harvard/MIT Biomedical program]
Current Master student:
- Ben Wong, modeling T cell activity
Undergraduates:
- Eric Berger, CS, Stanford, summer internship
- Julie Greenberg, CS, Harvard, summer internship

Visitors
• Prof. Alberto Munoz, Math Dept., University of Yucatan, Mexico; 3 months, Summer ’02; haptic interaction and probabilistic roadmaps
• Prof. Ileana Streinu, Smith College; 6 months, from Sept. ’02; protein folding

Interactions Within Stanford
- Guibas and Levitt, with J. Milgram (Math): topology of configuration spaces of chains
- Guibas, with V. Pande (Chemistry) and D. Donoho (Statistics): non-linear multi-resolution analysis of molecular motions
- Latombe and Apaydin, with D. Brutlag (Biochemistry) and V. Pande: probabilistic roadmaps
- Latombe and Lotan, with V. Pande: efficient MC simulation

Interactions Outside Stanford
- Collision Detection for Deforming Necklaces, P. Agarwal, L. Guibas, A. Nguyen, D. Russel, and L. Zhang. Invited to special issue of Comp. Geom., Theory and Applications, following presentation at SoCG ’02.
- Kinetic Medians and kd-Trees, P. Agarwal, J. Gao, and L. Guibas. Proc. 10th European Symp. Algorithms, LNCS 2461, Springer-Verlag, pp. 5-16, 2002.
- Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion, M. S. Apaydin, D. L. Brutlag, C. Guestrin, D. Hsu, and J. C. Latombe. Proc. RECOMB ’02, Washington D.C., pp. 12-21, 2002.
- Efficient Maintenance and Self-Collision Testing for Kinematic Chains, I. Lotan, F. Schwarzer, D. Halperin, and J. C. Latombe. SoCG ’02, pp. 43-52, June 2002.
- Stochastic Conformational Roadmaps for Computing Ensemble Properties of Molecular Motion, M. S. Apaydin, D. L. Brutlag, C. Guestrin, D. Hsu, and J. C. Latombe. Workshop on Algorithmic Foundations of Robotics (WAFR), Nice, Dec. 2002.

Conference Attendance
- BCATS ’01 and ’02 [Bio-Computation At Stanford]
- RECOMB ’02 [Int. Conf. on Research in Computational Molecular Biology]
- ISMB ’02 [Int. Conf. on Intelligent Syst. for Molecular Biology]
- ECCB 2002 [European Conf. on Computational Biology]
- Biophysical Society Symp. on Molecular Simulations in Structural Biology, 2002
- SoCG 2002 [ACM Symp. on Computational Geometry]

Outreach
- Latombe and Levitt serve as members of the Scientific Leadership Council of Stanford’s Bio-X program
- Presentations: Stanford’s Bio-X Symposium (3/02), Stanford’s Computer Forum (3/02), Berkeley’s Broad Area Seminar (4/02)
- Conference committees:
  Guibas, program committee, WAFR ’02 and SoCG ’03
  Latombe, program committee, 1st IEEE Bioinformatics Conf. ’03
  Apaydin, organization committee of BCATS ’02

The following slides are extra slides that I removed from my presentation for lack of time

General Goals
- Larger proteins considered → computational efficiency
- Diversity of molecules and interactions → computational abstractions
- Extension of in-silico experiments → computational correctness
Enable biological studies that were not possible before, more systematically

Approach
- Select hard problems
- Close interaction between computer scientists (Guibas, Koehl, Latombe) and biologists (Koehl, Levitt, Brutlag, Pande, Brunger)
- Most graduate students are CS students with a secondary advisor in biology
- Perform extensive tests

Electron density map → Medial axis [Guibas, Brunger, Russel]
§ Medial axis of iso-surfaces to estimate backbone
§ Cleaning and simplification of axis to filter noise out
§ Persistence of features across multiple iso-surfaces

Continuous energy function
Essential for protein structure prediction and molecular motion simulation:
- Statistical potentials based on alpha complex
- Maintenance of energy values during simulation

§ Instead, exploit what doesn’t change: chain topology
Adaptive BV hierarchies
✓ Balanced binary trees of constant topology
✓ Efficient repair of position/size of BVs
[Guibas, Nguyen, Russel, Zhang] [Lotan, Schwarzer, Halperin, Latombe] (SoCG ’02)

Future Work: Spanner for deformable chain [Agarwal, Gao, Duke; Nguyen, Zhang, Stanford]

Probabilistic Roadmap
• 1ROP (repressor of primer)
  • 2 α helices
  • 6 DOF
• 1HDD (Engrailed homeodomain)
  • 3 α helices
  • 12 DOF
H-P energy model with steric clash exclusion [Sun et al., ’95]