Protein Structure Space
Patrice Koehl
Computer Science and Genome Center
http://www.cs.ucdavis.edu/~koehl/

From Sequence to Function
Sequence: KKAVINGEQIRSISDLHQTLKK WELALPEYYGENLDALWDCLTG VEYPLVLEWRQFEQSKQLTENG AESVLQVFREAKAEGCDITI
[Figure: sequence, structure, and function (ligand)]

Protein Structure Space
• 1CTF: 68 AA
• 1TIM: 247 AA
• 1A1O: 384 AA
• 1K3R: 268 AA
• 1NIK: 4504 AA
• 1AON: 8337 AA

Outline
• Protein Structure Space: Dimension?
• Protein Shape Descriptors: Differential Geometry Tools
• Complexity of Protein Structures: Are Proteins 3D, or 1D objects?
• Classifying Proteins: The Shapes of Protein Structures

Classification of Protein Structure: CATH
• C (Class): Alpha, Mixed Alpha Beta
• A (Architecture): Barrel, Sandwich, Super Roll
• T (Topology): TIM Barrel, Other Barrel

Protein Structure Space
Test set:
• 2,930 proteins out of 23,000 proteins in the PDB
• No sequence similarity (no pair with a Fasta E-value < 10^-4)
• Reference structural similarity defined from CATH
• 769 folds
• 104,000 pairs of similar structures out of 4,600,000 pairs
Performance measure: ROC curve (Receiver Operating Characteristic)

Projecting Protein Structure Space
Distance Matrix → Metric Matrix → Points in Space (X)
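The chain from distance matrix to metric matrix to points in space reads like classical multidimensional scaling. The following is a minimal sketch under that assumption; the function name `classical_mds` and its arguments are illustrative and do not come from the slides, and NumPy is assumed to be available.

```python
import numpy as np

def classical_mds(dist, n_dims):
    """Embed items in n_dims dimensions from a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain the metric (Gram) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    metric = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(metric)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    coords = eigvecs[:, order] * np.sqrt(top_vals)
    # Also return every eigenvalue, largest first, to inspect the spectrum.
    return coords, eigvals[::-1]
```

The decay of the returned eigenvalues suggests how many coordinates are needed before the embedded distances approximate the input distances well.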

Projecting Protein Structure Space
[Figure: eigenvalues λ_k plotted against index k, one panel for Class and one for Fold]

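Protein Structure Similarity
Root mean square distance (cRMS):
cRMS(A, B) = \min_{R, T} \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \| a_i - (R b_i + T) \|^2 }
• N: number of equivalent atoms between A and B
• a_i, b_i: positions of the i-th pair of equivalent atoms in A and in B
• R, T: rigid transformation that minimizes cRMS.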
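As an illustration of the minimization over the rigid transformation, here is a minimal sketch that uses the Kabsch algorithm (an SVD of the covariance matrix). The slides do not say which superposition algorithm was used, so Kabsch is an assumption here, as are the function name `crms` and the use of NumPy. The equivalence between atoms is taken as given: row i of `a` is matched to row i of `b`.

```python
import numpy as np

def crms(a, b):
    """cRMS between two N x 3 coordinate sets after optimal superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # The optimal translation aligns the two centroids.
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    # Kabsch: the optimal rotation comes from the SVD of the covariance.
    u, _, vt = np.linalg.svd(b0.T @ a0)
    # Flip the last axis if needed so the result is a proper rotation.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    diff = a0 - b0 @ rot.T
    return float(np.sqrt((diff ** 2).sum() / len(a)))
```

A structure compared with a rotated and translated copy of itself gives a value near zero, whereas a mirror image does not, because reflections are excluded from the rigid transformations.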

Protein Structure Classes
Measure of Structure Similarity: cRMS after Optimal Superposition (Structal)
[Figure: eigenvalues of the metric matrix]

A Picture of the Protein Structure Space
[Figure: projection with regions labeled β Proteins, α and β Proteins, α Proteins]

A Picture of the Protein Structure Space
[Figure: projection with regions labeled β Proteins, α and β Proteins, α Proteins, and with example domains 1repC2, 1bdo00, 1a81G2, 2bi6H0, 1sfcK0 marked]

Protein Fold Space
ROC Analysis (Receiver Operating Characteristic)
[Figure: rate of true positives (%) plotted against rate of true negatives (%), both axes running to 100]
• “Perfect” measure: Area = 1.0
• Random measure: Area = 0.5

Protein Fold Space
ROC Analysis (Receiver Operating Characteristic)
• True positives: pairs of proteins that belong to the same T class of CATH
• True negatives: pairs of proteins that belong to the same C class, but not the same T class.
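The area under a ROC curve equals the probability that a randomly chosen positive pair is ranked as more similar than a randomly chosen negative pair. Below is a minimal sketch of that computation, assuming each pair of proteins has already been scored with a distance (smaller means more similar). The function name `roc_area` and its inputs are illustrative and are not taken from the slides.

```python
def roc_area(positive_distances, negative_distances):
    """Area under the ROC curve for a distance-based similarity measure."""
    wins = 0.0
    for p in positive_distances:
        for n in negative_distances:
            if p < n:
                wins += 1.0   # positive pair correctly ranked closer
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positive_distances) * len(negative_distances))
```

With this definition a measure that always ranks positive pairs closer than negative pairs scores 1.0, and a measure that ranks them no better than chance scores about 0.5. The double loop is quadratic, so a test set with millions of pairs would call for a rank-based implementation instead.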

Protein Fold Space
[Figure: ROC curves, rate of true positives (%) against rate of true negatives (%)]
• CATH Fold 20: 0.98
• Fasta: 0.54
• CATH Class: 0.51
Fold 20: first 20 coordinates derived from the CATH fold matrix
CATH Class: first 3 coordinates derived from the CATH class matrix

Protein Fold Space
[Figure: ROC curves, rate of true positives (%) against rate of true negatives (%)]
• Structal: 0.88
• Fasta: 0.54

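Protein Structure Features
• Global radius of curvature: R(x, y, z), the radius of the circle through three points x, y, z on the curve
• Thickness: the minimum of R(x, y, z) over all triples of distinct points on the curve
(Gonzalez & Maddocks, PNAS, 1999, 96: 4769)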
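For a curve sampled at discrete points (for example one point per residue, which is an assumption made here and not stated on the slide), R(x, y, z) is the circumradius of the triangle formed by three points, and the thickness can be estimated as the smallest circumradius over all triples. The sketch below uses only the standard library; the names `circumradius` and `thickness` are illustrative.

```python
import itertools
import math

def circumradius(x, y, z):
    """Radius of the circle through three 3D points (inf if collinear)."""
    a, b, c = math.dist(y, z), math.dist(x, z), math.dist(x, y)
    u = [y[k] - x[k] for k in range(3)]
    v = [z[k] - x[k] for k in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    area = 0.5 * math.sqrt(sum(k * k for k in cross))
    if area == 0.0:
        return math.inf  # collinear points lie on a circle of infinite radius
    return a * b * c / (4.0 * area)

def thickness(points):
    """Smallest circumradius over all triples of sampled points."""
    return min(circumradius(x, y, z)
               for x, y, z in itertools.combinations(points, 3))
```

The search over all triples is cubic in the number of points, which is acceptable for a single chain but slow for a large test set.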

Thickness of a protein structure
Δ = 2.60 Å

Curvature Feature Vector

Performance of the Curvature Feature Vector
[Figure: ROC curves, rate of true positives (%) against rate of true negatives (%)]
• Structal: 0.88
• C5: 0.65
• Fasta: 0.54
The curvature vector performs better than Fasta, but needs more features to match Structal.

Protein Structure Features: Writhing
• Sign of Crossing: + or -
• Writhing Number [Figure: a crossing between curve points γ(t1) and γ(t2)]
• Writhe Feature Vector for Each Protein
Fain and Røgen, PNAS, 100: 119 (2003)
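The writhing number of a space curve can be written as the Gauss double integral Wr = (1/4π) ∫∫ (γ'(t1) × γ'(t2)) · (γ(t1) − γ(t2)) / |γ(t1) − γ(t2)|³ dt1 dt2. The following is a rough sketch that approximates it for an open polyline by evaluating each pair of segments at their midpoints. The discretization, the sign convention, and the function name `writhe` are assumptions made for illustration; the sketch covers only the single writhing number, not the full feature vector of the cited paper.

```python
import math

def writhe(points):
    """Midpoint approximation of the Gauss writhe integral for a polyline."""
    pairs = list(zip(points, points[1:]))
    # Segment vectors and segment midpoints.
    segs = [tuple(q[k] - p[k] for k in range(3)) for p, q in pairs]
    mids = [tuple(0.5 * (p[k] + q[k]) for k in range(3)) for p, q in pairs]
    total = 0.0
    for i in range(len(segs)):
        for j in range(i + 1, len(segs)):
            u, v = segs[i], segs[j]
            r = tuple(mids[i][k] - mids[j][k] for k in range(3))
            cross = (u[1] * v[2] - u[2] * v[1],
                     u[2] * v[0] - u[0] * v[2],
                     u[0] * v[1] - u[1] * v[0])
            dist = math.sqrt(sum(c * c for c in r))
            # Assumes no two segment midpoints coincide (dist > 0).
            total += sum(cross[k] * r[k] for k in range(3)) / dist ** 3
    # Each unordered pair of segments appears twice in the double integral.
    return 2.0 * total / (4.0 * math.pi)
```

A planar curve has zero writhe under this approximation, and mirroring a curve flips the sign of the result, which is the behaviour expected from a signed count of crossings.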

Protein Structure Features: Writhing
[Figure: ROC curves, rate of true positives (%) against rate of true negatives (%)]
• Structal: 0.88
• W10: 0.77
• C5: 0.65
• Fasta: 0.54
W10 Writhe performs better than C5 Curvature

Clustering Protein Fragments to Extract a Small Set of Representatives (a Library)
data → clustered data → library (Simulated annealing K means)

Generating an approximate structure
Fragment library: A, B, C, D

Generating an approximate structure
Fragment library: A, B, C, D
Structural Sequence: AC

Fitting Protein Structures
• 50 fragments of length 7: 2.78 Å cRMS
• 100 fragments of length 5: 0.91 Å cRMS (better)

Longer fragments give better fit at same complexity
[Figure: average cRMS distance plotted against complexity (states/residue), one curve per fragment size: 7, 6, 5, and 4 residues]
• N: number of fragments
• L: size of each fragment
(Kolodny, Koehl, Guibas, Levitt, J. Mol. Biol., 323, 297, 2002)

Choosing the “right” library
Size L    N such that Complexity = 20
7         160000
6         8000
5         400
4         20
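The four library sizes listed for Complexity = 20 are consistent with the relation complexity = N^(1/(L−3)), that is, N = 20^(L−3). The relation is inferred here from those numbers and is not written out on the slide; the function names below are likewise illustrative. A short sketch that reproduces the values:

```python
def complexity(n_fragments, fragment_length):
    """States per residue, assuming complexity = N ** (1 / (L - 3))."""
    return n_fragments ** (1.0 / (fragment_length - 3))

def library_size(target_complexity, fragment_length):
    """Library size N that reaches target_complexity at fragment length L."""
    return round(target_complexity ** (fragment_length - 3))

# Library sizes for a complexity of 20 states per residue.
for length in (7, 6, 5, 4):
    print(length, library_size(20, length))
```

Under this relation the library size grows by a factor of 20 for each extra residue of fragment length, so holding the complexity fixed quickly requires very large libraries.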

A Structural Alphabet for Protein Backbone
• Fragment size: 4
• Number of fragments: 20
[Figure labels: Protein size; # of structures; cRMS model-experimental structure (ticks at 0.2, 0.6, 1.0)]

Structural Alphabet: Application to Structure Comparison
cRMS = 1 Å

Collaborators
• Marc Delarue (Biophysics), Institut Pasteur, Paris
• Michael Levitt (Computational Biology), Stanford University
• Herbert Edelsbrunner (Math/Computer Science), Duke University
• Rachel Kolodny (Computer Science), Columbia University
• Peter Roegen (Math), DTU, Denmark
• Joel Hass (Math), UC Davis

Thank You
