Protein Structure Space
- Slides: 38
Protein Structure Space. Patrice Koehl, Computer Science and Genome Center, http://www.cs.ucdavis.edu/~koehl/
From Sequence to Function: Sequence → Structure → Function. Example sequence: KKAVINGEQIRSISDLHQTLKK WELALPEYYGENLDALWDCLTG VEYPLVLEWRQFEQSKQLTENG AESVLQVFREAKAEGCDITI [figure: structure with bound ligand]
Protein Structure Space: 1CTF (68 AA), 1TIM (247 AA), 1A1O (384 AA), 1K3R (268 AA), 1NIK (4504 AA), 1AON (8337 AA)
Outline • Protein Structure Space: Dimension? • Protein Shape Descriptors: Differential Geometry Tools • Complexity of Protein Structures: Are Proteins 3D or 1D Objects? • Classifying Proteins: The Shapes of Protein Structures
Classification of Protein Structure: CATH. C (Class): Alpha, Mixed Alpha-Beta, Beta. A (Architecture): Barrel, Sandwich, Super Roll. T (Topology): TIM Barrel, Other Barrel.
Protein Structure Space: Test set: 2,930 proteins out of 23,000 proteins in the PDB, with no sequence similarity between them (no pair with FASTA E-value < 10^-4). Reference structural similarity defined from CATH: 769 folds; 104,000 pairs of similar structures out of 4,600,000 pairs. Performance measure: ROC curve (Receiver Operating Characteristic).
Projecting Protein Structure Space: Distance Matrix → Metric Matrix → Points in Space (X)
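The pipeline on this slide (distance matrix, then metric matrix, then points in space) reads like classical multidimensional scaling. The sketch below assumes that interpretation; the function name `classical_mds` and the toy data are illustrative and do not come from the slides.

```python
import numpy as np

def classical_mds(dist, n_dims):
    """Embed items in n_dims dimensions from a symmetric distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Centering matrix J = I - (1/n) * ones
    centering = np.eye(n) - np.ones((n, n)) / n
    # Metric (Gram) matrix of the centered points
    metric = -0.5 * centering @ d2 @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims
    evals, evecs = np.linalg.eigh(metric)
    order = np.argsort(evals)[::-1][:n_dims]
    evals, evecs = evals[order], evecs[:, order]
    # Negative eigenvalues indicate a non-Euclidean distance; clip them to zero
    coords = evecs * np.sqrt(np.clip(evals, 0.0, None))
    return coords, evals

# Toy check: distances between random 3D points are reproduced by the embedding
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
coords, evals = classical_mds(dist, 3)
recovered = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
```

The eigenvalue spectrum returned alongside the coordinates indicates how many dimensions are needed to reproduce the input distances.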
Projecting Protein Structure Space: [plots: eigenvalue λ_k versus index k, for the Class matrix and for the Fold matrix]
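Protein Structure Similarity: Root mean square distance: $\mathrm{cRMS}(A, B) = \min_{R, T} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert a_i - (R\, b_i + T) \rVert^2}$, where $a_i$ and $b_i$ are the positions of the equivalent atoms in A and B. N: number of equivalent atoms between A and B. R, T: rigid transformation that minimizes cRMS.
Protein Structure Classes: Measure of structure similarity: cRMS after optimal superposition (Structal). [plot: eigenvalues of the metric matrix]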
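A minimal sketch of cRMS after optimal superposition, assuming the atom equivalences are already known (row i of one array pairs with row i of the other). It uses the Kabsch SVD construction, a standard way to find the minimizing rotation; the slides do not say which algorithm was used, and the function name `crms` is illustrative.

```python
import numpy as np

def crms(a, b):
    """cRMS between paired (N, 3) coordinate sets after optimal rigid superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # The optimal translation aligns the two centroids
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    # Covariance matrix H = sum_i b_i a_i^T and its SVD
    u, _, vt = np.linalg.svd(b0.T @ a0)
    # Force a proper rotation (determinant +1), excluding reflections
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    b_fit = b0 @ rot.T
    return float(np.sqrt(np.mean(np.sum((a0 - b_fit) ** 2, axis=1))))

# Toy check: a rotated and translated copy superimposes almost exactly
rng = np.random.default_rng(1)
a = rng.normal(size=(10, 3))
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
b = a @ rz.T + np.array([1.0, -2.0, 0.5])
print(crms(a, b))  # close to zero, up to rounding
```

A mirror image of a structure does not superimpose to zero, because reflections are excluded from the rigid transformations considered.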
A Picture of the Protein Structure Space: β proteins, α and β proteins, α proteins
A Picture of the Protein Structure Space: β proteins, α and β proteins, α proteins (labeled domains: 1repC2, 1bdo00, 1a81G2, 2bi6H0, 1sfcK0)
Protein Fold Space: ROC Analysis (Receiver Operating Characteristic). [plot: rate of true positives (%) versus rate of true negatives (%); a "perfect" measure has area = 1.0, a random measure has area = 0.5]
Protein Fold Space: ROC Analysis (Receiver Operating Characteristic). True positives: pairs of proteins that belong to the same T class of CATH. True negatives: pairs of proteins that belong to the same C class, but not the same T class.
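A minimal sketch of the ROC area, assuming each pair of proteins carries a similarity score (higher means more similar) and a label saying whether the pair shares a CATH T class. It uses the rank interpretation of the area: the fraction of (positive, negative) pair combinations in which the positive scores higher, with ties counted as one half. The function name `roc_area` is illustrative.

```python
def roc_area(scores, is_positive):
    """Area under the ROC curve for similarity scores (higher = more similar)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive pair ranked above negative pair
            elif sp == sn:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# A measure that ranks every positive above every negative has area 1.0
print(roc_area([0.9, 0.8, 0.3, 0.1], [True, True, False, False]))
```

For a distance such as cRMS, where smaller means more similar, the negated distance can serve as the score. The double loop is quadratic; a test set with millions of pairs would call for a sort-based variant.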
Protein Fold Space: ROC areas: CATH Fold 20: 0.98; FASTA: 0.54; CATH Class: 0.51. [plot: rate of true positives (%) versus rate of true negatives (%)] Fold 20: first 20 coordinates derived from the CATH fold matrix. CATH Class: first 3 coordinates derived from the CATH class matrix.
Protein Fold Space: ROC areas: Structal: 0.88; FASTA: 0.54. [plot: rate of true positives (%) versus rate of true negatives (%)]
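Protein Structure Features: Global radius of curvature: R(x, y, z), the radius of the circle through three points x, y, z on the curve. Thickness: $\Delta = \min_{x, y, z} R(x, y, z)$ (Gonzalez & Maddocks, PNAS, 1999, 96: 4769)
Thickness of a protein structure: Δ = 2.60 Å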
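A minimal sketch of the thickness computation on a discretized curve. It assumes R(x, y, z) is the circumradius of the three points and that thickness is the minimum of R over all triples, which is how the Gonzalez and Maddocks definition is usually stated; the formula itself did not survive on the slide. Function names are illustrative.

```python
import math
from itertools import combinations

def circumradius(x, y, z):
    """Radius of the circle through three 3D points (infinite if collinear)."""
    u = [y[i] - x[i] for i in range(3)]
    v = [z[i] - x[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    twice_area = math.sqrt(sum(c * c for c in cross))
    if twice_area == 0.0:
        return math.inf
    # R = (product of side lengths) / (4 * triangle area)
    sides = math.dist(x, y) * math.dist(y, z) * math.dist(x, z)
    return sides / (2.0 * twice_area)

def thickness(points):
    """Minimum circumradius over all triples of points (cubic cost)."""
    return min(circumradius(x, y, z) for x, y, z in combinations(points, 3))

# Toy check: points on a circle of radius 2 all share that circumradius
circle = [(2.0 * math.cos(2 * math.pi * k / 8),
           2.0 * math.sin(2 * math.pi * k / 8), 0.0) for k in range(8)]
print(thickness(circle))
```

For a protein, `points` would be the backbone positions (for example the Cα atoms); that choice is an assumption here, not something stated on the slide.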
Curvature Feature Vector
Performance of the Curvature Feature Vector: ROC areas: Structal: 0.88; C5: 0.65; FASTA: 0.54. [plot: rate of true positives (%) versus rate of true negatives (%)] The curvature vector performs better than FASTA, but needs more features to match Structal.
Protein Structure Features: Writhing. Sign of crossing: + or −. Writhing number: Gauss integral over pairs of points γ(t1), γ(t2) on the curve. Writhe feature vector for each protein (Fain and Røgen, PNAS, 100: 119 (2003))
Protein Structure Features: Writhing: ROC areas: Structal: 0.88; W10: 0.77; C5: 0.65; FASTA: 0.54. [plot: rate of true positives (%) versus rate of true negatives (%)] The W10 writhe vector performs better than the C5 curvature vector.
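A rough sketch of the writhing number of a polygonal curve, assuming the standard Gauss double integral and approximating it with one sample per segment (its midpoint). This covers only the plain writhe; the writhe feature vector on the slide contains further descriptors that are not reproduced here, and the function name `writhe` is illustrative.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def writhe(points):
    """Midpoint approximation of the Gauss integral for an open polygonal curve."""
    n = len(points) - 1
    # Segment vectors (tangent times segment length) and segment midpoints
    tangents = [_sub(points[i + 1], points[i]) for i in range(n)]
    mids = [tuple((points[i][k] + points[i + 1][k]) / 2.0 for k in range(3))
            for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = _sub(mids[i], mids[j])
            dist = math.sqrt(_dot(r, r))
            total += _dot(_cross(tangents[i], tangents[j]), r) / dist ** 3
    # Each unordered pair is visited once: 2 / (4 * pi) = 1 / (2 * pi)
    return total / (2.0 * math.pi)

# Toy curves: a three-turn helix, its mirror image, and a planar arc
ts = [6.0 * math.pi * k / 119 for k in range(120)]
helix = [(math.cos(t), math.sin(t), 0.2 * t) for t in ts]
mirror = [(x, y, -z) for x, y, z in helix]
planar = [(math.cos(t / 4.0), math.sin(t / 4.0), 0.0) for t in ts]
```

The sign of the result flips under mirror reflection, and a planar curve has zero writhe, which is what makes the quantity sensitive to the handedness of a fold.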
Clustering Protein Fragments to Extract a Small Set of Representatives (a Library): data → clustered data → library (simulated annealing k-means)
Generating an approximate structure: fragment library {A, B, C, D}
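A minimal sketch of extracting a library by clustering. It swaps in plain k-means (Lloyd iterations with a deterministic farthest-point start) for the simulated annealing k-means named on the slide, and it assumes fragments are already superimposed and flattened into coordinate vectors compared by Euclidean distance. The function name `kmeans_library` and the toy data are illustrative.

```python
import numpy as np

def kmeans_library(fragments, k, n_iter=50):
    """Cluster fragment vectors; return k representative centroids and labels."""
    x = np.asarray(fragments, dtype=float)
    # Farthest-point initialization: start from the first fragment, then
    # repeatedly add the fragment farthest from all chosen centers
    centers = [x[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(nearest)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each fragment to its closest center
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Move each center to the mean of its members (keep it if empty)
        updated = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    # Final assignment against the returned centers
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return centers, np.argmin(d, axis=1)

# Toy data: two well-separated groups of 4-residue fragments (12 coordinates)
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.1, size=(20, 12))
blob_b = rng.normal(5.0, 0.1, size=(20, 12))
fragments = np.vstack([blob_a, blob_b])
library, labels = kmeans_library(fragments, k=2)
```

A centroid is an average and need not be a physically valid fragment; choosing the cluster member closest to each centroid is one common way to obtain real representatives.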
Generating an approximate structure: fragment library {A, B, C, D}; structural sequence: AC
Fitting Protein Structures: 50 fragments of length 7: 2.78 Å cRMS; 100 fragments of length 5: 0.91 Å cRMS (better fit)
Longer fragments give better fit at same complexity. [plot: average cRMS distance versus complexity (states/residue), for fragment sizes of 7, 6, 5, and 4 residues] N: number of fragments. L: size of each fragment. (Kolodny, Koehl, Guibas, Levitt, J. Mol. Biol., 323, 297, 2002)
Choosing the "right" library: N such that Complexity = 20, by fragment size L: L = 7: N = 160000; L = 6: N = 8000; L = 5: N = 400; L = 4: N = 20
A Structural Alphabet for Protein Backbone: fragment size: 4; number of fragments: 20. [histogram: number of structures versus cRMS between model and experimental structure (axis from 0.2 to 1.0), grouped by protein size]
Structural Alphabet: Application to Structure Comparison: cRMS = 1 Å
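The library sizes in this table are consistent with complexity = N^(1/(L-3)) states per residue, which would follow if consecutive fragments overlap by three residues so that each fragment contributes L-3 new ones. That formula is inferred here from the numbers in the table, since the slide's own expression did not survive; the function names below are illustrative.

```python
def complexity(n_fragments, fragment_length):
    """States per residue, assuming complexity = N ** (1 / (L - 3))."""
    return n_fragments ** (1.0 / (fragment_length - 3))

def library_size(target_complexity, fragment_length):
    """Library size N that gives the target complexity for fragment length L."""
    return round(target_complexity ** (fragment_length - 3))

# Library sizes for a complexity of 20 states per residue
for length in (7, 6, 5, 4):
    print(length, library_size(20, length))
# prints:
# 7 160000
# 6 8000
# 5 400
# 4 20
```

At a fixed complexity of 20, the library grows by a factor of 20 for every extra residue of fragment length, which is why longer fragments quickly require very large libraries.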
Collaborators • Marc Delarue (Biophysics) Institut Pasteur, Paris • Michael Levitt (Computational Biology) Stanford University • Herbert Edelsbrunner (Math/Computer Science) Duke University • Rachel Kolodny (Computer Science) Columbia University • Peter Roegen (Math) DTU, Denmark • Joel Hass (Math) UC Davis
Thank You