Structure Prediction Tertiary protein structure protein folding Three

  • Slides: 29
Download presentation
Structure Prediction

Structure Prediction

Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR)

Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2] Comparative modeling (based on homology) [3] Ab initio (de novo) prediction (Dr. Ingo Ruczinski at JHSPH)

Experimental approaches to protein structure [1] X-ray crystallography -- Used to determine 80% of

Experimental approaches to protein structure [1] X-ray crystallography -- Used to determine 80% of structures -- Requires high protein concentration -- Requires crystals -- Able to trace amino acid side chains -- Earliest structure solved was myoglobin [2] NMR -- Magnetic field applied to proteins in solution -- Largest structures: 350 amino acids (40 k. D) -- Does not require crystallization

Steps in obtaining a protein structure Target selection Obtain, characterize protein Determine, refine, model

Steps in obtaining a protein structure Target selection Obtain, characterize protein Determine, refine, model the structure Deposit in database

X-ray crystallography http: //en. wikipedia. org/wiki/X-ray_diffraction Sperm Whale Myoglobin

X-ray crystallography http: //en. wikipedia. org/wiki/X-ray_diffraction Sperm Whale Myoglobin

PDB New PDB structures • April 08, 2008 – 50, 000 proteins, 25 new

PDB New PDB structures • April 08, 2008 – 50, 000 proteins, 25 new experimentally determined structures each day Old folds New folds

Example 1 wey

Example 1 wey

Ab initio protein prediction • Starts with an attempt to derive secondary structure from

Ab initio protein prediction • Starts with an attempt to derive secondary structure from the amino acid sequence – Predicting the likelihood that a subsequence will fold into an alphahelix, beta-sheet, or coil, using physicochemical parameters or HMMs and ANNs – Able to accurately predict 3/4 of all local structures

Structure Characteristics

Structure Characteristics

Beta Sheets

Beta Sheets

Ab Inito Prediction

Ab Inito Prediction

Secondary structure prediction Chou and Fasman (1974) developed an algorithm based on the frequencies

Secondary structure prediction Chou and Fasman (1974) developed an algorithm based on the frequencies of amino acids found in a helices, b-sheets, and turns. Proline: occurs at turns, but not in a helices. GOR (Garnier, Osguthorpe, Robson): related algorithm Modern algorithms: use multiple sequence alignments and achieve higher success rate (about 70 -75%) Page 279 -280

Table

Table

Frequency Domain

Frequency Domain

Neural Networks

Neural Networks

Training the Network • Use PDB entries with validated secondary structures • Measures of

Training the Network • Use PDB entries with validated secondary structures • Measures of accuracy – Q 3 Score percentage of protein correctly predicted (trains to predicting the most abundant structure) – You get 50% if you just predict everything to be a coil – Most methods get around 60% with this metric

Correlation Coeficient • How correlated are the predictions for coils, helix and Beta-sheets to

Correlation Coeficient • How correlated are the predictions for coils, helix and Beta-sheets to the real structures • This ignores what we really want to get to – If the real structure has 3 coils, do we predict 3 coils? • Segment overlap score (Sov) gives credit to how protein like the structure is, but it is correlated with Q 3

Fold recognition (structural profiles) • Attempts to find the best fit of a raw

Fold recognition (structural profiles) • Attempts to find the best fit of a raw polypeptide sequence onto a library of known protein folds • A prediction of the secondary structure of the unknown is made and compared with the secondary structure of each member of the library of folds

Threading • Takes the fold recognition process a step further: – Empirical-energy functions for

Threading • Takes the fold recognition process a step further: – Empirical-energy functions for residue pair interactions are used to mount the unknown onto the putative backbone in the best possible manner

Fold recognition by threading Fold 1 Fold 2 Fold 3 Query sequence Compatibility scores

Fold recognition by threading Fold 1 Fold 2 Fold 3 Query sequence Compatibility scores Fold N

CASP • http: //www. predictioncenter. org/casp 8/index. cgi

CASP • http: //www. predictioncenter. org/casp 8/index. cgi

SCOP • SCOP: Structural Classification of Proteins. • http: //scop. mrc-lmb. cam. ac. uk/scop/

SCOP • SCOP: Structural Classification of Proteins. • http: //scop. mrc-lmb. cam. ac. uk/scop/

CATH • CATH: Protein Structure Classification • Class (C), Architecture (A), Topology (T) and

CATH • CATH: Protein Structure Classification • Class (C), Architecture (A), Topology (T) and Homologous superfamily (H)