
Protein structure prediction.

Protein folds.
• Fold definition: two folds are similar if they have a similar arrangement of secondary structure elements (SSEs), i.e. architecture, and similar connectivity between them (topology). A few SSEs may be missing from one of the folds.
• Fold classification: structural similarity between folds is detected with structure-structure comparison algorithms.
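Structure-structure comparison algorithms differ widely, but a common basic measure is the root-mean-square deviation (RMSD) between corresponding atoms after optimal superposition. The sketch below computes it with the Kabsch algorithm. It assumes numpy is available and that atom correspondences (e.g. aligned Cα atoms) are already known, whereas real fold-comparison methods must also find those correspondences. The function name `kabsch_rmsd` is illustrative.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two equal-length coordinate sets after optimal superposition.

    P, Q: (n, 3) arrays of corresponding atoms (e.g. aligned C-alpha atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of a structure gives an RMSD of (numerically) zero, while its mirror image does not, because only proper rotations are allowed.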

Protein structure prediction flowchart (from D. W. Mount):
Protein sequence → database similarity search → Does the sequence align with a protein of known structure?
- Yes → three-dimensional comparative modeling → predicted three-dimensional structural model.
- No → protein family analysis → Relationship to known structure?
  - Yes → three-dimensional comparative modeling.
  - No → structural analysis → Is there a predicted structure?
    - Yes → predicted three-dimensional structural model.
    - No → three-dimensional structural analysis in the laboratory.

Protein structure prediction. Prediction of the three-dimensional structure of a protein from its sequence. Different approaches:
- Homology modeling (the query protein has a very close homolog in the structure database).
- Fold recognition (the query protein can be mapped to a template protein with an existing fold).
- Ab initio prediction (the query protein has a new fold).

Homology modeling. Aims to produce protein models with accuracy close to that of experimentally determined structures. It is used for:
- Protein structure prediction
- Drug design
- Prediction of functionally important sites (active or binding sites)

Steps of homology modeling.
1. Template recognition & initial alignment.
2. Backbone generation.
3. Loop modeling.
4. Side-chain modeling.
5. Model optimization.

1. Template recognition. Recognition of similarity between the target and the template. Target: the protein with unknown structure. Template: a protein with known structure. Main difficulty: deciding which template to pick when there are multiple candidate template structures. Template structures can be found by searching the PDB with pairwise sequence alignment methods.
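As an illustration of a pairwise sequence alignment method, here is a minimal global alignment (Needleman-Wunsch) score with a linear gap penalty, used to rank candidate templates. This is a simplified sketch: the scoring parameters (match +1, mismatch -1, gap -1) and the helper `pick_template` are hypothetical, and real PDB searches typically use local alignment (BLAST-style) with substitution matrices and affine gap penalties.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch, linear gap penalty)."""
    n, m = len(a), len(b)
    # prev is the previous row of the DP matrix (scores of a[:i-1] vs b[:j]).
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]


def pick_template(target, templates):
    """Return the name of the template sequence with the highest alignment score."""
    return max(templates, key=lambda name: nw_score(target, templates[name]))
```

For example, `pick_template("AHYATPTTT", {"t1": "AHTPSS", "t2": "AHYATPTTS"})` prefers `"t2"`, which differs from the target at a single position.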

Two zones of protein structure prediction. [Figure: sequence identity (%, up to 100) plotted against alignment length (residues, 50 to 200); the upper region is the homology modeling zone, the lower region is the fold recognition zone.]
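The boundary curve between the two zones is not given numerically on the slide. As an assumption, the sketch below swaps in one published boundary, Rost's 1999 twilight-zone curve with no offset, p(L) = 480 · L^(-0.32 · (1 + e^(-L/1000))); the slide's own curve may differ. The function names are hypothetical.

```python
import math


def identity_threshold(length):
    """Percent identity above which an alignment of this length is assumed
    to indicate structural similarity (Rost 1999 curve, zero offset)."""
    exponent = -0.32 * (1 + math.exp(-length / 1000))
    return 480 * length ** exponent


def prediction_zone(identity, length):
    """Classify an alignment (percent identity, aligned length) into a zone."""
    if identity >= identity_threshold(length):
        return "homology modeling"
    return "fold recognition"
```

With this curve the threshold is about 29% identity for a 100-residue alignment and decreases for longer alignments, matching the downward-sloping boundary in the figure.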

2. Backbone generation. Once the alignment between target and template is ready, copy the backbone coordinates of the template residues that are aligned to target residues. If two aligned residues are identical, copy their side-chain coordinates as well.
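A minimal sketch of this copying step, assuming a gapped pairwise alignment and template residues stored as dicts of atom name to coordinates (a hypothetical data layout; real modeling programs use their own structure objects). Backbone atoms are taken to be N, CA, C and O; target residues opposite a gap get no coordinates and are left for loop modeling.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def copy_coordinates(target_aln, template_aln, template_residues):
    """Build a partial model from a target-template alignment.

    target_aln, template_aln: gapped alignment strings of equal length ('-' = gap).
    template_residues: one dict per template residue (in sequence order),
        mapping atom names to (x, y, z) tuples.
    Returns a dict: target residue index -> copied atoms.
    """
    model = {}
    t_idx = p_idx = 0  # next residue index in target / template
    for t_aa, p_aa in zip(target_aln, template_aln):
        if t_aa != "-" and p_aa != "-":
            atoms = template_residues[p_idx]
            if t_aa == p_aa:
                # Identical residues: copy backbone and side chain.
                model[t_idx] = dict(atoms)
            else:
                # Different residues: copy backbone atoms only.
                model[t_idx] = {a: atoms[a] for a in BACKBONE_ATOMS if a in atoms}
        if t_aa != "-":
            t_idx += 1
        if p_aa != "-":
            p_idx += 1
    return model
```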

3. Insertions and deletions. Example alignment:
AHYATPTTT
AH---TPSS
Relative to the second sequence, the first has an insertion (YAT); equivalently, the second has a deletion. Insertions and deletions occur mostly between secondary structures, in the loop regions. Loop conformations are difficult to predict. Approaches to loop modeling:
- Knowledge-based: search the PDB for loops with known structures.
- Energy-based: an energy function is used to evaluate the quality of a loop, which is optimized by energy minimization or Monte Carlo sampling.
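Before any loop can be modeled, the insertion and deletion regions have to be located in the alignment. A small sketch that finds runs of gap columns; the function name and the region labels are hypothetical, and both aligned sequences are assumed to have equal length.

```python
def gap_regions(seq_a, seq_b):
    """Find runs of alignment columns where exactly one sequence has a gap.

    Returns (start, end, label) tuples with end exclusive; the label says
    which sequence carries the extra residues (the insertion).
    """
    regions = []
    start = label = None
    for col, (x, y) in enumerate(zip(seq_a, seq_b)):
        if x != "-" and y == "-":
            kind = "insertion in first"
        elif x == "-" and y != "-":
            kind = "insertion in second"
        else:
            kind = None
        if kind != label:
            # Close the current run, if any, and start tracking the new state.
            if label is not None:
                regions.append((start, col, label))
            start, label = col, kind
    if label is not None:
        regions.append((start, len(seq_a), label))
    return regions
```

For the example alignment above, `gap_regions("AHYATPTTT", "AH---TPSS")` reports columns 2 to 4 (end exclusive 5) as an insertion in the first sequence.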

4. Side-chain modeling. Side-chain conformations are called rotamers. In similar proteins, side chains have similar conformations. If the % identity is high, side-chain conformations can be copied from the template to the target. If the % identity is not very high, side chains are modeled using rotamer libraries, and the candidate rotamers are scored with energy functions; the lowest-energy rotamer is chosen, e.g. $E = \min(E_1, E_2, E_3)$ for three candidates with energies $E_1$, $E_2$, $E_3$. Problem: side-chain conformations depend on the backbone conformation, which is predicted, not real.
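The rule E = min(E1, E2, E3) can be sketched as a per-residue argmin over rotamer energies. This is a greedy simplification: each residue is placed independently on the fixed, predicted backbone, ignoring interactions between neighbouring side chains. The energy function is supplied by the caller and is hypothetical here, as are the function names.

```python
def best_rotamer(rotamer_energies):
    """Return (rotamer, energy) with minimal energy, i.e. E = min(E1, E2, ...)."""
    return min(rotamer_energies.items(), key=lambda item: item[1])


def place_side_chains(candidates, energy):
    """Greedy side-chain placement on a fixed (predicted) backbone.

    candidates: residue id -> list of rotamer labels (e.g. chi1 angles).
    energy: function (residue id, rotamer) -> energy.
    Returns residue id -> chosen rotamer.
    """
    return {
        res: best_rotamer({rot: energy(res, rot) for rot in rots})[0]
        for res, rots in candidates.items()
    }
```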

5. Model optimization. Energy optimization of the entire structure. Since the backbone conformation depends on the side-chain conformations and vice versa, an iterative approach is used: predict rotamers, shift the backbone, and repeat.
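The iteration between rotamer prediction and backbone shifts is a form of alternating (coordinate-wise) minimization. A toy sketch: two scalars `b` ("backbone") and `s` ("side chains") stand in for full coordinate sets, and the coupled energy E(b, s) = (b - 1)² + (s - 2)² + (b - s)² is hypothetical, chosen so each sub-step has an exact minimum.

```python
def alternate_minimize(b=0.0, s=0.0, tol=1e-9, max_iter=1000):
    """Alternately optimize a 'backbone' variable b and a 'side-chain'
    variable s for the toy coupled energy
        E(b, s) = (b - 1)**2 + (s - 2)**2 + (b - s)**2.
    Each step sets one variable to its exact minimum given the other.
    """
    for _ in range(max_iter):
        s_new = (2 + b) / 2        # "predict rotamers": argmin over s of E(b, s)
        b_new = (1 + s_new) / 2    # "shift backbone": argmin over b of E(b, s_new)
        converged = abs(s_new - s) < tol and abs(b_new - b) < tol
        b, s = b_new, s_new
        if converged:
            break
    return b, s
```

Because each variable's optimum depends on the other, a single pass is not enough; the loop converges to the joint minimum b = 4/3, s = 5/3.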

Classwork: Homology modeling.
- Go to NCBI Entrez and search for gi 461699.
- Run a BLAST search against the PDB.
- Repeat the same for gi 60494508.
- Compare the results.

Fold recognition. Unsolved problem: direct prediction of protein structure from physico-chemical principles. Solved problem: recognizing which of the known folds is similar to the fold of an unknown protein. Fold recognition is based on the following observations/assumptions:
- The overall number of different protein folds is limited (1000-3000 folds).
- The native protein structure is in its ground state (minimum energy).

Fold recognition. Goal: find the protein of known structure that best matches a given sequence. Because the similarity between the target and the closest template is low, pairwise sequence alignment methods fail. Solution: threading, a sequence-structure alignment method.

Threading: a method for structure prediction. In sequence-structure alignment, the target sequence is compared to all structural templates in the database. Threading requires:
- An alignment method (dynamic programming, Monte Carlo, …)
- A scoring function that yields a relative score for each alternative alignment

Protein structure prediction: the target sequence is compared to structures using sequence-structure alignment. [Figure: the target sequence is threaded onto each structural template, giving Score 1, Score 2 and Score 3; since Score 3 > Score 2 > Score 1, the third template is used to build the structural model of the target.] Concept of threading: D. Jones et al., 1993.

Scoring function for threading.
• A contact-based scoring function depends on the amino acid types of two residues and the distance between them.
• A sequence-sequence alignment scoring function does not depend on the distance between two residues.
• If the distance between two nonadjacent residues in the template is less than 8 Å, these residues are considered to be in contact.
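A sketch of extracting template contacts with the 8 Å rule. The slide does not say which atoms are measured, so one coordinate per residue (assumed Cα) is used, and "nonadjacent" is interpreted as at least two positions apart in sequence; both are assumptions, as is the function name.

```python
import math


def contact_map(ca_coords, cutoff=8.0, min_separation=2):
    """List residue pairs (i, j), i < j, in contact in a template.

    ca_coords: list of (x, y, z) per residue, in Angstroms (assumed C-alpha).
    Pairs closer than `cutoff` count as contacts; sequence neighbours
    (j - i < min_separation) are excluded as adjacent.
    """
    contacts = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts
```

For a straight chain with 3.8 Å spacing, only residues two apart (7.6 Å) fall within the cutoff.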

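Scoring function for threading. The score is the sum of pairwise contact weights over all N contacts of the template:
$S = \sum_{k=1}^{N} w(a_{i_k}, a_{j_k})$,
where $(i_k, j_k)$ is the k-th contact, $a_i$ is the amino acid type of the target residue aligned with position i of the template, and w is a symmetric table of weights (rows and columns indexed by residue types such as Ala, Trp, Ile, Tyr) calculated from the frequency of amino acid contacts in the PDB.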
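A minimal implementation of this sum, assuming an ungapped, position-by-position alignment of the target to the template (so target[i] is $a_i$) and 0-based positions. The weight table and contacts in the usage example are hypothetical, not the values from the slides.

```python
def pair_key(a, b):
    # w is symmetric, so each unordered pair is stored once under a sorted key.
    return tuple(sorted((a, b)))


def threading_score(target, contacts, w):
    """Threading score S = sum of w(a_i, a_j) over all template contacts.

    target: target sequence aligned without gaps to the template.
    contacts: (i, j) template position pairs in contact (0-based).
    w: dict mapping pair_key(a, b) to the contact weight for residue types a, b.
    """
    return sum(w[pair_key(target[i], target[j])] for i, j in contacts)
```

For example, with the target "ATPIIGGLPY", hypothetical contacts [(0, 4), (2, 7), (3, 9)] and weights {("A", "I"): 0.3, ("L", "P"): -0.2, ("I", "Y"): 0.4}, the score is 0.3 - 0.2 + 0.4 = 0.5.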

Classwork: calculate the score for the target sequence "ATPIIGGLPY" aligned to a template structure defined by a 10 × 10 contact matrix (template positions 1-10, contacting pairs marked *), using the table of contact weights w for the residue types A, T, P, Y, I, G and L.

Evaluation of quality of structural model
• Correct bond lengths and bond angles (e.g., about 3.8 Å between consecutive Cα atoms)
• Correct placement of functionally important sites
• Prediction of the global topology, not a partial alignment (minimum number of gaps)
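One simple geometric check in this spirit is to flag consecutive Cα atoms whose separation deviates from about 3.8 Å. This is only a sketch: the tolerance of 0.2 Å and the function name are hypothetical, and full validation tools also check bond angles, torsions and clashes.

```python
import math


def check_ca_spacing(ca_coords, expected=3.8, tolerance=0.2):
    """Flag consecutive C-alpha pairs whose distance deviates from ~3.8 A.

    Returns a list of (i, distance) for residue pairs (i, i + 1) outside
    expected +/- tolerance; an empty list means no deviations were found.
    """
    outliers = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - expected) > tolerance:
            outliers.append((i, d))
    return outliers
```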

Success and limitations of structure prediction
Success:
• Accuracy scores almost doubled from CASP1 to CASP6, possibly because of the growth of the databases.
• Models of small targets are very accurate.
Limitations:
• Models of large and remotely related proteins are not very accurate.
• Domain boundaries are difficult to define.
• Models often do not provide details for functional annotation.
Adapted from Kryshtafovych et al. 2005

GenTHREADER (http://bioinf.cs.ucl.ac.uk/psipred)
1. Predicts secondary structures for the target sequence.
2. Makes sequence profiles (PSSMs) for each template sequence.
3. Uses a threading scoring function to find the best-matching profile.
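To illustrate what a PSSM is, here is a generic sketch that builds a log-odds matrix from ungapped aligned sequences and scores a sequence against it. It is not GenTHREADER's procedure: the uniform background, the flat pseudocount and the function names are simplifying assumptions, whereas real profile methods use sequence weighting, observed background frequencies and substitution-matrix-based pseudocounts.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds position-specific scoring matrix from ungapped aligned sequences.

    Uses a uniform background (1/20) and a flat pseudocount per amino acid.
    Returns one dict per column: amino acid -> log2(observed / background).
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def pssm_score(pssm, seq):
    """Score a sequence of the same length against the profile (no gaps)."""
    return sum(col[aa] for col, aa in zip(pssm, seq))
```

Residues that are frequent in a column get positive scores and rare ones negative scores, so sequences resembling the aligned family score higher.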

Classwork.
• Retrieve structures 1GY3, 1E9H, 1OL2.
• Examine all interactions within and between chains/domains.
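Once the atoms of a structure have been read (for example with a PDB/mmCIF parser), residue contacts within and between chains can be listed with a distance cutoff. A brute-force sketch, assuming atoms are given as (chain, residue number, coordinates) tuples; the 4.0 Å cutoff and the function name are hypothetical. Note that sequence-adjacent residues in the same chain always appear as contacts, and that for large structures a spatial index such as Biopython's `Bio.PDB.NeighborSearch` is much faster than comparing all pairs.

```python
import math
from collections import defaultdict


def residue_contacts(atoms, cutoff=4.0):
    """Group atom-atom contacts into residue pairs, keyed by chain pair.

    atoms: list of (chain_id, residue_number, (x, y, z)) tuples.
    Returns {(chain1, chain2): set of ((chain1, res1), (chain2, res2))}.
    Keys with chain1 == chain2 are intra-chain; others are interfaces.
    """
    contacts = defaultdict(set)
    for k, (c1, r1, xyz1) in enumerate(atoms):
        for c2, r2, xyz2 in atoms[k + 1:]:
            if (c1, r1) == (c2, r2):
                continue  # skip atom pairs within the same residue
            if math.dist(xyz1, xyz2) < cutoff:
                a, b = sorted([(c1, r1), (c2, r2)])
                contacts[(a[0], b[0])].add((a, b))
    return dict(contacts)
```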