8. Protein Design

Protein design • Structure prediction and design are inverse problems: prediction maps a sequence to a structure, design maps a structure to a sequence

Protein design – why? • Industrial applications – Design of thermostable/super-soluble proteins (e.g. Dantas, Kuhlman & Baker 2003; Malakauskas & Mayo 1998) • Improve fold prediction – Enrich the PSSM of a fold family using designed sequences (e.g. Koehl & Levitt 1999; Kuhlman & Baker 2000) • Identify functional sites – Positions conserved in native but not in designed sequences (e.g. Cheng, Samudrala & Baker 2004) • Crucial step before functional design (e.g. Liang & Baker 2008; Ashworth & Baker 2007, etc.)

Protein Design – Overview • Protein design: design a sequence that fits a given structure 1. Recover sequence profiles (redesign) 2. Design a new protein fold 3. Enzyme design 4. Multiple state design 5. Negative design 6. Interface design

Design a sequence that fits a given structure (e.g. ATCSFFGRKL…) • Assumption: design can retrieve the sequences that fit a given structure, i.e. it can retrieve sequences that occur in nature for this protein fold

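Protein design – an extension of side chain modeling • Given: – the protein backbone – for each residue i: a set of possible conformations (rotamers r from a library) for different amino acids • Wanted: the combination of rotamers that results in the lowest total energy, the global minimum energy conformation (GMEC):

$$E_{\mathrm{GMEC}} = \min_{\{r\}} \Big( \sum_{i} SE_{ir} + \sum_{i<j} SE_{ir,js} \Big)$$

where SE_ir is the self energy of rotamer r at position i and SE_ir,js is the pair energy between rotamer r at position i and rotamer s at position j • The GMEC defines the designed sequence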
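The GMEC search can be made concrete with a brute-force enumeration on a toy problem. This is a minimal sketch. It assumes small, hypothetical self- and pair-energy tables (`self_e`, `pair_e`) filled with made-up numbers. Real design uses rotamer libraries and force fields, and exhaustive enumeration is feasible only for tiny problems. That is why the sampling methods on the next slide are needed.

```python
from itertools import product

# Hypothetical toy problem: 3 positions, each with a few (amino acid, rotamer) options.
# All energies below are made-up illustrative numbers.
options = {
    0: [("L", 0), ("L", 1), ("V", 0)],
    1: [("E", 0), ("K", 0)],
    2: [("A", 0), ("F", 0), ("F", 1)],
}

# Self energy SE_ir: option r at position i interacting with the fixed backbone.
self_e = {
    (0, 0): -1.0, (0, 1): -0.5, (0, 2): -0.8,
    (1, 0): -0.3, (1, 1): -0.4,
    (2, 0): -0.2, (2, 1): -1.1, (2, 2): -0.9,
}

# Pair energy SE_ir,js between option r at i and option s at j (i < j); missing pairs = 0.
pair_e = {
    ((0, 0), (2, 1)): 0.9,   # a steric clash in this toy example
    ((0, 2), (2, 1)): -0.4,
    ((1, 0), (2, 2)): -0.6,
    ((1, 1), (2, 1)): -0.2,
}

def total_energy(assignment):
    """Sum of self energies plus all pair energies for one option index per position."""
    e = sum(self_e[(i, r)] for i, r in enumerate(assignment))
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            e += pair_e.get(((i, assignment[i]), (j, assignment[j])), 0.0)
    return e

def brute_force_gmec():
    """Enumerate every combination of option indices and keep the lowest-energy one."""
    positions = sorted(options)
    all_assignments = product(*(range(len(options[i])) for i in positions))
    best = min(all_assignments, key=total_energy)
    sequence = "".join(options[i][r][0] for i, r in zip(positions, best))
    return best, sequence, total_energy(best)
```

With these numbers, `brute_force_gmec()` returns the option indices `(2, 1, 1)`, the sequence `"VKF"`, and an energy of about -2.9 toy units. The number of combinations grows as the product of the option counts per position, which is why exhaustive search does not scale.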

Protein design – an extension of side chain modeling • Sampling: the same techniques as in side chain modeling, e.g. – dead-end elimination (DEE) – Monte Carlo (MC), etc. • Scoring: add a term that reflects amino acid preference, E_aa, optimized for recovery of amino acid frequencies in natural sequences
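Dead-end elimination can be sketched in a few lines. Below is a minimal sketch of the original single-rotamer DEE criterion. Rotamer r at position i is removed when its best possible energy is still worse than the worst possible energy of some competitor t at the same position. The best possible energy is its self energy plus the most favorable pair energy with every other position. The list/dict layout of the energies is an assumption for illustration, not the format of any particular program.

```python
def pair(pair_e, i, r, j, s):
    """Pair energy between rotamer r at i and rotamer s at j (stored only for i < j)."""
    if i < j:
        return pair_e[(i, j)][r][s]
    return pair_e[(j, i)][s][r]

def dee_prune(self_e, pair_e):
    """Iteratively apply the single-rotamer DEE criterion until nothing changes.

    self_e[i][r]         : self energy of rotamer r at position i (assumed layout)
    pair_e[(i, j)][r][s] : pair energy for i < j (assumed layout)
    Returns the set of surviving rotamer indices for each position.
    """
    n = len(self_e)
    alive = [set(range(len(self_e[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                # Best case for r: most favorable surviving partner at every other position.
                best_r = self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j])
                    for j in range(n) if j != i)
                for t in alive[i]:
                    if t == r:
                        continue
                    # Worst case for competitor t: least favorable surviving partners.
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i)
                    if best_r > worst_t:
                        # r can never be part of the GMEC: t always does better.
                        alive[i].discard(r)
                        changed = True
                        break
    return alive
```

The rotamers that survive still have to be searched, for example exhaustively or by Monte Carlo. DEE only shrinks the space and never discards a rotamer that belongs to the GMEC.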

Fold-tree formulation of design • Filled colored circles – flexible side chain • Empty colored circles – flexible amino acid (design)

1. Recover sequence profiles with protein design (RosettaDesign) • Native protein sequences are close to optimal for their structures (Kuhlman & Baker 2000)

RosettaDesign: Basic approach • Evaluate the best sequence for a given backbone using: • amino acids: 19 aa (Cys excluded) • side chains: rotamer library • scaffold: constant backbone coordinates • Location of the (G)MEC: Monte Carlo procedure for amino acid and rotamer assignment • Energy function: additional term E_aa • reference energy to reproduce the amino acid frequencies observed in native proteins • trained (optimized) on a set of 30 proteins
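A Monte Carlo search over amino acid assignments with a per-amino-acid reference term can be sketched as simulated annealing. This is a toy stand-in, not RosettaDesign itself. Rotamers are folded into a single hypothetical per-position energy function. The energy callbacks (`site_energy`, `pair_energy`, `ref_energy`) and the cooling schedule are assumptions for illustration.

```python
import math
import random

# 19 amino acids: cysteine is excluded, as on the slide.
AMINO_ACIDS = "ADEFGHIKLMNPQRSTVWY"

def design_mc(n_pos, site_energy, pair_energy, ref_energy,
              n_steps=20000, t_start=3.0, t_end=0.1, seed=0):
    """Simulated-annealing Monte Carlo over sequences on a fixed backbone (toy model).

    site_energy(i, aa)      -> hypothetical energy of amino acid aa at position i
    pair_energy(i, a, j, b) -> hypothetical interaction energy between positions i and j
    ref_energy[aa]          -> per-amino-acid reference term E_aa, added per residue;
                               tuning it shifts the amino acid composition of designs
    Returns the lowest-energy sequence seen and its energy.
    """
    rng = random.Random(seed)

    def energy(seq):
        e = sum(site_energy(i, a) + ref_energy[a] for i, a in enumerate(seq))
        e += sum(pair_energy(i, seq[i], j, seq[j])
                 for i in range(n_pos) for j in range(i + 1, n_pos))
        return e

    seq = [rng.choice(AMINO_ACIDS) for _ in range(n_pos)]
    current = energy(seq)
    best_seq, best_e = list(seq), current
    for step in range(n_steps):
        # Geometric cooling from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        i = rng.randrange(n_pos)
        old = seq[i]
        seq[i] = rng.choice(AMINO_ACIDS)       # propose a point mutation
        trial = energy(seq)
        delta = trial - current
        # Metropolis criterion: accept downhill moves, sometimes accept uphill moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = trial
            if current < best_e:
                best_seq, best_e = list(seq), current
        else:
            seq[i] = old                       # reject: undo the mutation
    return "".join(best_seq), best_e
```

For brevity, the sketch recomputes the full energy at every step. A practical implementation would update only the terms that involve the mutated position.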

Computational experiment • Dataset: 108 sequences with • a solved structure (resolution ≤ 3 Å) • ≤ 30% sequence identity • Results: • 51% of the core residues in the designed sequences were identical to the naturally occurring residues • 27% of all residues in the designed sequences were identical to the naturally occurring residues
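The recovery numbers above are per-position identity fractions, computed over all residues or over core residues only. This is a minimal sketch. It assumes the native and designed sequences are aligned position by position, which fixed-backbone design guarantees. It also assumes a core/surface mask is supplied. The slide does not say how core positions were defined.

```python
def sequence_recovery(native, designed, mask=None):
    """Fraction of selected positions where the designed residue equals the native one.

    mask: optional list of booleans selecting a subset of positions (e.g. core residues).
    """
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned and of equal length")
    if mask is None:
        mask = [True] * len(native)
    pairs = [(n, d) for n, d, keep in zip(native, designed, mask) if keep]
    if not pairs:
        return 0.0
    return sum(n == d for n, d in pairs) / len(pairs)
```

For example, the hypothetical native `"MKVLIAEGLK"` and design `"MRVLVAEALK"` give an overall recovery of 0.7. Restricted to a mask that selects positions 2, 3, 4, and 8, the recovery is 0.75.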

Sequence conservation in designed sequences correlates with conservation in protein families (core residues)

Analysis of SH3 domain designed sequences • Input: • 400 known sequences of SH3 domains • 11 crystal structures • Procedure: • design 1000 sequences for each structure • derive the amino acid profile of the 11,000 designed sequences • compare it to the profile of the native sequences • Results: • good match between the profiles • evolution has sampled most of the sequence space compatible with the SH3 domain fold • equilibrium reached
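Comparing designed and natural profiles comes down to per-position amino acid frequencies. Below is a minimal sketch that uses one possible similarity measure: the per-position overlap, the sum over amino acids of the smaller of the two frequencies. The slide does not say which measure the study used, so this choice is an assumption.

```python
from collections import Counter

def profile(seqs):
    """Per-position amino acid frequencies for equal-length, aligned sequences."""
    prof = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts.values())
        prof.append({aa: c / total for aa, c in counts.items()})
    return prof

def profile_overlap(p, q):
    """Per-position overlap sum_a min(p_a, q_a): 1.0 = identical, 0.0 = disjoint."""
    return [sum(min(pi.get(aa, 0.0), qi.get(aa, 0.0)) for aa in set(pi) | set(qi))
            for pi, qi in zip(p, q)]
```

Take designed sequences `["LVA", "LIA", "VVA", "LVG"]` and natural sequences `["LVA", "IVA", "LVA", "LIA"]`, both hypothetical. The per-position overlaps are `[0.75, 1.0, 0.75]`.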

Amino acid profiles for six core residues in SH3 domains (shaded: designed; empty: natural)

Redesign – conclusions • The templates retain information about their specific sequence • Use a set of different templates to improve sequence profile recovery • Add backbone sampling to improve profile recovery • The energy function is adequate to reproduce a large fraction of the naturally occurring amino acids of a given fold • Low-energy sequences are close to native (not necessarily the GMEC) • Stability is the major constraint in the evolution of core residues • Native protein sequences are close to optimal for their structures

First complete redesign of a protein: a zinc finger that folds without Zn (Dahiyat & Mayo, 1997)

ORBIT protocol for zinc finger design (Dahiyat & Mayo, 1997) • Define three residue classes: 1) exposed 2) core 3) boundary (in between) • Assign each position in the target template to a class; allowed amino acids: 1) Exposed: A, S, T, H, D, N, E, Q, K, R 2) Buried (core): A, V, L, I, F, Y, W 3) Boundary: combined (1) and (2) • Use a rotamer library and the DEE algorithm to locate the GMEC • Find additional low-energy solutions by local sampling starting from the GMEC • Search space: 1.9 × 10^27 combinations
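The residue classes restrict which amino acids may appear at each position, and that shrinks the combinatorial search. This is a minimal sketch. It assumes a hypothetical burial score in [0, 1] per position and made-up cut-offs, because the slide does not give ORBIT's actual classification criteria. The code counts how many amino acid sequences the class restrictions allow. Rotamers would multiply this count further, and the toy does not reproduce the 1.9 × 10^27 figure for the real template.

```python
import math

# Allowed amino acids per residue class, as listed on the slide.
SURFACE = set("ASTHDNEQKR")
CORE = set("AVLIFYW")
BOUNDARY = SURFACE | CORE          # combined (1) and (2): 16 amino acids

ALLOWED = {"core": CORE, "boundary": BOUNDARY, "surface": SURFACE}

def classify(burial, core_cut=0.7, surface_cut=0.3):
    """Hypothetical burial-based class assignment (0 = fully exposed, 1 = fully buried).
    The cut-offs are illustrative assumptions, not ORBIT's actual criteria."""
    if burial >= core_cut:
        return "core"
    if burial <= surface_cut:
        return "surface"
    return "boundary"

def log10_sequence_space(burials):
    """log10 of the number of amino acid sequences allowed by the class restrictions."""
    return sum(math.log10(len(ALLOWED[classify(b)])) for b in burials)
```

For four hypothetical positions with burial scores `[0.9, 0.5, 0.1, 0.8]`, the classes are core, boundary, surface, and core. That allows 7 × 16 × 10 × 7 = 7840 sequences.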

Design of a sequence that adopts a zinc finger fold without zinc, Dahiyat & Mayo (1997) • Local sampling starting from the GMEC reveals the conservation pattern of the designs • Figure panels: alignment with the second finger of Zif268; ranking of predicted sequences; conservation across 1000 simulations

FSD-1 (Full Sequence Design) (Dahiyat & Mayo, 1997) • Similarity to Zif268: • 6/28 identical • 11/28 similar • NMR: RMSD within 2 Å • Core: additional hydrophobic amino acids fill the space vacated by removal of the metal-binding site • the metal-binding C/H residues are replaced with F/A/K • Helix: • stabilized by N-capping interactions • No similar sequences found by BLAST

2. Design a new protein from scratch

Top7 – Design of a new fold (Kuhlman, Dantas, … & Baker, Science 2003) 1. Define a new scaffold not observed in nature 2. Find a sequence that will fold into the scaffold • Iterate between structure prediction (with fixed sequence) and sequence design (with fixed structure)

Creation scheme of Top7 1. Derive constraints from the sketch 2. Build backbones that fulfill the constraints (150) 3. Design sequences for the backbone templates • all amino acids allowed at 71/93 positions • polar amino acids only at surface positions 4. Optimize backbone conformations • initial perturbation, followed by • side chain optimization, and • backbone torsion angle minimization • 5 × 15 cycles
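The iteration between sequence design on a fixed backbone and backbone optimization for a fixed sequence can be illustrated as alternating minimization of one energy over two sets of variables. This is a toy sketch under stated assumptions. It uses one hypothetical backbone parameter per residue (think of a torsion angle) and a made-up four-letter alphabet with invented preferred values. Plain gradient descent replaces Rosetta's perturbation, repacking, and torsion-space minimization.

```python
# Hypothetical toy model: preferred backbone value and intrinsic cost per amino acid.
IDEAL = {"A": -60.0, "V": -120.0, "G": 80.0, "P": -65.0}
COST = {"A": 0.0, "V": 0.2, "G": 0.5, "P": 0.4}
SCALE = 1.0 / 100.0   # converts squared-degree deviations into toy energy units
SMOOTH = 0.5          # couples neighbouring backbone parameters

def energy(seq, bb):
    """Toy energy: amino acid cost + deviation from preferred value + smoothness term."""
    e = sum(COST[a] + SCALE * (x - IDEAL[a]) ** 2 for a, x in zip(seq, bb))
    e += SCALE * SMOOTH * sum((bb[i] - bb[i + 1]) ** 2 for i in range(len(bb) - 1))
    return e

def design_sequence(bb):
    """Sequence design on a fixed backbone: best amino acid per position
    (exact here because the toy energy has no sequence-sequence coupling)."""
    return "".join(min(IDEAL, key=lambda a: COST[a] + SCALE * (x - IDEAL[a]) ** 2)
                   for x in bb)

def relax_backbone(seq, bb, n_iter=50):
    """Backbone minimization for a fixed sequence by gradient descent. The step is
    1/L, with L an upper bound on the curvature, so the energy never increases."""
    bb = list(bb)
    n = len(bb)
    step = 1.0 / (SCALE * (2.0 + 8.0 * SMOOTH))
    for _ in range(n_iter):
        grad = []
        for i in range(n):
            g = 2.0 * (bb[i] - IDEAL[seq[i]])
            if i > 0:
                g += 2.0 * SMOOTH * (bb[i] - bb[i - 1])
            if i < n - 1:
                g += 2.0 * SMOOTH * (bb[i] - bb[i + 1])
            grad.append(SCALE * g)
        bb = [x - step * g for x, g in zip(bb, grad)]
    return bb

def iterate_design(bb, rounds=5):
    """Alternate sequence design (fixed backbone) and backbone relaxation (fixed sequence)."""
    history = []
    for _ in range(rounds):
        seq = design_sequence(bb)
        history.append(energy(seq, bb))
        bb = relax_backbone(seq, bb)
        history.append(energy(seq, bb))
    return seq, bb, history
```

Each half-step can only lower, or keep, the shared energy, so the recorded energies never increase. This property is what makes alternating between the two subproblems a sensible strategy.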

Assessment of the design (1) Structure • 1.17 Å backbone RMSD • highly accurate! (2) Stability • stable at 98 °C! • stable at ~5 M GuHCl! (Figure: blue, model; red, X-ray structure)

Top7 • No sequence memory → more stringent test of the force field and minimization procedure • Optimized steric packing prevents molten globules • No similarity to natural sequences (PSI-BLAST) → What can we learn from a protein that did not undergo natural selection?

Folding of Top7 (Watters et al., Cell 2007) • Measure the folding pathway using stopped-flow kinetics, circular dichroism, and NMR experiments • No evolutionary pressure acting on the folding free energy landscape • ~3 distinct folding phases • A non-native conformation is stable at equilibrium • Multiple stable fragments → Folding of Top7 is less cooperative than that of native proteins → Cooperative folding and smooth free energy landscapes are not general properties of folding proteins, but a product of natural selection

Follow-up: folding rules for the design of ideal structures • Goal: define basic rules that govern simple tertiary motifs and, in turn, more complex structures (independent of the amino acid sequence; dependent only on segment lengths) (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)

Follow-up: folding rules for the design of ideal structures • Rosetta simulations and statistical analysis (sequence-independent) → dependence of orientation on loop length, e.g. (1) the ββ-rule (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)

De novo design of 5 folds using these rules: Ferredoxin-like, Rossmann 2×2, IF3-like, P-loop 2×2, Rossmann 3×1 (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)

De novo design of 5 folds using these rules: accuracy compared to the NMR structures for Ferredoxin-like, Rossmann 2×2, IF3-like, P-loop 2×2, and Rossmann 3×1 (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)

3. Stabilize proteins: use orthogonal sources as constraints and guides in design • Assist design by focusing on relevant sequences (remove false positives): • Natural sequences collected from sequence databases (e.g. UniProt, metagenomes) → MSAs • In vitro evolution: deep mutational scanning experiments → positional tolerance, i.e. which amino acids are tolerated at single positions • Design: minimum (Rosetta) energy

Orthogonal sources as constraints and guides in design 1. Scan natural sequence diversity (BLAST) 2. (Optional) define tolerated amino acids with large-scale mutagenesis (experiment) 3. Choose the best combination among the naturally sampled amino acids using Rosetta (computational design) 4. Test designs for expression and stability • PROSS (Fleishman lab): very successful for improved expression and stability (Goldenzweig, … Fleishman (2016) Molecular Cell 63:337; http://pross.weizmann.ac.il/)
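Step 3 starts from a filtered list of mutations that pass both the natural-diversity and the energy criterion. This is a minimal sketch of that filter. It assumes a PSSM log-odds score and a computed ΔΔG (negative = stabilizing) for every candidate point mutation. The default thresholds are illustrative assumptions, not necessarily the published PROSS settings. The subsequent combinatorial design over the surviving amino acids is not shown.

```python
def pross_like_filter(pssm, ddg, pssm_min=0.0, ddg_max=-0.45):
    """Keep point mutations that are tolerated in the homolog alignment AND predicted
    to be stabilizing.

    pssm[(pos, aa)] : log-odds score from the multiple sequence alignment
    ddg[(pos, aa)]  : computed energy change of the mutation (negative = stabilizing)
    Thresholds are illustrative assumptions.
    Returns {position: sorted list of allowed amino acids}.
    """
    allowed = {}
    for key, score in pssm.items():
        if score >= pssm_min and key in ddg and ddg[key] <= ddg_max:
            pos, aa = key
            allowed.setdefault(pos, []).append(aa)
    return {pos: sorted(aas) for pos, aas in allowed.items()}
```

Consider a hypothetical input where W at position 10 is strongly stabilizing by ΔΔG but rare in homologs. That mutation is rejected, which is exactly the "remove false positives" role of the natural sequences.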

PROSS: use orthogonal sources as constraints and guides (Goldenzweig, … Fleishman (2016) Molecular Cell 63:337; http://pross.weizmann.ac.il/)

3. Design of a novel enzyme • Goal: design artificial enzymes that catalyze unnatural reactions • Enzymes lower the activation barrier by – stabilizing the transition state – shielding the reactants (Roethlisberger et al. 2008; Liang et al. 2008)

Design of a novel enzyme (Roethlisberger et al. 2008; Liang et al. 2008) • Approach: 1. Model the transition state of the reaction (quantum mechanics, QM) 2. Stabilize it with carefully placed chemical groups around it 3. Graft the resulting active site into an existing protein 4. Alter the sequence of the protein to accommodate the active site

Model transition state • Kemp elimination • Water-mediated

Search for a template • Inside-out: – build an inverse rotamer tree starting from the catalytic site – search for fitting backbone templates (geometric hashing) • RosettaMatch (outside-in): – place side chains and the transition state model at each position – search for transition state model orientations that fit several positions
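The idea behind hash-based template search is to precompute a lookup table of scaffold geometry so that candidate sites can be retrieved quickly. This is a much-simplified, distance-only sketch. It assumes a hypothetical three-residue catalytic motif described only by its three Cα–Cα distances. Real geometric hashing and RosettaMatch use full side-chain and transition-state geometry, not just distances.

```python
import math
from collections import defaultdict
from itertools import combinations

def build_distance_hash(coords, bin_width=0.5):
    """Hash every ordered pair of scaffold positions by its binned CA-CA distance."""
    table = defaultdict(list)
    for i, j in combinations(range(len(coords)), 2):
        key = round(math.dist(coords[i], coords[j]) / bin_width)
        table[key].append((i, j))
        table[key].append((j, i))
    return table

def lookup(table, distance, tol, bin_width=0.5):
    """Candidate pairs from all bins that can contain distances within tol of the query."""
    center = round(distance / bin_width)
    span = math.ceil(tol / bin_width) + 1
    hits = []
    for key in range(center - span, center + span + 1):
        hits.extend(table.get(key, []))
    return hits

def match_triad(coords, d_ab, d_bc, d_ac, tol=0.5, bin_width=0.5):
    """Find position triples (a, b, c) whose CA-CA distances match a hypothetical
    three-residue catalytic motif within tol (Angstrom)."""
    table = build_distance_hash(coords, bin_width)
    matches = []
    for a, b in lookup(table, d_ab, tol, bin_width):
        if abs(math.dist(coords[a], coords[b]) - d_ab) > tol:
            continue                      # hash bins are coarse: verify exactly
        for c in range(len(coords)):
            if c in (a, b):
                continue
            if (abs(math.dist(coords[b], coords[c]) - d_bc) <= tol
                    and abs(math.dist(coords[a], coords[c]) - d_ac) <= tol):
                matches.append((a, b, c))
    return sorted(matches)
```

The hash table is built once per scaffold and can then be queried for many motifs. That reuse is the main advantage over scanning all position pairs for every query.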

Find match

Validation 1: the enzyme is active

Validation 2: accurate structure prediction

Kumamax: a drug for celiac patients • Goal: design an oral enzymatic therapeutic that degrades immunogenic oligopeptides and is: • specific • optimally active at gastric pH • protease resistant • easy to produce (Gordon … Siegel 2012; Wolf … Pultz 2015 JACS)

Kumamax: a drug for celiac patients • Identify a protease that is: 1. active in acidic conditions (MEROPS database) 2. available with a solved structure • Kumamolisin: • S-E-D catalytic triad • stable and active • recognizes a dipeptide → specific (Gordon … Siegel 2012; Wolf … Pultz 2015 JACS)

Kumamax: a drug for celiac patients • Redesign the active site to change sequence specificity (PR → PQ) using Foldit: • remove negative charge • fill the pocket • generate H-bonds • Tested 261 designs (~50% successful) and selected the best: Kumamax = V119D, S262K, N291D, D293T, G319S, D358G, D368H (Gordon … Siegel 2012; Wolf … Pultz 2015 JACS)

Kumamax: a drug for celiac patients • Kumamax is active and specific for PQ (not PR or QP): ×100 more active, ×800 more specific, 95% of gliadin degraded • Kumamax is resistant to proteases (Gordon … Siegel 2012; Wolf … Pultz 2015 JACS)

Kumamax: a drug for celiac patients • Kuma030: a further improved variant, based on the solved structure of Kumamax: K262E, E269T, S354Q, G358S, D399Q, A449Q • efficiently degrades gluten in complex food matrices • the digested peptides no longer elicit an immune response (Gordon … Siegel 2012; Wolf … Pultz 2015 JACS)
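The mutation lists above use the standard wild type, position, mutant notation, and successive variants stack on top of each other. For example, Kumamax introduces S262K and Kuma030 then applies K262E at the same position. This is a minimal sketch for parsing and applying such lists with a wild-type check. It assumes 1-based numbering that starts at the first residue of the given sequence. Real protein numbering often has an offset, so `offset` is a hypothetical parameter to set per case. The toy sequence in the example is not kumamolisin.

```python
import re

# Wild type, position, mutant, e.g. "V119D"; spaces as in "V 119 D" are removed first.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_mutation(text):
    """Parse 'V119D' (or 'V 119 D') into ('V', 119, 'D')."""
    m = MUTATION.match(text.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a point mutation: {text!r}")
    wt, pos, mut = m.groups()
    return wt, int(pos), mut

def apply_mutations(seq, mutations, offset=1):
    """Apply point mutations in order, checking the expected wild-type residue first.

    offset: residue number of seq[0] (assumed 1 here; adjust for real numbering schemes).
    """
    residues = list(seq)
    for text in mutations:
        wt, pos, mut = parse_mutation(text)
        idx = pos - offset
        if not 0 <= idx < len(residues):
            raise IndexError(f"{text}: position outside sequence")
        if residues[idx] != wt:
            raise ValueError(f"{text}: expected {wt} at {pos}, found {residues[idx]}")
        residues[idx] = mut
    return "".join(residues)
```

Checking the wild-type residue catches numbering mistakes early. That check matters when stacking rounds of design, because a later mutation such as K262E is only valid on the earlier variant.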

4. Multiple state design – design switches • Design a protein that fits two different conformations – manipulate the equilibrium between conformations (e.g. metal binding, phosphorylation, etc.) • Design a protein that binds two (or more) different partners – improve sequence recovery by addressing several constraints
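Multiple state design scores each candidate sequence against several target states and combines the scores. This is a minimal exhaustive sketch for a tiny alphabet. It assumes each state is given as a hypothetical function from sequence to energy. Using `max` as the combination rule asks for a sequence that is acceptable in every state. Using `sum` optimizes the average instead. Both are common choices, but the slide does not prescribe one.

```python
from itertools import product

def multistate_design(alphabet, n_pos, state_energies, combine=max):
    """Exhaustively pick the sequence that scores best across several target states.

    state_energies: list of functions seq -> energy, one per state (conformation or partner)
    combine: merges the per-state energies; max() minimizes the worst case,
             sum() minimizes the total.
    """
    best_seq, best_score = None, float("inf")
    for letters in product(alphabet, repeat=n_pos):
        seq = "".join(letters)
        score = combine(e(seq) for e in state_energies)
        if score < best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score
```

Suppose two hypothetical states each prefer a slightly different hydrophobic/polar pattern. No sequence is perfect for both, and the method returns the best compromise instead of the optimum for either state alone.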

Design of protein switches (a) parallel ↔ antiparallel helices (oxidation dependent) (b) trimeric coiled coil ↔ zinc finger (metal dependent) (c) homeodomain ↔ zinc finger (metal dependent) (Ambroggio & Kuhlman, Curr. Opin. Struct. Biol. 2006)

5. Negative design • Problem: optimizing a sequence for a given fold does not guarantee that alternative folds are not also favorable for that sequence • Solubility: prevent aggregation • Compactness: prevent molten globule states • Specificity: negative design disfavors alternative conformations

Design of homodimeric coiled coils (Havranek & Harbury, Nat. Struct. Biol. 2003) • Negative design against the heterodimer • Sequence 2 is better than sequence 1: it is specific, even though it is higher in energy
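The point that a higher-energy sequence can be the better design is easiest to see through the energy gap between the desired state and its competitors. This is a minimal sketch. It assumes each candidate comes with a hypothetical target-state energy and a list of competitor-state energies, for example the unwanted heterodimer. An optional stability cut-off on the target state can be applied as well. All numbers in the example are made up.

```python
def rank_by_specificity(candidates, max_target_energy=None):
    """Rank designs by the gap between the desired state and competing states.

    candidates: dict name -> (target_energy, [competitor energies])
    gap = lowest competitor energy - target energy; a larger gap means more specific.
    max_target_energy: optional stability cut-off on the target state itself.
    """
    scored = []
    for name, (e_target, competitors) in candidates.items():
        if max_target_energy is not None and e_target > max_target_energy:
            continue
        scored.append((min(competitors) - e_target, name))
    return [name for gap, name in sorted(scored, reverse=True)]
```

Take hypothetical energies where sequence 1 has a target energy of -30 with a competitor at -29, and sequence 2 has -25 with a competitor at -15. Sequence 2 ranks first because its gap is 10 versus 1, even though its target-state energy is higher.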