Motifs and Domains, Chemical Modifications Dr. Hilal AY

  • Slides: 34

Protein motifs and domains
• Protein motifs are small regions of protein three-dimensional structure or amino acid sequence shared among different proteins.
• They are recognizable regions of protein structure that may (or may not) be defined by a unique chemical or biological function.
• Frequently occurring folding patterns (motifs) can be found in many proteins.
Fig. Structural elements from Pit-1 homeodomain (PDB: 1AU7) and Src (PDB: 1FMK). Motifs are made up of secondary structure elements but do not necessarily make up a hydrophobic core. Domains are the smallest self-contained unit within a structure. Structures may be made up of multiple domains, sometimes with repeats of the same domain.

Protein motifs and domains
• Protein motifs can be relatively simple structures, such as the β-hairpin (two adjacent antiparallel β-strands joined by a tight turn), the omega loop (the residues at the start and the end of the loop lie close together) and the EF hand (α-helix, loop, α-helix), a calcium-binding motif present in a number of proteins (e.g., calmodulin).
• They can also be more complex structures, such as the Greek key (four adjacent antiparallel β-strands).
Figure. Common Beta Strand Structural Motifs. (A) Right-handed Twisted Sheet Top and Side View, (B) Beta Barrel Side View, and (C) Beta Barrel Top View

• A structural domain is an element of the protein's overall structure that is stable and often folds independently of the rest of the protein chain.
• Many domains are not unique to the protein products of one gene, but instead appear in a variety of proteins.
• Proteins sharing more than a few common domains are encoded by members of evolutionarily related genes comprising gene families.
• Genes for proteins that share only one or a few domains may belong to gene superfamilies.
• Superfamily members can have one function in common, but their sequences are otherwise unrelated.

Protein motifs and domains
• Domain names often derive from their prominent biological function in the protein they belong to (e.g., the calcium-binding domain of calmodulin), or from the protein in which they were first described (e.g., the PH domain, named for pleckstrin homology).
• The domain swapping that gives rise to gene families and superfamilies is a natural genetic event. Protein domains can also be "swapped" by genetic engineering to make chimeric proteins with novel functions.
Fig. Schematic diagram of the genomic structure of the 29.3 kilobase 2019 novel coronavirus (nCoV) gene and domain structure of the 1273 amino acid spike glycoprotein S (not to scale). E, envelope protein gene; M, membrane protein gene; N, nucleocapsid protein gene; RBM, receptor-binding motif; RdRP, RNA-dependent RNA polymerase; S, spike protein gene.

Protein motifs and domains
• Domains (30–400 amino acids) are stable, distinct folded areas in globular proteins, sometimes comprising several motifs.
• A protein may consist of several domains joined by short polypeptide chains, with each domain having a distinct function.
• When a domain appears in a different protein it generally retains its functionality.
• This function may be catalytic or may involve protein–nucleic acid, protein–protein or protein–membrane interactions.
Fig. Structures of interacting proteins or protein domains taken from the Protein Data Bank. (A) Atox1, the bait; (B) DNMT1 with the first BAH domain (interaction partner to Atox1) in red; (C) PPM1A with the three C-terminal helices (interacting with Atox1) in red; (D) PRYSPRY domain of TRIM72 (matching the same domain of TRIM26 with which Atox1 interacts); (E) the RNA recognition motif (RRM) of CPEB4; and (F) a zinc finger domain, as found in several of the hit proteins.

Protein motifs and domains
• Three major classes of domains can be recognized: domains consisting mainly of α helices, domains containing mainly β strands, and mixed domains containing both α and β elements.
• The last class includes structures with alternating α/β secondary structure as well as proteins made up of separate collections of helices and strands (α + β).
• Within each of these three groups there are many variations on the basic themes, which lead to further classification of protein architectures.

Figure. The secondary structure elements found in monomeric proteins. The λ repressor protein (PDB: 1LMB) contains the helix turn helix (HTH) motif; cytochrome b562 (PDB: 256B) is a four-helix bundle heme-binding domain; human thioredoxin (PDB: 1ERU) is a mixed α/β protein containing a five-stranded twisted β sheet; spinach plastocyanin (PDB: 1AG6) is a single Greek key motif that binds Cu (shown in green); human cis-trans proline isomerase (PDB: 1VBS) is a small extensive β domain containing a collection of strands that fold to form a ‘sandwich’; human γ-crystallin (PDB: 2GCR) has two domains, each of which is an eight-stranded β barrel type structure composed of two Greek key motifs.

Protein motifs and domains
• The β meander motif is a series of antiparallel β strands linked by a series of loops or turns.
• In the β meander the order of strands across the sheet reflects their order of appearance along the polypeptide sequence.

Protein motifs and domains
• A variation of this design is the so-called Greek key motif, which takes its name from the design found on many ancient forms of pottery or architecture. The Greek key motif links four antiparallel β strands, with the third and fourth strands forming the outside of the sheet whilst strands 1 and 2 form the inside or middle of the sheet.
• The Greek key motif can contain many more strands, ranging from 4 to 13. The Cu-binding metalloprotein plastocyanin contains eight β strands arranged in a Greek key motif.

Protein motifs and domains
• A β sandwich normally forms through the interaction of β strands that pack at an angle to each other and are connected by short loops.
• In some cases the strands originate from different polypeptide chains, but the defining feature is two layers of β strands interacting within a globular protein.
• The layers of the sandwich can be aligned with respect to each other or arranged orthogonally.
• An example of the second (orthogonal) arrangement is intestinal fatty acid-binding protein.

Protein motifs and domains
• The Rossmann fold, named after its discoverer Michael Rossmann, is an important supersecondary structure element and is an extension of the β-α-β domain.
• The basic unit consists of three parallel β strands with two intervening α helices, i.e. β-α-β-α-β.
• Two of these units are found together, so the Rossmann fold contains six β strands and four helices, and this collection of secondary structure frequently forms a nucleotide-binding site.

Protein motifs and domains
• Nucleotide-binding domains are found in many enzymes, in particular dehydrogenases, where the cofactor nicotinamide adenine dinucleotide (NAD) is bound at an active site.
• Examples of enzymes containing the Rossmann fold are lactate dehydrogenase, glyceraldehyde-3-phosphate dehydrogenase, alcohol dehydrogenase and malate dehydrogenase.
• However, it is clear that this fold is found in other nucleotide-binding proteins besides dehydrogenases, including glycogen phosphorylase and glyceroltriphosphate binding proteins.

Protein motifs and domains
• Elements of supersecondary structure are frequently used to allow protein domains to be classified by their structures.
• Most frequently these domains are identified by the presence of characteristic folds.
• A fold represents the ‘core’ of a protein domain formed from a collection of secondary structures.
• In many cases these folds occur in more than one protein, allowing structural relationships to be established.

Protein motifs and domains
• These characteristic folds include four-helix bundles (cytochrome b562), helix turn helix motifs (the λ repressor), β barrels and the β sandwich, as well as more complicated structures such as the β propeller and β helix.
• The β helix is an unusual arrangement of secondary structure: β strands align in a parallel manner one above another, forming inter-strand hydrogen bonds but collectively twisting as a result of the displacement of successive strands.
• The strands all run in the same direction, and the displacement results in the formation of a helix.

Protein motifs and domains
• Both left-handed and right-handed β helix proteins have been discovered, and a prominent example of a right-handed β helix occurs in the tailspike protein of bacteriophage P22.
• The tailspike protein is actually a trimer containing three interacting β helices.
• The arrangement of strands within a β helix gives any subunit with this structural motif a very elongated appearance and leads to the hydrophobic cores being spread out along the long axis, as opposed to a typical globular packing arrangement.
Figure. (a) The entire P22 tailspike protein, shown bound to the nonasaccharide from S. enterica serovar 253Ty O-antigen (in yellow space-filling representation). The N-terminal domain is at the top, and the three subunit chains are shown in red, green, and blue. (b) An interior hydrophobic stack from one of the three identical single-chain, parallel β-helices is shown with side chains highlighted in yellow. (c) Residues 540 to 569, viewed from above and showing inwardly pointing hydrophobic residues. This region, which spans the interdigitated domain, forms one turn of a triple-stranded β-helix and is involved in trimer stability.

• An important aspect of biological sequence characterization is the identification of motifs and domains.
• It is an important way to characterize unknown protein functions, because a newly obtained protein sequence often lacks significant similarity to database sequences of known function over its entire length, which makes functional assignment difficult.
• In this case, biologists can gain insight into the protein's function by identifying short consensus sequences related to known functions.
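Searching a sequence for a short consensus motif can be sketched in a few lines of Python. The example below is an illustration only, not a method taken from these slides: it assumes the motif is written in a simplified PROSITE-style syntax, and the helper names `prosite_to_regex` and `find_motif` are hypothetical. The pattern shown is the widely used ATP/GTP-binding P-loop consensus, and the test sequence is just an illustrative fragment.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a regular expression.

    Supported elements (an assumed subset of the real syntax):
    x = any residue, [AG] = one of, {P} = anything but, (n) or (n,m) = repeats.
    """
    parts = []
    for element in pattern.strip(".").split("-"):
        # Separate the residue part from an optional repeat count such as (4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = match.groups()
        if residue == "x":
            token = "."                          # any amino acid
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # any residue except those listed
        else:
            token = residue                      # fixed residue or [..] class
        if low and high:
            token += "{" + low + "," + high + "}"
        elif low:
            token += "{" + low + "}"
        parts.append(token)
    return "".join(parts)


def find_motif(sequence: str, pattern: str):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


# P-loop consensus: [AG], any four residues, then G-K-[ST].
P_LOOP = "[AG]-x(4)-G-K-[ST]"
print(find_motif("MTEYKLVVVGAGGVGKSALTIQLIQ", P_LOOP))
```

A hit of this kind only suggests a possible function; short patterns also match by chance, so the result is a starting point for further analysis rather than a functional assignment.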

• https://www.jove.com/science-education/10679/protein-folding
• https://www.jove.com/science-education/10678/proteinorganization

Protein post-translational modifications
• Many polypeptides undergo covalent modification, either during or after their ribosomal synthesis, giving rise to the concepts of co-translational and post-translational modification (both generally called simply post-translational modification, PTM).
• PTMs are particularly characteristic of eukaryotic proteins and are generally introduced by specific enzymes or enzyme pathways.

Protein post-translational modifications
• Many PTMs occur at the site of a specific characteristic protein sequence (signature sequence) within the protein backbone.
• This has allowed the development of bioinformatic tools capable of identifying potential PTM sites along a protein’s backbone via sequence analysis.
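As a concrete illustration of a signature sequence, the sketch below scans for the commonly cited N-linked glycosylation sequon Asn-X-Ser/Thr, where X is any residue except proline. The sequon is a standard textbook consensus rather than something stated on these slides, the function name `find_n_glyc_sequons` is hypothetical, and the test sequences are made up. A match marks only a potential site, since not every sequon is glycosylated in a real protein.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead keeps overlapping sequons (e.g. "NNST") from being skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_glyc_sequons(sequence: str):
    """Return the 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Made-up sequence: N2 (N-G-T) and N12 (N-Q-S) qualify; N7 (N-P-S) does not.
print(find_n_glyc_sequons("MNGTAPNPSKLNQSW"))
```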

Protein post-translational modifications
• Moreover, many such PTMs introduce a predefined mass difference into the affected polypeptide.
• Additional bioinformatic tools have thus been developed that interrogate peptide mass fingerprinting (MS) data, comparing the masses of peptide fragments experimentally generated from a protein with those of theoretical peptides in databases, in an effort to identify the occurrence of specific PTMs.
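The mass-difference idea can be sketched as a simple lookup. The example below is a minimal illustration, not the algorithm of any particular tool: the function name `candidate_ptms`, the default tolerance and the peptide masses are assumptions chosen for the example, while the mass shifts are standard monoisotopic values for the listed modifications.

```python
# Monoisotopic mass shifts (Da) added to a peptide by some common PTMs.
PTM_MASS_SHIFTS = {
    "phosphorylation": 79.96633,   # + HPO3
    "sulfation": 79.95682,         # + SO3
    "acetylation": 42.01057,       # + C2H2O
    "myristoylation": 210.19837,   # + C14H26O
    "palmitoylation": 238.22967,   # + C16H30O
    "amidation": -0.98402,         # C-terminal OH replaced by NH2
}


def candidate_ptms(observed_mass, theoretical_mass, tolerance=0.01):
    """List PTMs whose mass shift explains observed - theoretical within tolerance (Da)."""
    delta = observed_mass - theoretical_mass
    return sorted(
        name
        for name, shift in PTM_MASS_SHIFTS.items()
        if abs(delta - shift) <= tolerance
    )


# Hypothetical peptide: unmodified mass 1000.5000 Da, observed at 1080.4663 Da.
print(candidate_ptms(1080.4663, 1000.5000))                   # loose tolerance
print(candidate_ptms(1080.4663, 1000.5000, tolerance=0.005))  # tighter tolerance
```

Phosphorylation and sulfation differ by less than 0.01 Da, so with a loose tolerance both are reported. Distinguishing them requires high mass accuracy or additional evidence, which is one reason a mass difference alone is treated as a hint rather than proof.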

The more common forms of post-translational modifications that polypeptides may undergo.

Glycosylation
• Glycosylation (the attachment of carbohydrates) is one of the most common forms of PTM associated with eukaryotic proteins, particularly extracellular and cell-surface proteins.
• Limited protein glycosylation can also be undertaken by some bacteria.
• Two types of glycosylation generally occur: N-linked and O-linked.
• In N-linked glycosylation, the sugar chain (the oligosaccharide) is attached to the protein via the nitrogen atom of an asparagine (Asn) residue, while in O-linked systems the sugar chain is attached to the oxygen atom of hydroxyl groups, usually those of serine or threonine residues.

Glycosylation
• Glycosylation is the most complex PTM associated with native human proteins, and it is estimated that 1–2% of the human genome encodes proteins contributing to glycosylation capacity.

Proteolytic processing
• Proteolytic processing refers to limited and specific proteolytic cleavage of a polypeptide subsequent to its synthesis.
• Virtually all proteins destined for export from the cell, whether derived from prokaryotic or eukaryotic sources, contain a signal sequence at their N-terminal end.
• The signal sequence itself is cleaved off as the protein translocates across the membrane, releasing the mature polypeptide.
• Signal sequences are usually characterized by a positively charged N-terminal region, followed by a hydrophobic region and finally a neutral but polar region.
• Various bioinformatic tools (which can be accessed via ExPASy) have been developed that can predict the presence of such signal sequences in sequence data.
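The first two features of a signal sequence described above (a positively charged N-terminal region followed by a hydrophobic stretch) can be turned into a crude screening heuristic. The sketch below is a teaching illustration and not the method used by any real predictor: the function name, the region lengths and the hydropathy threshold are all assumptions, the polar C-terminal region and cleavage site are not checked, and the test sequences are illustrative. Hydropathy values are from the Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def looks_like_signal_peptide(sequence, n_len=5, window=8,
                              search_end=30, min_hydropathy=2.0):
    """Crude check: net-positive N-terminus, then a hydrophobic window.

    All thresholds are assumed values chosen for illustration.
    """
    seq = sequence.upper()
    n_region = seq[:n_len]
    net_charge = (sum(r in "KR" for r in n_region)
                  - sum(r in "DE" for r in n_region))
    if net_charge <= 0:
        return False
    # Slide a window over the residues after the N-region and look for a
    # stretch whose mean hydropathy reaches the threshold.
    region = seq[n_len:search_end]
    for start in range(len(region) - window + 1):
        segment = region[start:start + window]
        mean = sum(KYTE_DOOLITTLE.get(r, 0.0) for r in segment) / window
        if mean >= min_hydropathy:
            return True
    return False


# Illustrative sequences: a charged start with a hydrophobic core, and a
# hydrophilic sequence with no such core.
print(looks_like_signal_peptide("MKKTAIAIAVALAGFATVAQAAPKDNTWYTG"))
print(looks_like_signal_peptide("MKKSDEQNSTPGDKEHNSGSEDKRRQQSE"))
```

Real prediction tools use trained statistical models rather than fixed thresholds, so this heuristic should be expected to miss genuine signal sequences and to flag some transmembrane segments by mistake.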

Proteolytic processing
• In addition to playing a role in protein targeting, proteolysis can modulate the biological activity of many proteins.
• The pre-cleaved (‘pro’) forms of such proteins are generally inactive, with activation occurring on proteolysis. Examples include the mammalian digestive enzymes trypsin, chymotrypsin and pepsin.
• These are initially synthesized and stored as ‘pro’ or ‘zymogen’ precursors: trypsin and chymotrypsin in the pancreas, and pepsin in the stomach.
• Additional examples include a range of blood clotting factors and insulin.
• Proteolytic activation is very specific and is generally irreversible.

Phosphorylation
• Reversible phosphorylation represents yet another form of PTM, and is undertaken primarily in eukaryotes but also in prokaryotes.
• The phosphate group donor is most often ATP, and phosphorylation/dephosphorylation of the target protein is undertaken by substrate-specific protein kinase and protein phosphatase enzymes.
• The site of phosphorylation is usually the hydroxyl group of a serine, threonine or tyrosine residue, although the side chains of aspartate, lysine and histidine can also sometimes be phosphorylated.
• Again, the exact site phosphorylated exhibits both a sequence-specific characteristic and a likely requirement for a characteristic three-dimensional shape.

Phosphorylation
• In the vast majority of cases phosphorylation directly affects the biological activity of the target protein, with phosphorylation/dephosphorylation functioning as a reversible on/off switch.
• In some cases (e.g. the enzyme glycogen phosphorylase), phosphorylation results in activation, whereas in other cases (e.g. the enzyme glycogen synthase), phosphorylation results in inactivation.
• Protein phosphorylation events are known to regulate a wide variety of cellular processes, including metabolism, transcription, translation and protein degradation, as well as cellular differentiation, signalling and proliferation.
• In a few instances, phosphorylation does not play a regulatory role; for example, the phosphorylation of the milk protein casein is of nutritional rather than functional importance.

Acetylation, acylation and amidation
• Acetylation (the addition of an acetyl group, CH3CO) is the most common PTM associated with eukaryotic proteins.
• Acetylation of the N-terminus is characteristic of approximately 50% of cytoplasmic proteins in yeast and over 80% of human cytoplasmic proteins.
• The acetyl group donor is usually acetyl-CoA, and the reaction is catalysed by N-acetyltransferase enzymes.
• In some cases N-terminal acetylation appears to occur before the polypeptide is completely synthesized, while in other instances acetylation occurs post-translationally.
• N-terminal acetylation is also characteristic of some prokaryotic and archaeal proteins.

Acetylation, acylation and amidation
• Protein acylation refers to the direct covalent attachment of fatty acids to a polypeptide backbone.
• The fatty acids most commonly found in association with acylated polypeptides are the 16-carbon saturated palmitic acid and the 14-carbon saturated myristic acid.
• Palmitic acid is usually covalently linked via an ester or thioester bond to a cysteine, serine or threonine residue, while myristic acid is invariably covalently attached to an N-terminal glycine residue via an amide bond.
• Acylated polypeptides appear to be ubiquitous in eukaryotes and are also found in many viruses.

Acetylation, acylation and amidation
• Amidation refers to the replacement of a protein’s C-terminal carboxyl group with an amide group.
• It is a PTM more characteristic of peptides than of polypeptides, and amidated peptides such as oxytocin, vasopressin and calcitonin are used therapeutically.
• Despite its relatively widespread occurrence, in higher eukaryotes in particular, its exact biological functions remain imprecisely defined. It may contribute to peptide stability, activity or both.
• It appears to be an important determinant in the binding of several regulatory peptides to their corresponding receptors.

Sulfation
• Sulfation is a PTM that entails the attachment of a sulfate (SO3−) group to the protein backbone, usually via target tyrosine residues, although it can sometimes occur via serine and threonine residues.
• Sulfation is undertaken mainly in higher eukaryotes. It is mediated by sulfotransferases in the Golgi network and is predominantly associated with secretory and membrane proteins.
• Functionally, sulfation often plays a role in protein–protein interactions. Generally, the absence of sulfation tends to reduce rather than abolish activity.
Figure. Protein tyrosine sulfation (PTS) and its biological path. The drawing depicts the biochemical processing of sulfate in the cell, from the activation of inorganic sulfate, through its integration into protein, to its effects on protein interactions that induce physiological and pathogenic responses.

Sulfation
• Many chemokine and hormone cell-surface receptors are sulfated.
• This implies a possible role for this PTM in cellular processes, including immunity, haematopoiesis and angiogenesis.
• Sulfation may also play an important functional role in the docking of animal viruses to their target cells.
Figure. Tyrosine sulfation plays an important role in the immune response. (a) Leukocytes roll upon, adhere to and transmigrate between endothelial cells at sites of inflammation. P-selectin and its ligand, PSGL-1, are often required for this process. (b) PSGL-1 is a mucin-like glycoprotein that appears to be an extended rod shape in vivo. The extreme amino terminus of PSGL-1 carries three tyrosine sulfation sites, shown in yellow. These sulfate esters, and specific glycans on PSGL-1, are key binding determinants for P-selectin.

Protein modifications in bacteria https://www.nature.com/articles/s41579-019-0243-0