Protein Domains Week 8
Syllabus
Working with Proteins. Introduction to Proteins: 1) Amino acid sequence: primary structure 2) Motifs and domains: 3D structure
Recap of Last Class: 1) Identified conservative and non-conservative amino acid substitutions. 2) Learned the difference between percent identity and percent similarity. 3) Learned the concept of homology and the difference between orthologs and paralogs. 4) Identified protein domains in a BLASTp output. 5) Used a substitution matrix to determine protein alignment scores.
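To make the recap of percent identity versus percent similarity concrete, here is a minimal sketch. The grouping of "similar" amino acids is an illustrative assumption (real tools count a pair as similar when the substitution matrix gives it a positive score), and the two aligned sequences are made up.

```python
# Groups of amino acids treated as "similar" here are an illustrative
# assumption; real tools derive similarity from a substitution matrix.
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FYW"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("STNQ"),  # polar, uncharged
]


def is_similar(a: str, b: str) -> bool:
    """True if a and b are different residues from the same group."""
    return a != b and any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_and_similarity(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both strings must have the same non-zero length; '-' marks a gap.
    Percentages are taken over the full alignment length, gaps included.
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identical = similar = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue  # a gap is neither identical nor similar
        if a == b:
            identical += 1
        elif is_similar(a, b):
            similar += 1
    length = len(seq1)
    return 100.0 * identical / length, 100.0 * (identical + similar) / length


# Made-up aligned pair: 3 identical columns, 2 similar columns, 7 columns total.
ident, simil = identity_and_similarity("KRIL-DE", "KKIVADQ")
print(f"identity = {ident:.1f}%, similarity = {simil:.1f}%")
```

Percent similarity is never lower than percent identity, because every identical position also counts as similar.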
Today: 1) Protein domains/protein families 2) 3D protein structure 3) Structure and function
For Independent Project: Protein Structure 1) Identification of domains (CDD) 2) Domain-specific multiple sequence alignment (CDD)
Learning Objectives 1) Identify protein domains through analysis of multiple sequence alignments. 2) Use Clustal Omega to generate a multiple sequence alignment 3) Understand the relationship between protein domains and protein families. 4) Understand the relationship between protein structure and protein function. 5) Understand how mutations that affect protein structure affect protein function and lead to phenotypes/diseases.
Proteins perform specific functions within cells. Example: insulin is a signaling protein.
Transcription factors bind to DNA
Transcription factors have DNA-binding domains. Protein domains are distinct functional and structural units of a protein. Having a DNA-binding domain allows a protein to bind to DNA molecules. All proteins that bind DNA must have a DNA-binding domain. Domains share sequence and structural similarity.
Bioinformatics tools can help us to identify protein domains
NCBI Protein
NCBI Protein links to BLAST
NCBI Protein links to the Conserved Domain Database
Protein BLAST reveals conserved protein domains. [Figure: BLASTp output annotated with information from the Conserved Domain Database (CDD): domain, protein family, and short aligned segments]
Local sequence alignments identify domains within proteins as short segments that align.
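A local alignment finds the best-matching segment inside two longer sequences, which is why it can pick out a shared domain. The sketch below is a minimal Smith-Waterman implementation with simple match/mismatch/gap scores; these scores and both toy sequences are assumptions for illustration, and BLAST itself uses a faster heuristic together with a substitution matrix.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best score, aligned segment of a, aligned segment of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences (made up) that share one short segment.
best_score, seg_a, seg_b = smith_waterman("MKTAYWFQNRRAKLL", "GGWFQNRRSS")
print(best_score, seg_a, seg_b)
```

Only the shared segment is reported; the unrelated flanking residues are left out of the alignment.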
Protein domains are distinct functional and structural units of a protein. Multiple proteins share the same domain; these proteins comprise a protein family.
NCBI has a Conserved Domain Database
Protein entry in Conserved Domain Database (CDD)
This is the domain structure figure for your protein that needs to be included in your project
Reminder: Include Citations for the presentation and report.
Correct Format of this Citation: Marchler-Bauer, A. et al. CDD: NCBI’s conserved domain database. Nucleic Acids Res. 43, D222-226 (2015).
CDD is a domain database (domains are subunits of proteins). Each entry gives a summary and a structure, if available. The entry shown is for the homeodomain, not for the protein HOXD13.
Domains have sequence homology
Multiple sequence alignment of domain residues. Residues in red are conserved. Residues in blue are divergent. Uppercase residues are aligned. Lowercase residues are unaligned. -- indicates variation in sequence length. # indicates a unique feature/motif within the domain.
This alignment needs to be included in your independent project. You must identify the sequences and the species they come from!
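The legend for the CDD alignment can also be read programmatically. The sketch below assumes the alignment has been copied as plain text, so colour is lost but letter case and the `#` feature line survive; the row and feature line shown are made up, not taken from a real CDD entry.

```python
def aligned_residues(row: str) -> str:
    """Keep only aligned residues (uppercase); drop lowercase and '-' characters."""
    return "".join(ch for ch in row if ch.isupper())


def unaligned_count(row: str) -> int:
    """Count unaligned residues (lowercase letters) in one alignment row."""
    return sum(ch.islower() for ch in row)


def motif_residues(feature_line: str, row: str) -> str:
    """Return the residues that sit directly under a '#' in the feature line."""
    return "".join(res for mark, res in zip(feature_line, row) if mark == "#")


# Hypothetical feature line and alignment row, column for column.
feature = "   # #  "
row     = "rkKRTAYt"

print(aligned_residues(row))         # aligned block only
print(unaligned_count(row))          # number of lowercase residues
print(motif_residues(feature, row))  # residues flagged as part of a motif
```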
Highlighted residues represent motifs within the domain
For the project you will need this multiple sequence alignment. Use your sequence and the other sequences provided.
NCBI Protein
Using BLAST to identify protein domains
Create a global multiple sequence alignment using Clustal Omega: http://www.ebi.ac.uk/Tools/msa/clustalo/
Clustal Omega settings: 1) Input sequences must use the FASTA file format 2) Output format must be set to "Clustal with numbers"
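Since the input must be in FASTA format, here is a minimal sketch of writing and reading it: each record is a header line starting with `>` followed by the sequence, wrapped over one or more lines. The record names and sequences are placeholders, not real proteins.

```python
def format_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {name: sequence} as FASTA text, wrapping sequences at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text back into {name: sequence}."""
    parts: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


# Placeholder records; replace with your own sequence and the ones provided.
records = {
    "seq1_hypothetical": "MKT" * 30,
    "seq2_hypothetical": "MKV",
}
fasta_text = format_fasta(records)
print(fasta_text)
```

The resulting text can be pasted into the Clustal Omega input box or saved as a file and uploaded.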
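The consensus line beneath the alignment indicates whether the positions share similarity: * marks a fully conserved position, : marks residues with strongly similar properties, and . marks residues with weakly similar properties. A stretch with many consecutive symbols is an area of significant similarity.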
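The consensus line can be reproduced from the alignment columns. This sketch uses the "strong" and "weak" residue groups as documented for Clustal; treat the group lists as an assumption to check against the tool's own help page. The three aligned rows are made up.

```python
# Residue groups as documented for Clustal (assumption: verify against the tool's help).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return the consensus symbol for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # any gap: no symbol
    if len(residues) == 1:
        return "*"  # fully conserved
    if any(residues <= set(group) for group in STRONG):
        return ":"  # strongly similar properties
    if any(residues <= set(group) for group in WEAK):
        return "."  # weakly similar properties
    return " "


def consensus_line(rows: list[str]) -> str:
    """Build the consensus line for equal-length aligned rows."""
    return "".join(column_symbol("".join(col)) for col in zip(*rows))


# Made-up aligned rows of equal length.
rows = ["WFQNRRAK",
        "WFQNRRSK",
        "WYQNKRCK"]
for r in rows:
    print(r)
print(consensus_line(rows))
```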
How to interpret a Clustal Omega alignment
The BLOSUM62 Scoring Matrix
Identification of conserved residues is based on the substitution scoring matrix
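As a worked example of how a substitution matrix scores an alignment, the sketch below sums BLOSUM62 values column by column. Only a small excerpt of the matrix is included, so sequences must stay within those pairs; the gap penalty is a hypothetical value and the short sequences are made up.

```python
# Small excerpt of BLOSUM62 (symmetric; each pair is listed once).
BLOSUM62_EXCERPT = {
    ("K", "K"): 5, ("R", "R"): 5, ("I", "I"): 4, ("L", "L"): 4,
    ("D", "D"): 6, ("E", "E"): 5, ("F", "F"): 6, ("Y", "Y"): 7,
    ("K", "R"): 2, ("I", "L"): 2, ("D", "E"): 2, ("F", "Y"): 3,
    ("D", "L"): -4,
}


def pair_score(a: str, b: str) -> int:
    """Look up the score for one residue pair in either order."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    if (b, a) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this excerpt")


def alignment_score(seq1: str, seq2: str, gap_penalty: int = -4) -> int:
    """Sum the pair scores over aligned columns; '-' costs gap_penalty (hypothetical value)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(seq1, seq2):
        total += gap_penalty if "-" in (a, b) else pair_score(a, b)
    return total


print(alignment_score("KIDF", "KIDF"))  # identical residues
print(alignment_score("KIDF", "RLEY"))  # conservative substitutions
print(alignment_score("KIDF", "KILF"))  # one non-conservative substitution (D to L)
```

Conservative substitutions still earn positive scores, while the non-conservative D to L change is penalised, which is how conserved positions stand out in an alignment.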
Domains are found in conserved regions of proteins
Using BLAST to identify protein domains
Domains are regions of a protein that share significant structural features or sequence identity with other proteins. A domain can extend along the length of a protein, occupy a subset of a protein sequence, or occur one or more times (for example, 2 or 3 domains in one protein). Source: Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).
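The three domain layouts described above can be represented as coordinate ranges on the protein sequence. This is a minimal sketch; the protein length, domain names, and coordinates are hypothetical and do not describe any real protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def domain_coverage(protein_length: int, domains: list[Domain]) -> float:
    """Fraction of the protein's residues that lie inside at least one domain."""
    covered = set()
    for d in domains:
        if not 1 <= d.start <= d.end <= protein_length:
            raise ValueError(f"domain {d.name} lies outside the protein")
        covered.update(range(d.start, d.end + 1))
    return len(covered) / protein_length


def domains_at(position: int, domains: list[Domain]) -> list[str]:
    """Names of the domains that contain a given residue position."""
    return [d.name for d in domains if d.start <= position <= d.end]


# Hypothetical 300-residue protein in which one domain occurs twice.
two_copies = [Domain("repeat", 20, 79), Domain("repeat", 120, 179)]
# Hypothetical protein in which one domain spans the full length.
full_length = [Domain("whole", 1, 300)]

print(domain_coverage(300, two_copies))
print(domain_coverage(300, full_length))
print(domains_at(150, two_copies))
```

A lookup like `domains_at` is one way to check whether a given residue position falls inside a domain.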
Domains share structural similarity
3D structures are determined by X-ray crystallography and NMR (nuclear magnetic resonance). X-ray crystallography: crystallizing the protein can be extremely difficult. Computer analysis is used to determine the protein structure from the diffraction pattern. Lodish, H. et al. Molecular Cell Biology (New York; W. H. Freeman, 2000).
3D structures are determined by X-ray crystallography and NMR (nuclear magnetic resonance). NMR: a form of spectroscopy. It can only be used for domains or small proteins (< 350 amino acids). The protein does not have to be crystallized; it is in solution. http://en.wikipedia.org/wiki/Nuclear_magnetic_resonance_spectroscopy
PDB can be used to search for “solved” structures (published)
The homeodomain consists of three alpha helices
PDB displays the primary and secondary protein structure
The structure of the homeodomain allows it to contact DNA
Synpolydactyly is caused by mutations that disrupt the ability of HOXD13 to bind to target DNA. Gilbert, S. F. Developmental Biology (Sunderland; Sinauer Associates, 2000).
Sequence → Structure → Function
When a protein’s function is disrupted, we get phenotypes. Gilbert, S. F. Developmental Biology (Sunderland; Sinauer Associates, 2000).
Summary 1) Domains are composed of conserved amino acid residues and can be defined through multiple sequence alignments. 2) Proteins that share domains are found in the same protein family. 3) The Conserved Domain Database contains protein domain information and corresponding sequence alignments. 4) The Protein Data Bank website contains 3D protein structures determined by X-ray crystallography and NMR. 5) Protein structures are responsible for the ability of proteins to perform specific functions. 6) Mutations that disrupt protein structure can affect protein function, leading to phenotypes/diseases.
Worksheet