Introduction to Bioinformatics Topic 1 Introduction to Bioinformatics and Sequence Analysis

Session 1 Learning Outcomes:
• The scope of bioinformatics
• The origins & growth of DNA databases
• Evidence of evolution from bioinformatics
• Example sequence analysis and displays using human Factor IX

Bioinformatics: Concerns the generation, visualization, analysis, storage, and retrieval of large quantities of biological information.

GenBank growth: How much data are we talking about?
- The amount of DNA sequence data in public databases
NCBI: US National Center for Biotechnology Information
DDBJ: DNA Data Bank of Japan
EBI: European Bioinformatics Institute
The contents of these databases are synchronized.

What data?
• Human Genome Project
• Projects now come from scientists in numerous fields of biology, medicine, agriculture, ecology, history, energy, and forensics.
Here are some examples that you can explore according to your own interests:

http://www.1000genomes.org The genomes of 1000 people, sequenced to identify genetic variants present in at least 1% of the human population

www.1001genomes.org The genomes of 1001 strains of Arabidopsis thaliana that differ in phenotype, including adaptation to growth in a wide variety of conditions.

https://genome10k.soe.ucsc.edu/ An effort to sequence the genomes of 10,000 species, one from each genus.

http://www.arthropodgenomes.org/wiki/i5K

http://www.ncbi.nlm.nih.gov/genome/browse/ Metagenomics database

The Cancer Genome Atlas


ANNOTATION: The information describing gene and protein sequences, including the structures, similarities, functions, and predictions associated with these sequences.

WITNESSING EVOLUTION THROUGH BIOINFORMATICS
Random mutation in sequences is a common phenomenon. A mutation can be:
• Advantageous: the organism keeps it for future populations
• Deleterious: quickly eliminated from the population
• Neutral: may or may not be retained

Recent evolutionary changes to plants & animals
About 10,000 years ago, people moved from a hunter-gatherer lifestyle to practicing agriculture and domesticating animals. Traits changed by domestication:
• Cows: milk production
• Horses: speed or strength
• Sheep: wool quantity and quality
• Poultry: more breast meat
• Fish: speed of maturation

LARGE SOURCES OF HUMAN SEQUENCE VARIATION
• The first sequencing of the human genome was high in both cost and time.
• Resequencing costs declined sharply because the first sequence can be used as a template.
• Resequencing shows considerable differences between individual people.

Single nucleotide polymorphisms (SNPs):
• The human genome is about 3.2 billion bp
• Approximately 3 million nucleotides differ between two individual genomes
• Common differences are those found in about 1% or more of the population
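To put the two figures above in proportion, the short sketch below works out what fraction of positions differ between two genomes and how far apart the differences are on average. It uses only the rounded numbers quoted on this slide; the function name `snp_density` is invented for illustration, and real differences are not evenly spaced along the genome.

```python
# Rounded figures quoted in the text above.
GENOME_SIZE_BP = 3_200_000_000   # human genome, about 3.2 billion bp
DIFFERING_SITES = 3_000_000      # nucleotides differing between two people


def snp_density(differing_sites, genome_size):
    """Return (fraction of positions that differ, average spacing in bp)."""
    fraction = differing_sites / genome_size
    spacing = genome_size / differing_sites
    return fraction, spacing


fraction, spacing = snp_density(DIFFERING_SITES, GENOME_SIZE_BP)
print(f"Fraction of positions that differ: {fraction:.2%}")
print(f"Average spacing between differences: about {spacing:.0f} bp")
```

In other words, under these rounded numbers two people differ at roughly one position in a thousand.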

Copy Number Variations (CNVs): When your DNA sequence is compared to that of the human “standard genome”, there are thousands of DNA segments, ranging from 1000 to several million nucleotides in length, that are either present, present in multiple copies, or absent from your genome.

RECENT EVOLUTIONARY CHANGES TO HUMAN POPULATIONS
Map labels: Africa (50,000 years ago), Middle East, Europe, Neanderthals, Eastern Europe, Lithuania

Examples of genetic changes associated with adaptation (diet and lifestyle):
Skin color (African, Indian, Southern European, Northern European):
• Near the equator: darker skin color blocks damaging UV light
• Near the poles: paler skin color allows vitamin D to be made
• Sequence variation is seen in a number of genes; one of them is SLC24A5

Other examples (self study):
• Lactose intolerance
• Digestion of starch
• Malaria resistance and sickle cell anemia
• Life at high altitude

DNA SEQUENCE IN DATABASES


Two types of DNA sequences are available in databases:
1. Genomic DNA
2. cDNA

Genomic DNA assembly




cDNA:

SEQUENCE ANALYSIS AND DATABASE DISPLAY
The sequence of the mRNA for human Factor IX
Accession number: NM_000133
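For anyone who wants to retrieve this record programmatically rather than through the web page, the sketch below builds a request URL for the NCBI E-utilities `efetch` service. The endpoint and the parameter names (`db`, `id`, `rettype`, `retmode`) are not part of these slides; they reflect the public E-utilities interface as understood here and should be checked against the NCBI documentation. The helper names `build_efetch_url` and `fetch_record` are invented for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed E-utilities endpoint; not given in the slides.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for a nucleotide record (no network access)."""
    params = {
        "db": "nuccore",       # nucleotide database
        "id": accession,       # e.g. the Factor IX mRNA accession
        "rettype": rettype,    # "fasta" for sequence only
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_record(accession, rettype="fasta"):
    """Download the record as text. Requires an internet connection."""
    with urlopen(build_efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")


url = build_efetch_url("NM_000133")
print(url)
```

Calling `fetch_record("NM_000133")` would perform the actual download; it is left out of the example so that the sketch runs offline.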

Applying two rules for describing the human Factor IX mRNA sequence:
i) Coding regions begin with ATG
ii) Coding regions end with one of three terminator sequences: TAA, TGA, TAG

Coding regions are read in triplets. The remaining sequence forms the 5’ and 3’ UTRs.
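The rules above can be sketched as a small search: find an ATG, then step forward three bases at a time until a terminator triplet is reached. This is a minimal illustration on an invented toy sequence, not the Factor IX mRNA; it assumes the first ATG in the sequence is the true start, which is a simplification (database records annotate the coding region explicitly). The name `find_coding_region` is hypothetical.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}


def find_coding_region(mrna):
    """Return (start, end) of the reading frame opened by the first ATG.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Returns None if there is no ATG or no in-frame stop codon.
    """
    seq = mrna.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    # Read in triplets from the ATG; only in-frame stops count.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


# Invented toy sequence: 5' UTR "GGC", coding region, 3' UTR "CCC".
toy = "GGCATGAAATTTGGGTAACCC"
start, end = find_coding_region(toy)
print("5' UTR:", toy[:start])
print("Coding:", toy[start:end])
print("3' UTR:", toy[end:])
```

Note that the toy sequence contains TGA at an out-of-frame position (inside ATGAAA); because the search moves in steps of three from the ATG, it is correctly ignored.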


Coding region triplets are translated into amino acids.
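As a sketch of how triplets map to amino acids, the code below builds the standard genetic code as a lookup table and translates a coding region until the first terminator. The input is an invented toy sequence, not Factor IX, and the function name `translate` is chosen for illustration. Triplets containing anything other than A, C, G, or T would raise a `KeyError` in this simple version.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_region):
    """Translate a coding region (starting at ATG) into one-letter amino acids.

    Translation stops at the first terminator triplet, marked '*' in the table.
    """
    seq = coding_region.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGAAATTTGGGTAA"))
```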

The protein sequence of human Factor IX (461 amino acids)

Pairwise alignment: The Factor IX gene is over 38,000 nt long. A single mutation, changing a G to a T at coordinate 25531, results in hemophilia B, a severe bleeding disorder.
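A single-nucleotide change like the one described above can be represented as a substitution at a 1-based coordinate. The sketch below applies such a change to an invented 10-nt toy sequence; it does not use the real Factor IX gene sequence, and the function name `apply_substitution` is hypothetical. Checking the reference base first guards against off-by-one coordinate errors.

```python
def apply_substitution(seq, coordinate, ref, alt):
    """Replace base `ref` with `alt` at a 1-based coordinate.

    Raises ValueError if the coordinate is out of range or if the base
    found there is not the expected reference base.
    """
    index = coordinate - 1  # sequence coordinates are 1-based
    if not 0 <= index < len(seq):
        raise ValueError(f"coordinate {coordinate} is outside the sequence")
    if seq[index] != ref:
        raise ValueError(
            f"expected {ref} at coordinate {coordinate}, found {seq[index]}"
        )
    return seq[:index] + alt + seq[index + 1:]


toy_gene = "ACGTACGTAC"                               # invented example
mutant = apply_substitution(toy_gene, 3, "G", "T")    # G to T at coordinate 3
print(toy_gene)
print(mutant)
```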

Alignment of human (Query) and chimpanzee (Subject/Sbjct) Factor IX proteins
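Once two sequences are aligned, a common summary is the fraction of aligned positions that are identical, together with a list of the positions that differ. The sketch below computes both for two already-aligned strings of equal length. The peptides are invented toy data, not the real human or chimpanzee Factor IX proteins, and dividing by the full alignment length is one of several conventions for percent identity.

```python
def alignment_identity(query, subject, gap="-"):
    """Return (identity fraction, list of differences) for an alignment.

    Both strings must already be aligned (same length, gaps as '-').
    Each difference is (1-based alignment column, query char, subject char).
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    differences = []
    for column, (q, s) in enumerate(zip(query, subject), start=1):
        if q == s and q != gap:
            matches += 1
        elif q != s:
            differences.append((column, q, s))
    return matches / len(query), differences


# Invented toy peptides for illustration only.
identity, diffs = alignment_identity("MKFGRL", "MKFGQL")
print(f"Identity: {identity:.1%}")
for column, q, s in diffs:
    print(f"Column {column}: Query {q}, Sbjct {s}")
```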

Factor IX has five major domains:
• One domain directs the protein to the ER of liver cells, from where it is secreted into the blood; it is cleaved by signal peptidase
• The second domain contains 12 Gla residues
• The epidermal growth factor-like domain binds Ca++
• The protein is activated by cleaving it into 2 peptides
• The activated protein cleaves the X protein in the clotting cascade pathway

The entire 38,000 nt gene is shown as the black arrow F9.

Location of the Factor IX gene on chromosome X.

THE END