Genomics Chapter 18

Mapping Genomes
Maps of genomes can be divided into 2 types:
- Genetic maps
  - Abstract maps that place the relative location of genes on chromosomes based on recombination frequency
- Physical maps
  - Use landmarks within DNA sequences, ranging from restriction sites to the actual DNA sequence

Physical Maps
Distances between “landmarks” are measured in base pairs
- 1000 base pairs (bp) = 1 kilobase (kb)
Knowledge of the DNA sequence is not necessary
There are three main types of physical maps
- Restriction maps
- Cytological maps
- Radiation hybrid maps

Physical Maps
Restriction maps
- The first physical maps
- Based on distances between restriction sites
- Overlap between smaller segments can be used to assemble them into a contig
  - A continuous segment of the genome

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

[Figure: Constructing a restriction map of a DNA segment with enzyme A and enzyme B]
1. Multiple copies of a segment of DNA are cut with restriction enzymes.
2. The fragments produced by enzyme A only, by enzyme B only, and by enzymes A and B together are run side-by-side on a gel (with a molecular weight marker), which separates them according to size.
3. The fragments are arranged so that the smaller ones produced by the simultaneous cut can be grouped to generate the larger ones produced by the individual enzymes.
4. A physical map is constructed.
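
The double-digest logic in the figure can be sketched in a few lines of Python. This is only an illustration: the cut positions below (a 19 kb segment, enzyme A cutting at 2 kb and 10 kb, enzyme B at 5 kb) are our reading of the figure labels and should be treated as an assumption, and the function name `digest` is made up for this example.

```python
def digest(length_kb, cut_sites):
    """Return fragment sizes (kb), left to right, for a linear DNA cut at the given sites."""
    positions = [0] + sorted(cut_sites) + [length_kb]
    return [right - left for left, right in zip(positions, positions[1:])]

# Assumed map, read from the figure labels: 19 kb segment, A at 2 and 10 kb, B at 5 kb.
LENGTH_KB = 19
SITES_A = [2, 10]
SITES_B = [5]

frags_a = digest(LENGTH_KB, SITES_A)             # enzyme A alone
frags_b = digest(LENGTH_KB, SITES_B)             # enzyme B alone
frags_ab = digest(LENGTH_KB, SITES_A + SITES_B)  # both enzymes together

print("A only :", frags_a)
print("B only :", frags_b)
print("A and B:", frags_ab)
```

Mapping in the lab runs this in reverse: the gel gives the three lists of fragment sizes, and the map is the arrangement of cut sites that reproduces all three at once.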

Physical Maps
Cytological maps
- Employ stains that generate reproducible patterns of bands on the chromosomes
  - Divide chromosomes into subregions
- Provide a map of the whole genome, but at low resolution
- Cloned DNA is correlated with the map using fluorescent in situ hybridization (FISH)

Physical Maps
Radiation hybrid maps
- Use radiation to fragment chromosomes randomly
- Fragments are then recovered by fusing the irradiated cell to another cell
  - Usually a rodent cell
- Fragments can be identified based on banding patterns or FISH

Physical Maps
Sequence-tagged sites
- An STS is a small stretch of DNA that is unique in the genome
  - Only 200-500 bp
  - Boundary is defined by PCR primers
  - Identified using any DNA as a template
- STSs essentially provide a scaffold for assembling genome sequences

[Figure: Using STSs to order clones into a contig (STS 1 to STS 4, clones A to D)]
1. The location of 4 STSs in the genome is shown. PCR is used to amplify each STS from different clones in a library. Amplifying each STS by PCR generates a unique fragment that can be identified.
2. The products of the PCR reactions are separated by gel electrophoresis, producing a different size fragment for each STS.
3. The presence or absence of each STS in the clones identifies regions of overlap. The final result is a contiguous sequence (contig) of overlapping clones.
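
Step 3 of the figure, finding overlaps from the presence or absence of each STS, can be sketched as follows. The clone contents here are hypothetical (the figure's actual assignments did not survive extraction), the function names are invented for the example, and the ordering step assumes the order of the STSs along the genome is already known.

```python
def shared_sts(clones):
    """Map each pair of clones to the STSs both contain (pairs with none are omitted)."""
    names = sorted(clones)
    return {
        (a, b): clones[a] & clones[b]
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if clones[a] & clones[b]
    }

def order_contig(clones, sts_order):
    """Order clones by their leftmost (then rightmost) STS; return None if neighbours do not overlap."""
    rank = {sts: i for i, sts in enumerate(sts_order)}
    ordered = sorted(
        clones,
        key=lambda c: (min(rank[s] for s in clones[c]), max(rank[s] for s in clones[c])),
    )
    for left, right in zip(ordered, ordered[1:]):
        if not clones[left] & clones[right]:
            return None  # a gap: these clones do not form a single contig
    return ordered

# Hypothetical PCR results: which STSs amplified from which clone.
CLONES = {
    "Clone A": {"STS1", "STS2"},
    "Clone B": {"STS2", "STS3"},
    "Clone C": {"STS3", "STS4"},
    "Clone D": {"STS1"},
}

overlaps = shared_sts(CLONES)
contig = order_contig(CLONES, ["STS1", "STS2", "STS3", "STS4"])
print(overlaps)
print(contig)
```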

Genetic Maps
Genetic maps are measured in centimorgans
- 1 cM = 1% recombination frequency
Linkage mapping can be done without knowing the DNA sequence of a gene
- Limitations:
  1. Genetic distance does not directly correspond to actual physical distance
  2. Not all genes have obvious phenotypes
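
As a worked example of the 1 cM = 1% rule, the sketch below converts offspring counts from a cross into a map distance. The counts are hypothetical and the function name is invented; the direct conversion is also a simplification that holds best for closely linked markers, because it ignores double crossovers.

```python
def map_distance_cm(recombinants, total_offspring):
    """Map distance in centimorgans: percentage of offspring that are recombinant."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must be between 0 and total_offspring")
    return 100 * recombinants / total_offspring

# Hypothetical cross: 120 recombinant offspring out of 1000 scored.
distance = map_distance_cm(120, 1000)
print(f"{distance:.1f} cM")
```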

Genetic Maps
The most common markers are short repeat sequences called short tandem repeats, or STR loci
- Differ in repeat length between individuals
- 13 form the basis of modern DNA fingerprinting developed by the FBI
  - Cataloged in the CODIS database to identify criminal offenders
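
To illustrate what "differ in repeat length" means, the sketch below counts the longest run of a repeat unit in two alleles. The motif, flanking sequences, and repeat counts are all made up for the example and are not real CODIS loci; real STR typing measures PCR fragment length rather than scanning sequence like this.

```python
def longest_tandem_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif found in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    size = len(motif)
    for start in range(len(sequence) - size + 1):
        count = 0
        pos = start
        # Count consecutive copies of the motif beginning at this start position.
        while sequence[pos:pos + size] == motif:
            count += 1
            pos += size
        best = max(best, count)
    return best

# Hypothetical alleles of one STR locus: same flanks, different repeat counts.
ALLELE_1 = "CCTG" + "GATA" * 7 + "TCAA"
ALLELE_2 = "CCTG" + "GATA" * 10 + "TCAA"

genotype = (longest_tandem_run(ALLELE_1, "GATA"), longest_tandem_run(ALLELE_2, "GATA"))
print(genotype)
```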

Genetic Maps
Genetic and physical maps can be correlated
- Any cloned gene can be placed within the genome and can also be mapped genetically

Genetic Maps
All of these different kinds of maps are stored in databases
- The National Center for Biotechnology Information (NCBI) serves as the US repository for these data and more
- Similar databases exist in Europe and Japan

Whole Genome Sequencing
The ultimate physical map is the base-pair sequence of the entire genome
- Requires use of high-throughput automated sequencing and computer analysis

Whole Genome Sequencing
Sequencers provide accurate sequences for DNA segments up to 800 bp long
- To reduce errors, 5-10 copies of a genome are sequenced and compared
Vectors used to clone large pieces of DNA:
- Yeast artificial chromosomes (YACs)
- Bacterial artificial chromosomes (BACs)
- Human artificial chromosomes (HACs)
  - Are circular, at present
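
A rough sense of scale for the read-length and redundancy figures above: the sketch below estimates how many 800 bp reads are needed to sequence a genome 5 to 10 times over. It assumes "5-10 copies" means 5x to 10x coverage, uses the 3.2 Gb human genome size quoted later in these slides, and treats every read as full length, so the result is a lower bound. The function name is invented for the example.

```python
def reads_needed(genome_bp, read_bp, coverage):
    """Minimum number of reads of length read_bp to cover genome_bp `coverage` times."""
    if read_bp <= 0:
        raise ValueError("read_bp must be positive")
    total_bases = genome_bp * coverage
    return (total_bases + read_bp - 1) // read_bp  # integer ceiling division

GENOME_BP = 3_200_000_000  # 3.2 Gb, the human genome size given in these slides
READ_BP = 800              # maximum accurate read length given above

low = reads_needed(GENOME_BP, READ_BP, 5)
high = reads_needed(GENOME_BP, READ_BP, 10)
print(f"{low:,} to {high:,} reads")
```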

Whole Genome Sequencing
Clone-by-clone sequencing
- Overlapping regions between BAC clones are identified by restriction mapping or STS analysis
Shotgun sequencing
- DNA is randomly cut into smaller fragments, cloned and then sequenced
- Computers put together the overlaps
- Sequence is not tied to other information

[Figure: Two approaches to genome sequencing]
a. Clone-by-Clone Method
1. Large DNA clones are first isolated. These are arranged into contiguous sequences based on overlapping tagged sites.
2. Large clones are fragmented into smaller clones for sequencing.
3. The entire sequence is assembled from the overlapping larger clones.
b. Shotgun Method
1. Cut DNA of entire chromosome into small fragments and clone.
2. Sequence each segment and arrange based on overlapping nucleotide sequences.
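
Step 2 of the shotgun method, arranging reads by their overlapping sequences, can be illustrated with a toy greedy assembler. This is a teaching sketch rather than how production assemblers work: it assumes error-free reads from one strand and no repeated sequence, and the reads and function names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap until none overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # remaining reads do not overlap: several contigs are left
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads

# Hypothetical reads cut from the 21 bp sequence ATGGCGTACGTTAGCCTAGGA.
READS = ["TTAGCCTAGGA", "ATGGCGTAC", "GTACGTTAG"]

contigs = greedy_assemble(READS)
print(contigs)
```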

The Human Genome Project
Begun in 1990 by the International Human Genome Sequencing Consortium
Craig Venter formed a private company and entered the “race” in May 1998
In 2001, both groups published a draft sequence
- Contained numerous gaps

The Human Genome Project
In 2004, the “finished” sequence was published as the reference sequence (REF-SEQ) in databases
- 3.2 gigabase pairs
  - 1 Gb = 1 billion base pairs
- Contains a 400-fold reduction in gaps
- 99% of euchromatic sequence
- Error rate = 1 per 100,000 bases
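
To put the quoted error rate in perspective, the short calculation below applies it to the quoted genome size. It assumes the rate applies uniformly across all 3.2 Gb, which is a simplification; the variable names are chosen for this example.

```python
GENOME_BP = 3_200_000_000   # 3.2 Gb, from the slide
BASES_PER_ERROR = 100_000   # error rate of 1 per 100,000 bases, from the slide

# Expected number of erroneous bases if the rate held uniformly across the genome.
expected_errors = GENOME_BP // BASES_PER_ERROR
accuracy_percent = 100 * (1 - 1 / BASES_PER_ERROR)

print(f"about {expected_errors:,} errors; per-base accuracy {accuracy_percent:.3f}%")
```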

Characterizing Genomes
The Human Genome Project found fewer genes than expected
- Initial estimate was 100,000 genes
- Number now appears to be about 25,000!
In general, eukaryotic genomes are larger and have more genes than those of prokaryotes
- However, the complexity of an organism is not necessarily related to its gene number

Finding Genes
Genes are identified by open reading frames (ORFs)
- An ORF begins with a start codon and contains no stop codon for a distance long enough to encode a protein
Sequence annotation
- The addition of information, such as ORFs, to the basic sequence information
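
The ORF definition above translates into a simple scan: walk each reading frame codon by codon, start at ATG, and stop at the first in-frame stop codon. The sketch below does this for one strand only, with a hypothetical test sequence and an unrealistically small minimum length so the example stays short; real gene finders scan both strands and use much larger thresholds.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ORF on the given strand, in all three frames.

    min_codons counts the codons before the stop codon, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close the ORF and look for the next ATG
    return orfs

# Hypothetical sequence containing one short ORF in the third reading frame.
orfs = find_orfs("CCATGGCTAAAGGTTGACC")
print(orfs)
```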

Finding Genes
BLAST
- A search algorithm used to search NCBI databases for homologous sequences
- Permits researchers to infer functions for isolated molecular clones
Bioinformatics
- Use of computer programs to search for genes, and to assemble and compare genomes

Genome Organization
Genomes consist of two main regions
- Coding DNA
  - Contains genes that encode proteins
- Noncoding DNA
  - Regions that do not encode proteins

Coding DNA in Eukaryotes
Four different classes are found:
- Single-copy genes: Includes most genes
- Segmental duplications: Blocks of genes copied from one chromosome to another
- Multigene families: Groups of related but distinctly different genes
- Tandem clusters: Identical copies of genes occurring together in clusters
  - Also include rRNA genes

Noncoding DNA in Eukaryotes
Each cell in our bodies has about 6 feet of DNA stuffed into it
- However, less than one inch is devoted to genes!
Six major types of noncoding human DNA have been described

Noncoding DNA in Eukaryotes
Noncoding DNA within genes
- Protein-encoding exons are embedded within much larger noncoding introns
Structural DNA
- Called constitutive heterochromatin
- Localized to centromeres and telomeres
Simple sequence repeats (SSRs)
- One- to six-nucleotide sequences repeated thousands of times

Noncoding DNA in Eukaryotes
Segmental duplications
- Consist of 10,000 to 300,000 bp that have duplicated and moved
Pseudogenes
- Inactive genes

Noncoding DNA in Eukaryotes
Transposable elements (transposons)
- Mobile genetic elements
- Four types:
  - Long interspersed elements (LINEs)
  - Short interspersed elements (SINEs)
  - Long terminal repeats (LTRs)
  - Dead transposons

Expressed Sequence Tags
ESTs can identify genes that are expressed
- They are generated by sequencing the ends of randomly selected cDNAs
ESTs have identified 87,000 cDNAs in different human tissues
- But how can 25,000 human genes encode three to four times as many proteins?
  - Alternative splicing yields different proteins with different functions

Alternative Splicing
[Figure: A primary RNA transcript with a 5´ cap, a 3´ poly-A tail, and exons 1 to 13 separated by introns is spliced differently in brain and in muscle, so each tissue produces a mature mRNA containing a different combination of exons.]
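
The idea in the figure, one transcript yielding different mature mRNAs depending on which exons are kept, can be sketched as below. The exon sequences and the tissue-specific exon choices are entirely hypothetical (the figure's exon assignments did not survive extraction), and the function name is invented for the example.

```python
def splice(exons, keep):
    """Join the chosen exons, in transcript order, into a mature mRNA sequence."""
    return "".join(exons[number] for number in sorted(keep))

# Hypothetical five-exon gene; real exons are far longer.
EXONS = {1: "AUGGC", 2: "UUACG", 3: "GGAUC", 4: "CCGUA", 5: "UGA"}

# Hypothetical tissue-specific choices: each tissue skips a different internal exon.
BRAIN_EXONS = {1, 2, 4, 5}
MUSCLE_EXONS = {1, 3, 4, 5}

brain_mrna = splice(EXONS, BRAIN_EXONS)
muscle_mrna = splice(EXONS, MUSCLE_EXONS)
print(brain_mrna)
print(muscle_mrna)
```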

Variation in the Human Genome
Single-nucleotide polymorphisms (SNPs) are sites where individuals differ by only one nucleotide
- Must be found in at least 1% of the population
Haplotypes are regions of the chromosome that are not exchanged by recombination
- The tendency for genes not to be randomized is called linkage disequilibrium
- Can be used to map genes

[Figure: SNPs and haplotypes]
a. SNPs: the same region aligned from chromosomes 1 to 4, with two SNP positions marked
b. Haplotypes: haplotypes 1 to 4, shown as the combination of variants each carries
Diagnostic SNPs:
              A/G  T/C  C/G
Haplotype 1    A    T    C
Haplotype 2    A    C    G
Haplotype 3    G    T    C
Haplotype 4    A    C    G
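
Panel a of the figure amounts to comparing aligned sequences column by column and flagging the positions where they disagree. The sketch below does that for two sequences transcribed from the chromosome 1 and chromosome 2 rows of the original figure; the transcription is our best reading of garbled text and should be treated as an assumption. It does not model the 1% population-frequency requirement, and the function name is invented.

```python
def variable_sites(sequences):
    """Return 0-based positions where aligned, equal-length sequences do not all agree."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]

# Assumed transcription of two rows of the figure (25 nucleotides each).
CHROMOSOME_1 = "AACGCCATTCGGGGTCAGTCGACCG"
CHROMOSOME_2 = "AACGCCATTCGAGGTCAGTCAACCG"

snp_positions = variable_sites([CHROMOSOME_1, CHROMOSOME_2])
alleles = [(CHROMOSOME_1[i], CHROMOSOME_2[i]) for i in snp_positions]
print(snp_positions, alleles)
```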

Genomics
Comparative genomics, the study of whole genome maps of organisms, has revealed similarities among them
- For example, over half of Drosophila genes have human counterparts
Synteny refers to the conserved arrangements of DNA segments in related genomes
- Allows comparisons of unsequenced genomes

[Figure: Genomic alignment (segment rearrangement) of rice, sugarcane, corn, and wheat]

Genomics
Organellar genomes
- Mitochondria and chloroplasts are descendants of ancient endosymbiotic bacterial cells
- Over time, their genomes exchanged genes with the nuclear genome
- Both organelles contain polypeptides encoded by the nucleus

Genomics
Functional genomics is the study of the function of genes and their products
DNA microarrays (“gene chips”) enable the analysis of gene expression at the whole-genome level
- DNA fragments are deposited on a slide
- Probed with labeled mRNA from different sources
- Active/inactive genes are identified

[Figure: Preparing and probing a DNA microarray]
1. Unique, PCR-amplified Arabidopsis genome fragments (1, 2, 3, 4...) are contained in each well of a plate.
2. DNA is printed onto a microscope slide with a robotic quill.
3. Samples of mRNA are obtained from two different tissues: flower-specific mRNA (sample 1) and leaf-specific mRNA (sample 2). Probes for each sample are prepared with reverse transcriptase, using a different fluorescent nucleotide for each sample, to give cDNA probes 1 and 2.
4. The two probes are mixed and hybridized with the microarray. Fluorescent signals on the microarray are analyzed.
Signal key: strong signal from probe 1 with weak signal from probe 2; similar signals from both probes; weak signal from probe 1.
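
Step 4, analyzing the two fluorescent signals at each spot, is commonly summarized as a log ratio of the two intensities. The sketch below classifies spots that way; the intensities, gene names, and the twofold threshold are hypothetical, and a real analysis would first subtract background and normalize the two channels.

```python
import math

def classify_spots(spots, threshold=1.0):
    """Label each gene by the log2 ratio of its probe 1 and probe 2 signals."""
    calls = {}
    for gene, (signal_1, signal_2) in spots.items():
        log_ratio = math.log2(signal_1 / signal_2)
        if log_ratio >= threshold:
            calls[gene] = "higher in sample 1"
        elif log_ratio <= -threshold:
            calls[gene] = "higher in sample 2"
        else:
            calls[gene] = "similar in both"
    return calls

# Hypothetical intensities: (probe 1 = flower sample, probe 2 = leaf sample).
SPOTS = {
    "gene1": (800.0, 100.0),
    "gene2": (120.0, 110.0),
    "gene3": (90.0, 720.0),
}

calls = classify_spots(SPOTS)
print(calls)
```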

Genomics
Transgenics is the creation of organisms containing genes from other species (transgenic organisms)
- Can be used to determine whether:
  - A gene identified by an annotation program is really functional in vivo
  - Homologous genes from different species have the same function

Proteomics
Proteomics is the study of the proteome
- All the proteins encoded by the genome
The transcriptome consists of all the RNA that is present in a cell or tissue

Proteomics
Proteins are much more difficult to study than DNA because of:
- Post-translational modifications
- Alternative splicing
However, databases containing the known protein structural motifs exist
- These can be searched to predict the structure and function of gene sequences

Proteomics
Protein microarrays are being used to study large numbers of proteins simultaneously
- Can be probed using:
  - Antibodies to specific proteins
  - Small molecules
The yeast two-hybrid system has generated large-scale maps of interacting proteins

Applications of Genomics
The genomics revolution will have a lasting effect on how we think about living systems
The immediate impact of genomics is being seen in diagnostics
- Identifying genetic abnormalities
- Identifying victims by their remains
- Distinguishing between naturally occurring and intentional outbreaks of infections

Applications of Genomics
Genomics has also helped in agriculture
- Improvement in the yield and nutritional quality of rice
- Doubling of world grain production in the last 50 years, with only a 1% cropland increase

Applications of Genomics
Genome science is also a source of ethical challenges and dilemmas
- Gene patents
  - Should the sequence/use of genes be freely available or can it be patented?
- Privacy concerns
  - Could one be discriminated against because their SNP profile indicates susceptibility to a disease?