Genomics An introduction

Aims of genomics I
• Establishing integrated databases that are far more than mere storage
• Linking genomic and expressed gene sequences (cDNA)

Aims of genomics II
• Describing every gene:
  – function, expression data, relationships, phenotype
  – 3-D structure and features (introns/exons, domains, repeats)
  – similarities to other genes
• Characterizing population sequence diversity

Genomics can be:
• Structural – where is it?
• Functional – what does it do?
  – DNA microarrays
• Comparative – finding important fragments

Mapping genomes
• Past
  – Genetic maps: distance between simple markers, expressed in units of recombination
  – Cytological maps: stained chromosomes, observable under a microscope
• Present
  – Physical maps: distance between nucleotides, expressed in bases
  – Comparative maps: detection of corresponding genes; detection of regulatory sequences

Genome sizes

Organism                    DNA length                 Genes
Mycoplasma genitalium       0.5 Mb                     470
Deinococcus radiodurans     3 Mb (in 4-10 copies!)     3 200
Escherichia coli            4.5 Mb                     4 400
Saccharomyces cerevisiae    12 Mb                      6 200
Caenorhabditis elegans      97 Mb                      22 000
Drosophila melanogaster     120 Mb                     18 000
Homo sapiens                3200 Mb                    32 000

Genetic differences among humans
• Goals
  – Genetic diseases
  – Identifying criminals
• Methods
  – Genetic markers (fingerprints) and DNA sequence. Repeats:
    • Microsatellites (repeat units of 1-12 nucleotides)
    • Minisatellites (repeat units of > 12 nucleotides)
  – Other types of variation
    • Genome rearrangements
    • Single nucleotide mutations

Microsatellites and disease
• Huntington's disease
  – Huntingtin gene of unknown (!) function
  – Number of repeats: 6-35: normal; 36-120: disease
• Friedreich's ataxia
  – GAA repeat in a non-coding (intron) region
  – Number of repeats: 7-34: normal; 35 and up: disease
  – Repeat expansion reduces expression of the frataxin gene
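The repeat thresholds on this slide can be turned into a small check. The sketch below counts the longest uninterrupted run of a repeat unit in a sequence and compares it with the normal range given above. The function names `longest_repeat_run` and `classify` and the toy sequences are illustrative assumptions; the CAG unit for Huntington's disease is not stated on the slide and is supplied here from general knowledge.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    # A lookahead is used so that runs starting at every position are examined,
    # which also handles repeat units that can overlap themselves.
    pattern = re.compile(f"(?=((?:{re.escape(unit.upper())})+))")
    runs = [len(m.group(1)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def classify(repeats: int, normal_range: tuple) -> str:
    """Compare a repeat count with the (low, high) normal range from the slide."""
    low, high = normal_range
    if repeats > high:
        return "disease range"
    if repeats >= low:
        return "normal range"
    return "below the listed range"


# Toy sequences (hypothetical, not real gene sequences).
huntington_like = "TT" + "CAG" * 40 + "TT"   # CAG unit is an assumption
friedreich_like = "AC" + "GAA" * 10 + "GT"

hd_repeats = longest_repeat_run(huntington_like, "CAG")
fa_repeats = longest_repeat_run(friedreich_like, "GAA")
print(hd_repeats, classify(hd_repeats, (6, 35)))
print(fa_repeats, classify(fa_repeats, (7, 34)))
```

Real repeat genotyping has to cope with interrupted repeats and sequencing errors, which this sketch ignores.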

SNP - Single Nucleotide Polymorphism
• Definition
  – SNP and phenotype
• Occurrence in genome
  – Rarity of most SNPs (agrees with the neutral theory of molecular evolution)
  – SNPs in the human population:
      Inter-genic regions: every 1400 bp
      Coding regions: every 1430 bp
    • High variance across the genome!
• Detection of SNPs: hybridization

Sickle cell anemia
SNP in the beta-globin gene; the disease allele is recessive:
• 2 faulty copies: red blood cells change shape under stress, causing anemia
• 1 faulty copy: red blood cells change shape only under heavy stress, but the carrier gains resistance to the malaria parasite
[Figure: what a sickle looks like]

SNPs and haplotypes
Passengers and their evolutionary vehicles

SNP - Phase inference
• In the data from sequencing the genome, the chromosome of origin of each SNP allele is scrambled:
    ...CT[G/T]AC[G/A]GT...
  – Possibility 1: chromosome ...CTGACGGT... and chromosome ...CTTACAGT...
  – Possibility 2: chromosome ...CTGACAGT... and chromosome ...CTTACGGT...
• Which SNPs are on the same chromosome (are in phase)?
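The ambiguity on this slide can be enumerated mechanically. The sketch below lists every pair of haplotypes consistent with an unphased genotype; the function name `possible_phasings` and the representation of a genotype as a list of allele pairs are illustrative assumptions, and the example genotype is the eight-base fragment from the slide.

```python
from itertools import product


def possible_phasings(genotype):
    """List all haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of (allele, allele) tuples, one per site.
    The first heterozygous site is kept in a fixed orientation so that
    mirror-image pairs are not reported twice.
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    free_sites = het_sites[1:]  # sites whose orientation is still open
    pairs = []
    for flips in product([False, True], repeat=len(free_sites)):
        flip_at = dict(zip(free_sites, flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        pairs.append(("".join(hap1), "".join(hap2)))
    return pairs


# Fragment from the slide: ...CT[G/T]AC[G/A]GT...
genotype = [("C", "C"), ("T", "T"), ("G", "T"), ("A", "A"),
            ("C", "C"), ("G", "A"), ("G", "G"), ("T", "T")]
phasings = possible_phasings(genotype)
for hap1, hap2 in phasings:
    print(hap1, hap2)
```

With k heterozygous sites there are 2^(k-1) candidate pairs, which is why sequence data alone quickly becomes ambiguous.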

SNP – phase inference
Determining the parent of origin for each SNP
    Parent 1: ...CT[G/A]AC[C/G]GT...
    Parent 2: ...CT[C/T]AC[A/A]GT...
    Child:    ...CT[G/T]AC[G/A]GT...
In this case the child's haplotypes are GG (from parent 1) and TA (from parent 2).
Phase inference is the reason why SNP sequencing is often done for a child and both parents.
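The trio reasoning on this slide can be sketched as follows: at each site, check which child allele could have come from which parent. The function name `phase_trio`, the `"?"` marker for unresolved sites, and the genotype encoding are illustrative assumptions; the example genotypes are read off the slide's diagram.

```python
def phase_trio(child, parent1, parent2):
    """Split a child's genotype into the haplotype inherited from each parent.

    Each argument is a list of (allele, allele) tuples, one per site.
    Returns (haplotype_from_parent1, haplotype_from_parent2); sites that the
    parental genotypes cannot resolve are marked with "?".
    """
    hap1, hap2 = [], []
    for site, ((a, b), p1, p2) in enumerate(zip(child, parent1, parent2)):
        a_from_p1 = a in p1 and b in p2  # a inherited from parent 1, b from parent 2
        b_from_p1 = b in p1 and a in p2  # the opposite assignment
        if not (a_from_p1 or b_from_p1):
            raise ValueError(f"site {site}: child genotype is inconsistent with the parents")
        if a == b:                       # homozygous: nothing to decide
            hap1.append(a)
            hap2.append(a)
        elif a_from_p1 and b_from_p1:    # both assignments fit: still ambiguous
            hap1.append("?")
            hap2.append("?")
        elif a_from_p1:
            hap1.append(a)
            hap2.append(b)
        else:
            hap1.append(b)
            hap2.append(a)
    return "".join(hap1), "".join(hap2)


# The two SNP sites from the slide.
child = [("G", "T"), ("G", "A")]
parent1 = [("G", "A"), ("C", "G")]
parent2 = [("C", "T"), ("A", "A")]
from_p1, from_p2 = phase_trio(child, parent1, parent2)
print(from_p1, from_p2)
```

A site stays unresolved when the child and both parents are heterozygous for the same two alleles, which is one reason trio data does not phase every SNP.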

Linkage Disequilibrium, intro
How hard is it to break a chromosome?
• Alleles (traits/SNPs) A and a occupy the same position in the genome (locus), so on a single chromosome an individual can have either of them, but not both
  – $f_A$: frequency of allele A in the population
  – $f_a = 1 - f_A$
  – $f_B$ and $f_b = 1 - f_B$ are the corresponding frequencies of B and b
• Probabilities of both alleles occurring on the same chromosome:
    A B: $f_{AB}$
    A b: $f_{Ab}$
    a B: $f_{aB}$
    a b: $f_{ab}$
• LD and genomic recombination

Linkage Disequilibrium, calculation
• When these alleles are not correlated, we expect them to occur together by chance alone:
    $f_{AB} = f_A f_B$
    $f_{Ab} = f_A f_b$
    $f_{aB} = f_a f_B$
    $f_{ab} = f_a f_b$
• But if A and B occur together more often (disequilibrium state), we can write:
    $f_{AB} = f_A f_B + D$
    $f_{Ab} = f_A f_b - D$
    $f_{aB} = f_a f_B - D$
    $f_{ab} = f_a f_b + D$
• where D is called the measure of disequilibrium
• From the definitions above we have $D = f_{AB} - f_A f_B$
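As a worked example of the formulas above, the sketch below estimates D from a list of observed two-locus haplotypes. The function name `disequilibrium` and the sample counts are illustrative assumptions, not data from the lecture.

```python
from collections import Counter


def disequilibrium(haplotypes):
    """Estimate D = f_AB - f_A * f_B from haplotypes written as "AB", "Ab", "aB", "ab"."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    f = {h: counts[h] / n for h in ("AB", "Ab", "aB", "ab")}
    f_A = f["AB"] + f["Ab"]  # allele A appears on AB and Ab chromosomes
    f_B = f["AB"] + f["aB"]  # allele B appears on AB and aB chromosomes
    D = f["AB"] - f_A * f_B
    return D, f_A, f_B, f


# Hypothetical sample of 100 chromosomes in which A and B travel together.
sample = ["AB"] * 40 + ["Ab"] * 10 + ["aB"] * 10 + ["ab"] * 40
D, f_A, f_B, f = disequilibrium(sample)
print(f"f_A = {f_A:.2f}, f_B = {f_B:.2f}, D = {D:.2f}")

# The remaining three identities from the slide follow from the same D.
f_a, f_b = 1 - f_A, 1 - f_B
print(abs(f["Ab"] - (f_A * f_b - D)) < 1e-12)
print(abs(f["aB"] - (f_a * f_B - D)) < 1e-12)
print(abs(f["ab"] - (f_a * f_b + D)) < 1e-12)
```

A sample with equal counts of all four haplotypes gives D = 0, the uncorrelated case.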

How can we use it?
• Phase inference tells us how SNPs are organized on a chromosome
• Linkage disequilibrium measures the correlation between SNPs

Back to SNPs
[Figure: Daly et al. (2001), Figure 1]

Haplotypes - vehicles for SNPs
• Daly et al. (2001) were able to infer offspring haplotypes largely from the parents. They say that "it became evident that the region could be largely decomposed into discrete haplotype blocks, each with a striking lack of diversity".
• The haplotype blocks:
  – Up to 100 kb
  – 5 or more SNPs
• For example, this block shows just two distinct haplotypes accounting for 95% of the observed chromosomes

Haplotypes on the genome fragment
a) Observed haplotypes, with dotted lines wherever the probability of switching to another line is > 2%
b) Percentage of chromosomes explained by the haplotypes
c) Contribution of specific haplotypes

Another genetic test
Do haplotypes exist?
- Each row represents an SNP
- Blue dot = major allele, yellow dot = minor allele
- Each column represents a single chromosome
- The 147 SNPs are divided into 18 blocks defined by black lines
- The expanded box on the right is an SNP block of 26 SNPs over 19 kb of genomic DNA. The 4 most common of 7 different haplotypes include 80% of the chromosomes, and can be distinguished with 2 SNPs
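The last point, a few common haplotypes covering most chromosomes and a couple of SNPs being enough to tell them apart, can be illustrated on toy data. The sketch below is a brute-force search, not the method used in the cited papers; the function names `coverage_of_top` and `min_tag_snps` and the five-SNP haplotypes (0 = major allele, 1 = minor allele) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coverage_of_top(chromosomes, k):
    """Return the k most common haplotypes and the fraction of chromosomes they cover."""
    top = Counter(chromosomes).most_common(k)
    covered = sum(count for _, count in top)
    return [hap for hap, _ in top], covered / len(chromosomes)


def min_tag_snps(haplotypes):
    """Find a smallest set of SNP positions that distinguishes all given haplotypes.

    Tries every subset of positions in order of increasing size, so it is only
    practical for small blocks.
    """
    distinct = set(haplotypes)
    n_sites = len(haplotypes[0])
    for size in range(n_sites + 1):
        for sites in combinations(range(n_sites), size):
            signatures = {tuple(h[i] for i in sites) for h in distinct}
            if len(signatures) == len(distinct):
                return sites


# Hypothetical block: 40 chromosomes, 7 different haplotypes over 5 SNPs.
chromosomes = (["00110"] * 14 + ["00011"] * 8 + ["11010"] * 6 + ["11101"] * 4
               + ["10101"] * 3 + ["01010"] * 3 + ["11111"] * 2)
common, fraction = coverage_of_top(chromosomes, 4)
tags = min_tag_snps(common)
print(common, fraction, tags)
```

In this toy block the four common haplotypes cover 80% of the chromosomes, and two SNP positions are enough to distinguish them, mirroring the proportions quoted on the slide.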

How many SNPs can we ignore?
...and still predict haplotypes with high accuracy?

Literature
• Gibson, Muse, "A Primer of Genome Science".
• N. Patil et al., "Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21", Science 294 (2001): 1719-1723.
• M. J. Daly et al., "High-resolution haplotype structure in the human genome", Nat. Genet. 29 (2001): 229-232.