Evolution Aristotle classification of animals theories on change

Evolution
Aristotle:
• classification of animals
• theories on change (change is the actuality of the potential)
Darwin:
• descent with modification
• natural selection
There is no evolution without change

Evolving nomenclature
change in DNA code = genetic variation
change with respect to what? any consequence?
• Mutations
• Single Nucleotide Polymorphisms (SNPs)
• Deletion/insertion polymorphisms (DIPs)
• Short Nucleotide Polymorphisms (SNPs)
• Short Nucleotide Variants (SNVs)
• Short Genetic Variants

Definitions
pol·y·mor·phism n.
1. Biology: The occurrence of different forms, stages, or types in individual organisms or in organisms of the same species, independent of sexual variations.
2. Chemistry: Crystallization of a compound in at least two distinct forms. Also called pleomorphism.
var·i·ant adj.
1. Having or exhibiting variation; differing.
2. Tending or liable to vary; variable.
3. Deviating from a standard, usually by only a slight difference.
n. Something that differs in form only slightly from something else, as a different spelling or pronunciation of the same word.

• Human Genome Project
• ENCODE project
• HapMap project
• SNP consortium
• Individual human genomes: James Watson, Craig Venter, 3 Asian gentlemen

Evolving SNV analysis needs
• Single SNP
• Millions of SNPs
However the analysis is structured, it rests on the same theories… It’s a question of scale and heuristics
• Finding SNPs in a single gene sequence
• Finding SNPs in GWAS, exome sequencing etc…

Calling SNPs in NGS
• Polymorphisms with respect to a reference genome
• Challenging because of alignment errors, variable depth of coverage
• Accuracy is essential: diagnostics, risk assessment
• False positives and false negatives are both a problem
  – Given 1% sequencing error, how many high quality reads do we need to call a variant?
  – Quality scores differ per experiment
  – The tools we use should have prior knowledge of known SNPs and their relevance to our question, i.e. causing disease or not
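One way to get a feel for the read-count question is a binomial tail calculation. The sketch below assumes sequencing errors are independent with a constant per-base rate (the 1% from the slide) and counts any erroneous base as possible support for the variant, which is conservative. The function names and the threshold alpha are illustrative choices, not taken from any particular variant caller.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more erroneous reads."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_supporting_reads(depth, error_rate=0.01, alpha=1e-6):
    """Smallest number of variant-supporting reads that sequencing error alone
    would produce with probability below alpha (assumed threshold)."""
    for k in range(depth + 1):
        if prob_at_least(k, depth, error_rate) < alpha:
            return k
    return None  # depth too low to reach the requested confidence


if __name__ == "__main__":
    for depth in (10, 30, 100):
        print(depth, min_supporting_reads(depth))
```

Real callers also weigh base and mapping quality scores, which differ per experiment, so a fixed error rate is only a first approximation.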

Prioritization of SNPs
• You have millions
• How do you know which are important for your research?
First let’s look at what SNPs can do…

So you have a SNP
• Is it associated with disease? If so, why?
  – Is it to do with protein function
  – or transcriptional regulation
  – or both, or none, or what?
• If none of the above,
  – then why is it associated with disease?
  – how do you begin to imagine its function?

SNP function prediction (summary)
• (in coding sequence) Protein Function
  – Ligand binding affinity
  – Co-factor binding affinity
  – Targeting to different cellular compartment
• (in coding or non-coding sequence) Gene Processing
  – Transcriptional regulation
  – Translational regulation
  – Splicing

Assessment of SNP Function
• Position of SNP
  – dbSNP or new SNP: first identify location
• In a coding sequence: non-synonymous
  – Protein Data Bank, PolyPhen
  – UniProt, PSIPRED (secondary structure prediction tool)
  – PROSITE, InterPro
Done individually, or incorporated into software to scale up for high throughput

Example: AGT & Hyperoxaluria

SNP mutation causes disease
CCA > CTA => Proline > Leucine (P11L)
P: Pro, L: Leu
[Figure: chemical structures of Pro and Leu]
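The codon change above can be checked with a small translation sketch. It builds the standard genetic code table and labels a single-codon change as synonymous or not. The function name and the category labels are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return (reference amino acid, variant amino acid, kind of change)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    elif ref_aa == "*":
        kind = "stop-loss"
    else:
        kind = "non-synonymous (missense)"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    # The AGT example from the slide: CCA > CTA
    print(classify_codon_change("CCA", "CTA"))
```

For CCA > CTA this reports P to L, a non-synonymous change, matching the P11L notation on the slide.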

Two more in AGT
• Gly82Glu: blocks binding to cofactor (G: Gly, E: Glu)
• Gly41Arg: disrupts intermonomer interactions (G: Gly, R: Arg)
[Figure: chemical structures of Gly, Glu and Arg]

Assessment of SNP Function - I
• Position of SNP
• In CDS: non-synonymous
  – Protein Data Bank, PolyPhen
  – UniProt, PSIPRED
  – PROSITE, InterPro
• Upstream of CDS or in CDS and synonymous
  – SignalP, PROSITE, rate of processing?
  – TRANSFAC
  – DBTSS
  – NXSensor
Is it in a regulatory element?

[Figure: gene structure from 5’ to 3’: promoter with transcription factor binding sites (TFBSs), transcriptional start site (TSS), 5’UTR, translation initiation site (initiation codon ATG), Exon 1, Exon 2]

SNP in a regulatory element
TFBS:                           ACAGTCGTAAGGCTGATTGGCTGGATAGCAGTACG
Single nucleotide polymorphism: ACAGTCGTAAGGCTAATTGGCTGGATAGCAGTACG
May disrupt TF binding and therefore functionality
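To locate the substitution in the two sequences above, a position-by-position comparison is enough. This is a minimal sketch that assumes the sequences are already aligned and of equal length (no insertions or deletions); the function name is illustrative.

```python
def find_snps(reference, variant):
    """List (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, variant), start=1)
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    tfbs = "ACAGTCGTAAGGCTGATTGGCTGGATAGCAGTACG"
    snp = "ACAGTCGTAAGGCTAATTGGCTGGATAGCAGTACG"
    print(find_snps(tfbs, snp))
```

Whether a mismatch actually falls inside a binding site still has to be checked against a resource such as TRANSFAC.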

Example: CYP2E1
[Figure: track from DBTSS showing the TSS, the SNP and the ATG]

Nucleosomes


Assessment of SNP Function - II
• In non-coding sequence
  – First, assess conservation
  – TRANSFAC
  – miRNA registry
  – RepeatMasker
  – Alternative splicing
  – HapMap
Is it in a regulatory element?

Prioritization of SNPs
• You have millions
• How do you know which are important for your research?
How do you (can you?) implement this in a pipeline so you can do thousands at once?
How can you come up with strategies to prioritise?
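A first step towards such a pipeline could be a routing table that sends each variant to the resources listed on the assessment slides, according to where it lies. The sketch below is only that: the field names (in_cds, synonymous, upstream_of_cds), the variant ids and the three categories are hypothetical, and it produces a to-do list rather than calling any of the tools.

```python
# Resources per location class, as listed on the "Assessment of SNP Function" slides.
RESOURCES = {
    "coding_nonsynonymous": [
        "Protein Data Bank", "PolyPhen", "UniProt", "PSIPRED", "PROSITE", "InterPro",
    ],
    "upstream_or_synonymous": ["SignalP", "PROSITE", "TRANSFAC", "DBTSS", "NXSensor"],
    "non_coding": [
        "conservation", "TRANSFAC", "miRNA registry", "RepeatMasker",
        "alternative splicing", "HapMap",
    ],
}


def location_class(variant):
    """Assign a variant (a dict with hypothetical annotation fields) to a class."""
    in_cds = variant.get("in_cds", False)
    if in_cds and not variant.get("synonymous", False):
        return "coding_nonsynonymous"
    if in_cds or variant.get("upstream_of_cds", False):
        return "upstream_or_synonymous"
    return "non_coding"


def triage(variants):
    """Map each variant id to the list of resources to consult."""
    return {v["id"]: RESOURCES[location_class(v)] for v in variants}


if __name__ == "__main__":
    example = [
        {"id": "var1", "in_cds": True, "synonymous": False},
        {"id": "var2", "in_cds": False, "upstream_of_cds": True},
        {"id": "var3", "in_cds": False},
    ]
    for variant_id, todo in triage(example).items():
        print(variant_id, todo)
```

Because the routing is a plain function over annotation records, it can be applied to thousands of variants at once.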

Statistical genetics
• If a SNV is present in all members of the family, affected and not, then it is probably innocuous.
• Some methods are based on how common these variants are in families, i.e. shared ancestral variants and genetic linkage (co-segregation)
• Need pedigree haplotype information
• Mostly used in GWAS
• BEAGLE, GERMLINE, PLINK IBD, MERLIN
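The family-based reasoning above can be sketched as a simple set filter: keep variants carried by every affected relative and by no unaffected one. This assumes a fully penetrant dominant model with no genotyping errors, which real data rarely satisfy; the tools named on the slide handle linkage and shared ancestry properly. All identifiers below are hypothetical.

```python
def cosegregating_variants(carriers, affected, unaffected):
    """Return variants present in all affected and absent from all unaffected.

    carriers maps a variant id to the set of individuals carrying it.
    """
    affected, unaffected = set(affected), set(unaffected)
    kept = []
    for variant, individuals in carriers.items():
        in_all_affected = affected <= individuals
        in_any_unaffected = bool(unaffected & individuals)
        if in_all_affected and not in_any_unaffected:
            kept.append(variant)
    return kept


if __name__ == "__main__":
    carriers = {
        "varA": {"mother", "father", "child1", "child2"},  # everyone: likely innocuous
        "varB": {"mother", "child1"},                      # tracks with the affected
        "varC": {"father"},                                # unaffected only
    }
    print(cosegregating_variants(carriers, ["mother", "child1"], ["father", "child2"]))
```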

Several Tools Out There
• For example:
  – SeattleSeq
  – dbNSFP
• built into other NGS analysis software
• New ideas continue to emerge…

The Plot Thickens…


If you Google directly to dbSNP (10 Nov 2015)

The NCBI homepage: if you go to dbSNP from here

You get this: but no worries, both access the same underlying database.


Combining gene expr. & variations
eQTL: expression quantitative trait locus
• Correlation between gene expr. and freq. of variation
• Simple linear regression (Matrix eQTL)
• Significance is assessed by p-value
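The regression idea can be sketched for a single SNP and a single gene: expression is regressed on genotype coded as the number of variant alleles (0, 1 or 2). To keep the example free of third-party packages, significance is estimated here with a permutation test in place of the usual parametric test on the slope. The data, function names and number of permutations are illustrative assumptions, and this is not how Matrix eQTL is implemented.

```python
import random


def regression_slope(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of label shuffles whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)
    observed = abs(regression_slope(x, y)[0])
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(regression_slope(x, shuffled)[0]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 avoids reporting a p-value of zero


if __name__ == "__main__":
    genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]  # made-up allele counts
    expression = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2]
    print(regression_slope(genotypes, expression))
    print(permutation_p_value(genotypes, expression))
```

At genome scale the same test is repeated for every SNP and gene pair, so the p-values then need correction for multiple testing.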