
CSCI 6900/4900 Special Topics in Computer Science: Automata and Formal Grammars for Bioinformatics
Bioinformatics problems
• sequence comparison
• pattern/structure search
• pattern/structure recognition
• relationship of sequences
Algorithm design
• optimal algorithms
• heuristic algorithms
• parallel algorithms
Probabilistic models
• stochastic finite state automata (HMMs)
• stochastic regular grammars
• stochastic context-free grammars
• more complex grammar models


Probabilistic modeling and algorithms
M: a model of a family of sequences (e.g., RNA) that captures certain properties Q_1, Q_2, ….
(1) Each sequence x possesses a property Q_k(x) with probability P_k(x).
(2) For each given x, the P_k(x) form a probability distribution over the properties, i.e., ∑_k P_k(x) = 1.
(3) The most likely property Q*(x) is the one with the highest probability, i.e., Q*(x) = Q_k*(x), where k* = arg max_k { P_k(x) }.
(4) Algorithms are designed to find the most likely property for given sequences.
But how?
[Diagram: D (sample, training data); modeling mechanism; M (assigning probabilities)]
Computational linguistic systems can describe the desired properties of biological sequences.
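Points (2) and (3) can be illustrated with a minimal sketch. It assumes the model has already produced a non-negative score for each property of one sequence x; the function name `most_likely_property` and the property labels in the usage note are hypothetical and are not part of the course material. The sketch only normalizes the scores into a distribution and picks the arg max; it does not show how a model such as an HMM or SCFG would compute the scores.

```python
def most_likely_property(scores):
    """Turn non-negative scores for one sequence x into P_k(x) and pick k*.

    `scores` maps a property label k to an unnormalized score (an assumption
    made for this sketch). Returns (k*, probabilities).
    """
    if not scores:
        raise ValueError("at least one property is required")
    if any(s < 0 for s in scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(scores.values())
    if total == 0:
        raise ValueError("at least one score must be positive")
    # Point (2): normalize so that the P_k(x) sum to 1 for this x.
    probs = {k: s / total for k, s in scores.items()}
    # Point (3): k* = arg max_k P_k(x).
    best = max(probs, key=probs.get)
    return best, probs
```

For example, `most_likely_property({"stem-loop": 6.0, "unstructured": 2.0})` selects "stem-loop" with probability 0.75.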


Outline for the course
• Part 0: molecular biology basics and review of probability theory
• Part 1: pairwise alignment, HMMs, profile HMMs, gene finding, and multiple alignment (chapters 1-6)
  potential research projects: efficient HMM algorithms, gene finding
• Part 2: RNA stem-loops, SCFGs, secondary structure prediction, structural homology search (chapters 9-10)
  potential research projects: efficient SCFG algorithms, pseudoknot prediction, protein secondary structure prediction
• Part 3: phylogeny reconstruction, probabilistic approaches (chapters 7-8)
  potential research projects: grammar modeling of evolution


How this course is conducted
• To learn new concepts and techniques: lectures (by the instructor and students)
• To apply learned knowledge to research: research discussions (led by students and the instructor)
• To demonstrate learning effectiveness: presentations of research results (by students)

The central dogma of molecular biology


Building blocks of DNA
Nucleotides
• Purines: adenine, guanine
• Pyrimidines: cytosine, thymine


Double helix of DNA


DNA replication


Genetic code


Mutations
(1) synonymous
(2) missense
(3) nonsense
(4) frame-shift
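The four mutation types can be made concrete with a small sketch. It assumes the standard genetic code and DNA codons written with T rather than U; the names `CODON_TABLE`, `classify_substitution`, and `is_frameshift` are hypothetical helpers introduced only for this illustration. Substitutions are classified by comparing the amino acids encoded before and after the change, and an insertion or deletion is treated as a frame-shift when its length is not a multiple of three.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (third position varies fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a base substitution within one codon."""
    if ref_codon == alt_codon:
        raise ValueError("codons are identical; no mutation to classify")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid (or stop) is encoded
    if alt_aa == "*":
        return "nonsense"    # a premature stop codon is introduced
    # Simplification: a stop codon changed to an amino acid also lands here.
    return "missense"        # a different amino acid is encoded


def is_frameshift(indel_length):
    """An insertion or deletion shifts the reading frame unless its
    length is a multiple of three."""
    if indel_length <= 0:
        raise ValueError("indel length must be positive")
    return indel_length % 3 != 0
```

For instance, GAA to GAG leaves glutamate unchanged (synonymous), GAG to GTG replaces glutamate with valine (missense), and TAT to TAA turns tyrosine into a stop codon (nonsense).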


RNA synthesis


RNA synthesis (cont'd)


RNA can fold onto itself


Protein synthesis



Biological information flow
[Diagram labels: genome (example sequence AGACGCTGGTATCGCAT TAACGGGTTACTC GGATATTACCTTACTAT AGGGCGCTATCGCGCGT TAATCTGGTATC); introns; exons; gene sequence; regulatory DNA sequence; protein-DNA interactions; gene expression; protein abundance; gene regulation; protein sequence; protein structure; sequence family; structure family; protein-protein interactions; protein function; cellular role]


What bioinformatics is NOT:
• Not just using a computer to speed up biology
• Not just applying computer algorithms to biology
• Not just the accountant of genomic data
What bioinformatics is, then:
• The creative use of computers to define and solve central biological puzzles
• The computer becomes a hypothesis machine, making predictions to be tested at the bench.