Emerging Database course: Markov Models
Markov Models

Natural languages are good examples to use with a finite-context model, but there are other strings for which this type of model is not usable. The most relevant example is DNA, in which the four types of bases (A, C, G, or T) occur in triplets. The probability of a given base occurring within a triplet is strongly influenced by its position. Unfortunately, knowledge of the previous base, or even of a number of previous bases, does not help much in estimating these probabilities, and a position can only be determined by counting back to the beginning of the DNA sequence. Finite-state probabilistic models are good tools for handling such strings. Because of their background in probability theory, these models are often referred to as Markov models.
For a given alphabet A_n, one can imagine such a model as a finite-state machine with n different states, denoted S_i, i = 1, 2, ..., n. For each state we can give a set of transition probabilities p_ij, where p_ij is the probability of a transition from the current state S_i to state S_j. (Clearly, Σ_j p_ij = 1 for each state S_i.) Using this model, a given string traces a unique path through the machine, and the probability of any string can be computed by multiplying the transition probabilities taken out of each state along that path. Markov models are very useful for modeling natural languages and are widely used for this purpose. Some state models are non-ergodic, in the sense that they contain states from which parts of the system are permanently inaccessible. All state models used in connection with natural languages are ergodic.
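The two ideas above, multiplying transition probabilities along a string's path and checking ergodicity, can be sketched as follows. This is a minimal illustration: the DNA alphabet {A, C, G, T} comes from the text, but the transition probabilities and the uniform starting probability are made-up values chosen only so that each row sums to 1.

```python
# Hypothetical first-order Markov model over the DNA alphabet.
# The probabilities below are illustrative, not real genomic statistics.
P = {
    "A": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    "C": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "G": {"A": 0.1, "C": 0.4, "G": 0.2, "T": 0.3},
    "T": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}

def string_probability(s, start=0.25):
    """Multiply the transition probabilities along the unique path
    traced by s, starting from an assumed uniform initial probability."""
    prob = start
    for cur, nxt in zip(s, s[1:]):
        prob *= P[cur][nxt]
    return prob

def is_ergodic(p):
    """A model is ergodic when no part of the system is permanently
    inaccessible, i.e. every state is reachable from every other state
    via transitions with nonzero probability."""
    def reachable(src):
        seen, stack = {src}, [src]
        while stack:
            for nxt, pr in p[stack.pop()].items():
                if pr > 0 and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
    return all(reachable(s) == set(p) for s in p)

# P(ACG) = 0.25 * P[A][C] * P[C][G] = 0.25 * 0.2 * 0.25 = 0.0125
print(string_probability("ACG"))
print(is_ergodic(P))  # every state reaches every other, so True
```

A non-ergodic model, by contrast, would have a row such as {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}: once the machine enters state A it can never leave, so the remaining states become permanently inaccessible.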