Markov Chain Algorithm in Perl BY IAN WYNYARD
Markov Chain Algorithm in Perl BY IAN WYNYARD
What’s so great about Perl? High-Level, General-Purpose, Dynamic Language Developed in 1987 originally as a general-purpose Unix scripting language Since then it has evolved into an exceptionally effective language for high order computations because of the effectiveness of vector space structuring embedded in the language Very powerful tools are built into the language for text processing facilities without arbitrary data-length handling restrictions Borrows elements from other languages like C, Shell Scripting (SH), and AWK (those similarities are discussed in 3. 7)
Some interesting stuff about Perl 5 is the primary language of CGI Used in: Graphics Programming System Administration Network Programming Finance “The duct-tape of the internet” Functional Ugly Reputation as one of the ugliest languages out there Has 2 official slogans “There is more than one way to do it” "Easy things should be easy and hard things should be possible“
What’s different about Perl? Derives most of the form factor from C Designed to make programming easier, not computers more efficient Uses specific character prefixes to denote type $ - Scalar @ - Indexed Array [] – Used to Index Arrays {} – Used to Index Hashes (associative arrays) are borrowed from AWK Can package code as reusable modules A procedural language
The Markov Chain Algorithm Named after Russian mathematician Andrey Markov (1856 – 1922) Mathematical system that transitions between defined states What we are doing conceptually with any Markov-derived algorithm is treating the indexing factor like the time domain and the indexed contents as a state It’s a memoryless mathematical system (functional depth of 0) Usually a conditional associative path determination Very useful for many real world models, specifically non-behavioral ones
Markov Chain Algorithm in Perl BMAXCEN = 10000; $NONWORD = "n”; $w 1 = $w 2 = $NONWORD; while (<>) { foreach (split) { push(@{$statetab {$wl} {$w 2}}, $_) ; ($w 1, $w 2) = ($w 2, $_); } } push(@ {$statetab {$w 1} {$w 2}}, $NONWORD) ; $w 1 = $w 2 = $NONWORD; for ($i = 0; $i < $MAXGEN; $i++) 1 $suf = $statetab {$w 1} {$w 2); $r = int(rand @$suf) ; exit if (($t = $suf -> [$r]) eq $NONWORD); print "$tn”; ($wl, $w 2) = ($w 2, $t); } # initial state # read each line of input # multiple assignment # add tail # array reference # @$suf is number of elems # advance chain
Algorithm Construction Phases of Construction Build Generate Logical construction of the algorithm Application of the logic Declaration of Initial Values $MAXGEN = 10000; #max words to generate $NONWORD = “n”; #non-word variable $w 1 = $w 2 = $NONWORD; #initializing prefix variables
Algorithm Construction – Build Phase … while (<>) { # read each line of input foreach (split) { push(@{$statetab{$w 1}{$w 2}}, $_); ($w 1, $w 2) = ($w 2, $_); # multiple assignment } } push(@{$statetab{$w 1}{$w 2}}, $NONWORD); … # add tail
Algorithm Construction – Build Phase Building Word List “I see a man standing over here. I see a woman standing over there” $statetab = ( ‘I see‘ ► [a, a], ‘see a’ ► [man, woman], ‘a man’ ► [standing], ‘man standing’ ► [over], ‘standing over’ ► [here], …);
Algorithm Construction – Build Phase Limitations 2 prefix storage means that identical passages are differentiated to a degree of 2 The number of total generated storage variables in the hash table is equal to the degree of recognition THE TERM FOR THIS IS N-GRAMS This system establishes greater specialization of information, but less aptitude for general application This issue is eliminated in mathematical systems with memory (Bayesian systems)
Algorithm Construction – Limitations $statetab = (‘I see‘ ► ‘see a’ ‘a man’ [a, a], ► ► [man, woman], [standing], ‘man standing’ ► [over], ‘standing over’ ► [here, there], …); Original: “I see a man standing over here. I see a woman standing over there” Sample Output: “I see a woman standing over here”
Algorithm Construction – Generate Phase … $w 1 = $w 2 = $NONWORD; for ($i = 0; $i < $MAXGEN; $i++) { $suf = $statetab{$w 1}{$w 2}; # array reference $r = int(rand @$suf); # @$suf is number of elems exit if (($t = $suf->[$r]) eq $NONWORD); print "$tn"; ($w 1, $w 2) = ($w 2, $t); } # advance chain # end of algorithm
Bibliography The Practice of Programming – Section 3. 7 Wikipedia: Perl (5 and 6) Andrey Markov Perlmonks. org Markov Chain Algorithm
- Slides: 13