Random Writer or Probabilistic Text Generation
A Nifty Assignment from Joe Zachary, School of Computing, University of Utah
Random Writer
• Based on an idea by Claude Shannon (1948), popularized by A. K. Dewdney (1989)
• Generates random text based on the patterns in a source file
• Both fun and appropriate for CS 2 students
• Guess which one is actually from the text, which means the other two are random
Tom Sawyer (nGram length == 6)
Huck started to act very intelligently on the back of his pocket behind, as usual on Sundays. He was always dressed fitten for drinking some old empty hogsheads. The men contemplated the treasure awhile in blissful silence.
Hamlet (nGram length == 5)
Ay me, what act, That roars so loud and thunders in the index? Worse that a rat? Dead for a ducat, drugs fit that I bid you not? Leave heart; for to our lord, it we show him, but skin and he, my lord, I have fat all not over thought, good my lord?
Alice in Wonderland (nGram length == 9)
This was not here before,' said the Dormouse again, and we won't talk about cats or dogs 'Let us get to the shore, and then I'll tell you my history, and you'll understand 'It IS a long tail, certainly,'
Niftiness
• Not a toy: it slurps up entire books
• Defies expectations: it turns out to be both straightforward and educational
• Entertaining: I (Joe Zachary) run a contest to find the funniest generated text
nGram length = 1
• The probability that ch is the next character to be produced equals the probability that ch occurs in the source file
• Just pick any char from the text at random
• Quite unreadable:
rla bsht e. S ststofo hhfosdsdewno oe wee h. mr ae irii ela iad o r te u t mnyto onmalysnce, ifu en c f. Dwn oee iteo
nGram length == 2
• Let the nGram have two characters, such as “in”, “nd”, or “he”
• The probability that ch is the next character to be produced equals the probability that ch follows those two characters in the source text
"Shand tucthiney m? " le ollds mind Theybooure He, he s whit Pereg lenigabo Jodind alllld ashanthe ainofevids tre lin--p asto oun
Bigger nGram length
Let the nGram be the previously produced k characters (k = 4 in this case).
The probability that ch is the next character to be produced equals the probability that ch follows the nGram in the source text.
Mr. Welshman, but him awoke, the balmy shore. I'll give him that he couple overy because in the slated snufflindeed structure's
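One way to make "the probability that ch follows the nGram" concrete is to count followers. The sketch below is only an illustration: the class name `FollowerCounts`, the helper `followerCounts`, and the tiny sample text are assumptions, not part of the original assignment.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration; not from the assignment handout.
class FollowerCounts {
    // Counts how often each character immediately follows nGram in text.
    static Map<Character, Integer> followerCounts(String text, String nGram) {
        Map<Character, Integer> counts = new TreeMap<>();
        int from = 0;
        while (true) {
            int at = text.indexOf(nGram, from);
            // Stop when there are no more occurrences, or when the occurrence
            // sits at the very end of the text and so has no follower.
            if (at < 0 || at + nGram.length() >= text.length()) {
                break;
            }
            char next = text.charAt(at + nGram.length());
            counts.merge(next, 1, Integer::sum);
            from = at + 1; // advance by one so overlapping matches are counted
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample text, chosen only to keep the counts easy to check.
        String text = "the theme then thaws";
        Map<Character, Integer> counts = followerCounts(text, "th");
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            System.out.println("'" + e.getKey() + "' follows \"th\" "
                    + e.getValue() + " out of " + total + " times");
        }
    }
}
```

In this sample, 'e' follows "th" three times out of four and 'a' once, so a generator that picks uniformly from the list of followers would choose 'e' with probability 3/4.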
Algorithm
• Read the entire book into one big StringBuffer
  – Use StringBuffer's append(String s)
• Pick an initial nGram randomly from that one big string that holds the entire book
• For each “random” char to print:
  – Make a List<Character> holding every char in the book that follows the current nGram
  – Randomly pick a character ch from that List<Character>
  – Print ch
  – Remove the 1st char from the nGram, append ch
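The steps above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the assignment's reference solution: the class `RandomWriterSketch` and the methods `followers`, `pickNGram`, and `generate` are hypothetical names, the "book" is a short inline string instead of a file, and re-seeding with a fresh nGram when the current one has no followers is one possible way to handle that dead end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; all names here are assumptions.
class RandomWriterSketch {
    // Every char in book that immediately follows an occurrence of nGram.
    static List<Character> followers(String book, String nGram) {
        List<Character> result = new ArrayList<>();
        int at = book.indexOf(nGram);
        while (at >= 0 && at + nGram.length() < book.length()) {
            result.add(book.charAt(at + nGram.length()));
            at = book.indexOf(nGram, at + 1);
        }
        return result;
    }

    // Picks a random nGram that is guaranteed to have at least one follower.
    static String pickNGram(String book, int n, Random rng) {
        int start = rng.nextInt(book.length() - n); // leaves room for a follower
        return book.substring(start, start + n);
    }

    // Produces count characters using nGrams of length n.
    static String generate(String book, int n, int count, Random rng) {
        if (n < 1 || book.length() <= n) {
            throw new IllegalArgumentException("need n >= 1 and a book longer than n");
        }
        StringBuilder out = new StringBuilder();
        String nGram = pickNGram(book, n, rng);
        while (out.length() < count) {
            List<Character> choices = followers(book, nGram);
            if (choices.isEmpty()) {
                // Dead end: the nGram occurs only at the end of the book.
                nGram = pickNGram(book, n, rng);
                continue;
            }
            char ch = choices.get(rng.nextInt(choices.size()));
            out.append(ch);                  // "print" ch
            nGram = nGram.substring(1) + ch; // drop the 1st char, append ch
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for reading a whole book: append pieces to a StringBuffer.
        StringBuffer buffer = new StringBuffer();
        buffer.append("the quick brown fox jumps over the lazy dog and ");
        buffer.append("then the dog thanks the fox for the thrill");
        System.out.println(generate(buffer.toString(), 2, 60, new Random()));
    }
}
```

Rescanning the whole book for every printed character keeps the sketch close to the slide's wording; it is slow for long books, and a map from nGram to followers would be a natural refinement.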
An Example: Print 1 char (nGram length == 2)
• Given this current state of the system:
  – The one big string: We hold these truths to be self-evident: that all men are created equal; that they
  – A random nGram: “th”
• For this one example loop iteration, do the following:
  – Build a new List of following chars: [e, s, a, a, e]
  – Pick a char to print (random, could only be e, s, or a): s
  – Change the nGram (remove first char, add printed char): “hs”
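This single iteration can be replayed in code. The sketch below assumes hypothetical helpers `following` and `shift`, and it uses a fixed index in place of the random pick so that the run matches the 's' chosen in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative replay of the example; class and method names are assumptions.
class OneStepExample {
    // Collects every char that follows nGram in text, in order of appearance.
    static List<Character> following(String text, String nGram) {
        List<Character> result = new ArrayList<>();
        int at = text.indexOf(nGram);
        while (at >= 0 && at + nGram.length() < text.length()) {
            result.add(text.charAt(at + nGram.length()));
            at = text.indexOf(nGram, at + 1);
        }
        return result;
    }

    // Slides the window: remove the first char, add the printed char.
    static String shift(String nGram, char printed) {
        return nGram.substring(1) + printed;
    }

    public static void main(String[] args) {
        String text = "We hold these truths to be self-evident: "
                + "that all men are created equal; that they";
        List<Character> choices = following(text, "th");
        System.out.println(choices);               // prints [e, s, a, a, e]
        // Fixed index instead of a random pick, to reproduce the 's' above.
        char printed = choices.get(1);
        System.out.println(shift("th", printed));  // prints hs
    }
}
```

Because 'a' and 'e' each appear twice in the list and 's' once, a uniform random pick would print 'a' or 'e' with probability 2/5 each and 's' with probability 1/5.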