1. David Winslow
  2. markov-mania

Wiki

Clone wiki

markov-mania / Home

Welcome

markov-mania is a project to sample various programming languages by implementing a naive Markov-chain text generator in each of them.

Why?

The idea is to take the "hello world" demo a step further. For a given programming language/environment, Hello World covers:

  • the tool chain
  • string literals
  • output

On top of this, Markov adds:

  • control structures (loops are pretty necessary)
  • filesystem access
  • some non-trivial data structure (usually maps and lists)
  • pseudo-random numbers

So gives a better feel for actually *doing* something. If you don't find it useful, no worries. It is mostly for my own edification while playing with new languages.

Markov Chains

A Markov chain is a sequence in which the probabilities of possible future states are determined by the present state. For a Markov text generator, we treat the most recent N words W(0)..W(N-1) as the present state, and the most recent words after adding one more as the future state; that is, W(1)..W(N). The probability function for W(N) to be word X is the number of times that X followed W(0)..W(N-1) in some input corpus, divided by the number of times the sequence W(0)..W(N-1) appeared in the corpus at all.

In order to make short sentences more likely, we also assume that the corpus is divided into lines, each of which is a sentence. For the purposes of frequency calculation, no words follow the end of a sentence.

If that didn't make any sense, the Wikipedia article on Markov chains may help: http://en.wikipedia.org/wiki/Markov_chain

For example, let's assume this is the input corpus:

A cat flew.
A cat chewed.

With a chain length of 2, the probability distribution looks like this:

W(0)..W(N-1)W(N)Probability
_, _A2/2 = 1
_, Acat2/2 = 1
A, catflew1/2 = 0.5
A, catchewed1/2 = 0.5
cat, flew<end>1/1 = 1
cat, chewed<end>1/1 = 1

For larger input corpuses, the text can be more interesting. Here is a sample from a text generator run with the text of Alice in Wonderland as the input corpus:

won't she be savage if I've kept her waiting!' Alice felt so desperate that she was quite pleased to find that her neck would bend about easily in any direction like a serpent.

Updated