# markov-mania / Home

## Welcome

markov-mania is a project to sample various programming languages by implementing a naive Markov-chain text generator in each of them.

### Why?

The idea is to take the "hello world" demo a step further. For a given programming language/environment, Hello World covers:

- the tool chain
- string literals
- output

On top of this, Markov adds:

- control structures (loops are pretty necessary)
- filesystem access
- some non-trivial data structure (usually maps and lists)
- pseudo-random numbers

So it gives a better feel for actually *doing* something. If you don't find it useful, no worries. It is mostly for my own edification while playing with new languages.

### Markov Chains

A Markov chain is a sequence in which the probabilities of possible future states are determined by the present state. For a Markov text generator, we treat the most recent N words W(0)..W(N-1) as the present state, and the most recent words after adding one more as the future state; that is, W(1)..W(N). The probability function for W(N) to be word X is the number of times that X followed W(0)..W(N-1) in some input corpus, divided by the number of times the sequence W(0)..W(N-1) appeared in the corpus at all.

In order to make short sentences more likely, we also assume that the corpus is divided into lines, each of which is a sentence. For the purposes of frequency calculation, no words follow the end of a sentence.
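The model above can be sketched in a few lines of Python. This is a minimal illustration, not the code from any of the per-language implementations in this repo; the name `build_chain` and the `"_"`/`None` sentinels are my own choices. It counts, for each N-word state, how often each next word (or end-of-sentence) follows it:

```python
from collections import Counter, defaultdict

N = 2          # chain length: number of words in the "present state"
END = None     # sentinel recorded after the last word of a sentence

def build_chain(lines, n=N):
    """Count how often each word follows each n-word state.

    Each line is one sentence; the state starts as n placeholder
    values, and END is counted after the final word so that no real
    word ever follows the end of a sentence.
    """
    counts = defaultdict(Counter)
    for line in lines:
        state = ("_",) * n              # "no words yet" starting state
        for word in line.split() + [END]:
            counts[state][word] += 1
            state = state[1:] + (word,)  # slide the window forward
    return counts
```

Dividing a state's count for word X by the total count across all words in that state recovers exactly the probability defined above.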

If that didn't make any sense, the Wikipedia article on Markov chains may help: http://en.wikipedia.org/wiki/Markov_chain

For example, let's assume this is the input corpus:

```
A cat flew.
A cat chewed.
```

With a chain length of 2, the probability distribution looks like this:

| W(0)..W(N-1) | W(N) | Probability |
|---|---|---|
| _, _ | A | 2/2 = 1 |
| _, A | cat | 2/2 = 1 |
| A, cat | flew | 1/2 = 0.5 |
| A, cat | chewed | 1/2 = 0.5 |
| cat, flew | `<end>` | 1/1 = 1 |
| cat, chewed | `<end>` | 1/1 = 1 |
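Generation is then a walk over this table: starting from the empty state, sample W(N) in proportion to its count, emit it, and slide the window until an end-of-sentence is drawn. A minimal Python sketch, with the table for the example corpus written out by hand (`generate` and the `"_"`/`None` sentinels are illustrative names, not from the repo):

```python
import random
from collections import Counter

# Transition counts for the corpus "A cat flew." / "A cat chewed."
# with a chain length of 2; None marks the end of a sentence.
counts = {
    ("_", "_"): Counter({"A": 2}),
    ("_", "A"): Counter({"cat": 2}),
    ("A", "cat"): Counter({"flew": 1, "chewed": 1}),
    ("cat", "flew"): Counter({None: 1}),
    ("cat", "chewed"): Counter({None: 1}),
}

def generate(counts, n=2):
    """Sample one sentence, weighting each next word by its count."""
    state = ("_",) * n
    words = []
    while True:
        dist = counts[state]
        word = random.choices(list(dist), weights=dist.values())[0]
        if word is None:                 # end-of-sentence sentinel
            return " ".join(words)
        words.append(word)
        state = state[1:] + (word,)
```

For this corpus, `generate(counts)` yields "A cat flew" or "A cat chewed", each with probability 0.5.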

For larger input corpora, the text can be more interesting. Here is a sample from a text generator run with the text of Alice in Wonderland as the input corpus:

> won't she be savage if I've kept her waiting!' Alice felt so desperate that she was quite pleased to find that her neck would bend about easily in any direction like a serpent.