pycon2013 / mto-blast.txt

- Generate sentences in the style of MediaTakeOut
- Language model with trigrams: P(w3 | w1, w2)
- about 40K sentences scraped
- Use NLTK NgramModel class
    - smoothing: assigns probabilities to unseen n-grams
- Need to filter out overly common phrases
- Output not always grammatical (generation is asked for a number of words, not a complete sentence)
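
The notes above can be illustrated with a minimal, standard-library-only sketch of a trigram model. It is a stand-in for the idea, not the NLTK NgramModel class the notes mention. Add-one (Laplace) smoothing is an assumption here, since the notes do not say which smoothing was used. The names train_trigrams, prob, and generate are hypothetical, and the toy corpus is invented for illustration.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # sentence boundary markers (illustrative choice)


def train_trigrams(sentences):
    """Count trigrams over tokenized sentences, padding each with boundary markers."""
    counts = defaultdict(Counter)
    vocab = set()
    for tokens in sentences:
        padded = [START, START] + list(tokens) + [END]
        vocab.update(padded[2:])  # real words plus END, never START
        for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
            counts[(w1, w2)][w3] += 1
    return counts, vocab


def prob(counts, vocab, w1, w2, w3):
    """Add-one smoothed estimate of P(w3 | w1, w2); unseen trigrams get a nonzero value."""
    context = counts.get((w1, w2), Counter())
    return (context[w3] + 1) / (sum(context.values()) + len(vocab))


def generate(counts, vocab, n_words, rng=None):
    """Sample exactly n_words tokens. The request is for a size, not a sentence,
    so the output may stop mid-sentence or run across sentence boundaries."""
    rng = rng or random.Random()
    candidates = sorted(vocab)
    if n_words > 0 and not (vocab - {END}):
        raise ValueError("no words to generate from")
    w1, w2 = START, START
    out = []
    while len(out) < n_words:
        weights = [prob(counts, vocab, w1, w2, w) for w in candidates]
        w3 = rng.choices(candidates, weights=weights, k=1)[0]
        if w3 == END:
            # Sentence ended: reset the context and keep going until n_words is reached.
            w1, w2 = START, START
            continue
        out.append(w3)
        w1, w2 = w2, w3
    return out


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    corpus = [
        ["star", "spotted", "at", "party"],
        ["star", "spotted", "with", "new", "boo"],
    ]
    counts, vocab = train_trigrams(corpus)
    print(" ".join(generate(counts, vocab, 8, random.Random(0))))
```

Because generate stops at a word count instead of a sentence boundary, it reproduces the issue in the last note: the text can end mid-phrase. Stopping at the first END marker instead would yield whole sentences of variable length.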