
16-12-2016: A new, much faster version with better performance now exists and will be released soon.

see: https://bitbucket.org/robvanderg/monoise

First, run prep.sh:

./prep.sh

The only files that are not directly downloadable are those of the Google ngram corpus. This dataset is necessary to replicate the results.

Now you can train and test the model with the command:

python3 main.py

This script depends on scipy, numpy, scikit-learn (imported as sklearn), matplotlib and gensim.
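
To confirm that these packages are installed before running main.py, a small check along the following lines may help. This is only a sketch and not part of the repository: the helper name missing_packages and the REQUIRED list are hypothetical, and the check only tests whether each package can be located, not whether its version is compatible.

```python
import importlib.util

# Import names of the dependencies listed above (assumption: scikit-learn
# is imported under the name "sklearn").
REQUIRED = ["scipy", "numpy", "sklearn", "matplotlib", "gensim"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

Anything reported as missing has to be installed with the package manager of your choice before main.py will run.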

If you use this normalization model, please cite:

@InProceedings{vandergoot:2016:normsome,
    author  = {van der Goot, Rob},
    title   = {Normalizing Social Media Texts by Combining Word Embeddings and Edit Distances in a Random Forest Regressor},
    booktitle = {Normalisation and Analysis of Social Media Texts (NormSoMe)},
    year = {2016}
}