Strip Markup From Wikipedia Dumps


Author: Adrian Scoica (adrian.scoica@gmail.com)
Please direct any comments or questions to the e-mail address above.

This script parses Textile-formatted Wikipedia dumps and converts them to plain text.

To use it, please go through the following steps:

(0) Make sure you have Ruby, RubyGems, and Bundler installed. Step (1) also needs git.

(1) Download and install the vidibus-textile gem:

git clone https://github.com/vidibus/vidibus-textile.git .

sudo bundle install
gem build vidibus-textile.gemspec
sudo gem install vidibus-textile-GEMVERSION.gem

(Replace GEMVERSION with the version number in the name of the .gem file that gem build produces.)

(2) Run the script:

./strip-textile.rb <input >output
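
For readers who want a rough idea of what this kind of conversion involves, here is a minimal sketch in plain Ruby. It is not the code of strip-textile.rb and does not use vidibus-textile. The helper name strip_textile is hypothetical, and the regular expressions cover only a few common Textile constructs (block signatures, list markers, links, strong and emphasis).

```ruby
# Illustrative sketch only: a hypothetical helper, NOT the actual
# strip-textile.rb script and NOT the vidibus-textile API.
def strip_textile(text)
  text.each_line.map do |line|
    # Block signatures at the start of a line, e.g. "h1. ", "bq. ", "p. "
    line = line.sub(/\A(?:h[1-6]|bq|p)\.[ \t]+/, "")
    # List markers at the start of a line, e.g. "* ", "## "
    line = line.sub(/\A[*#]+[ \t]+/, "")
    # Links of the form "text":url are reduced to their text
    line = line.gsub(/"([^"]+)":\S+/, '\1')
    # *strong* spans lose their asterisks
    line = line.gsub(/(?<![\w*])\*([^*\n]+)\*(?![\w*])/, '\1')
    # _emphasis_ spans lose their underscores; snake_case words are left alone
    line = line.gsub(/(?<!\w)_([^_\n]+)_(?!\w)/, '\1')
    line
  end.join
end

# Small demonstration on an in-memory sample (made-up input).
sample = "h2. Example\n* a \"link\":http://example.com and *bold* text\n"
puts strip_textile(sample)
```

A real Textile parser handles many more cases (tables, footnotes, nested markup), which is why the script relies on a dedicated gem instead of a handful of regular expressions.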