
wiki2vcs

Loads the history of a Wikipedia article into a version control system so that the history can be quickly diffed and searched. There is no graphical interface yet, so if you aren't comfortable with the command line, it won't be very intuitive to use.

Currently only Mercurial is supported. It would be nice to support other version control systems and wiki engines.
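The core idea can be sketched in Python 3. This is an illustration, not wiki2vcs's actual code: each wiki revision is replayed as one Mercurial commit that keeps the original editor, timestamp and edit summary. It assumes the revisions have already been fetched into dicts with `revid`, `user`, `timestamp` and `comment` keys (names borrowed from the MediaWiki API; the dict shape, the function names and the `article.txt` filename are hypothetical), and it only builds the `hg commit` command lines rather than running them.

```python
import calendar
import time


def hg_timestamp(mw_timestamp):
    """Convert a MediaWiki timestamp such as '2011-05-03T12:34:56Z' into
    Mercurial's "unixtime offset" date format (offset 0 means UTC)."""
    parsed = time.strptime(mw_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return "%d 0" % calendar.timegm(parsed)


def commit_commands(revisions, filename="article.txt"):
    """Build one `hg commit` argument list per revision, oldest first.

    Hypothetical input shape: dicts with 'revid', 'user', 'timestamp'
    and 'comment' keys. The caller must write each revision's wikitext
    to `filename` before running the matching command.
    """
    # Revision ids grow over time, so sorting by them replays edits in order.
    ordered = sorted(revisions, key=lambda rev: rev["revid"])
    commands = []
    for rev in ordered:
        # Use a placeholder when there is no edit summary, and tag the
        # message with the wiki revision id so it can be looked up later.
        summary = rev.get("comment") or "(no edit summary)"
        commands.append([
            "hg", "commit", "--addremove",
            # Suppressed usernames may be missing from API results.
            "--user", rev.get("user", "(hidden)"),
            "--date", hg_timestamp(rev["timestamp"]),
            "--message", "%s [rev %d]" % (summary, rev["revid"]),
            filename,
        ])
    return commands
```

Each command would then be run (e.g. with `subprocess`) right after writing that revision's text to the file. `--addremove` lets the very first commit pick up the new file; a revision whose text is identical to the previous one makes `hg commit` report "nothing changed" and exit with a non-zero status, which a real loader would need to tolerate.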

Warning

This tool is currently a standalone program that you download and run on your own computer. Like any such program, it could in theory do nasty things: leave a trojan behind, take over your account completely, or steal your data. Please check that the download can be trusted, e.g. that this wiki page hasn't recently been edited to change the download URL, which should be https://bitbucket.org/eug48/wiki2vcs/ ... (note the https).

Uses

  • As a fast offline 'blame' tool (once the initial load is complete)
  • Find the revision in which some suspect text was inserted, then see what else that editor added.
  • Find out whether the article ever contained some piece of information. Was it subsequently deleted? Why?
  • Go through recent changes much more quickly than if using the web interface.
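Finding when a piece of text entered or left the article comes down to scanning the history in order and noting where its presence flips, which is roughly what `hg grep --all` reports for you. A small Python sketch of that idea, assuming the history is available as a list of `(revid, wikitext)` pairs, oldest first (a hypothetical shape, not something wiki2vcs exposes):

```python
def presence_changes(history, phrase):
    """Return (revid, event) pairs where `phrase` was added or removed.

    `history` is assumed to be a list of (revid, wikitext) tuples,
    oldest first. Matching is case-sensitive, like `hg grep` by default.
    """
    changes = []
    present = False
    for revid, text in history:
        now_present = phrase in text
        if now_present != present:
            # The phrase's presence flipped in this revision.
            changes.append((revid, "added" if now_present else "removed"))
            present = now_present
    return changes


history = [
    (1, "Slime moulds."),
    (2, "Slime moulds, e.g. corymbia."),
    (3, "corymbia and others."),
    (4, "Slime moulds."),
]
print(presence_changes(history, "corymbia"))
```

Each reported revision id is a starting point for the questions above: who made that edit, what else they changed, and what the edit summary said.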

Getting started

  • Install Python 2.7
  • Install Mercurial (you may want to install TortoiseHg, which comes with a GUI)
  • Download wiki2vcs from https://bitbucket.org/eug48/wiki2vcs/downloads/wiki2vcs.zip
  • Open a command prompt (run cmd.exe on Windows)
  • Load an article's history, e.g. c:\python27\python.exe wiki2hg "Myxogastria"
  • cd Myxogastria
  • hg grep --all corymbia
  • thg log
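The loading step pulls revisions from the MediaWiki API. wiki2vcs does this through mwclient (see Credits); the Python 3 sketch below instead shows the raw request parameters such a loader might send to `https://en.wikipedia.org/w/api.php`, asking for the oldest revisions first and following the API's `continue` mechanism between batches. The function names and the batch size are assumptions for illustration.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_query(title, limit=50):
    """Request parameters for one batch of an article's revisions."""
    return {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment|content",
        "rvslots": "main",   # return the page text from the main slot
        "rvdir": "newer",    # oldest revision first, matching commit order
        "rvlimit": limit,    # kept small because full content is requested
    }


def next_query(params, response):
    """Parameters for the following batch, or None when history is complete.

    When more revisions remain, MediaWiki adds a top-level 'continue'
    object whose keys must be sent back unchanged with the next request.
    """
    if "continue" not in response:
        return None
    following = dict(params)
    following.update(response["continue"])
    return following


if __name__ == "__main__":
    print(API_URL + "?" + urlencode(revisions_query("Myxogastria")))
```

Fetching oldest-first means each batch can be committed as soon as it arrives, and `next_query` copies the parameters rather than mutating them, so the original query can be reused for another article.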

TODO

  • Support for Git and Bazaar
  • Using Mercurial's Command Server (may make things significantly faster on Windows)
  • Proper packaging
  • Making it faster and reducing the load on Wikimedia servers. One idea is to process the public dumps into a separate file for each article and make those downloadable as a bootstrap; however, the computation and storage would need to be paid for somehow.
  • A GUI, complete with an integrated diff viewer. I think QBzr has the nicest diff widget.
  • Browser integration, perhaps through custom protocols.
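The dump-splitting idea in the list above could start from something like the following Python 3 sketch, which streams a MediaWiki XML dump with `xml.etree.ElementTree.iterparse` and groups revision texts by page title. It only illustrates the idea and is not part of wiki2vcs; `split_dump` and `local_name` are hypothetical helpers, and writing each page to its own file (and choosing file names) is left open.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that dump elements carry."""
    return tag.rsplit("}", 1)[-1]


def split_dump(stream):
    """Yield (title, [revision texts]) for each <page> in a MediaWiki
    XML dump, streaming so the whole dump never sits in memory."""
    title, texts = None, []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            texts.append(elem.text or "")
        elif name == "page":
            yield title, texts
            title, texts = None, []
            elem.clear()  # free the finished page's subtree


sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Myxogastria</title>
    <revision><text>v1</text></revision>
    <revision><text>v2</text></revision>
  </page>
</mediawiki>"""

for page_title, revision_texts in split_dump(io.BytesIO(sample)):
    print(page_title, len(revision_texts))
```

Real dumps are usually bz2-compressed; `bz2.open(path)` returns a binary stream that can be passed straight to `split_dump` without unpacking the file first.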

Helping with development

Credits

  • mwclient
  • MediaWiki
  • hgapi
  • Mercurial
  • Python
  • Vim
  • Bitbucket