# progressive

Author: Grzegorz Chrupała, 2013-11-11, version 0.1.0.1

Progressive is a multilabel classification model that learns sequentially (online). The set of labels need not be known in advance: the learner keeps a continuously updated set of the N most frequent labels seen so far and predicts labels from this set.
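The label-set bookkeeping described above can be illustrated with a small sketch. This is not progressive's implementation, which delegates learning to Vowpal Wabbit. It is a hypothetical Haskell example of keeping running label counts and reading off the N most frequent labels after each example. The names `observe` and `topN` are illustrative, and breaking frequency ties alphabetically is an assumption made here only so that the result is deterministic.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl', sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical sketch: running label counts (label -> frequency).
type Counts = Map.Map String Int

-- Update the counts with the labels of one incoming example.
observe :: [String] -> Counts -> Counts
observe labels counts = foldr (\l -> Map.insertWith (+) l 1) counts labels

-- The n most frequent labels seen so far, most frequent first.
-- Ties are broken alphabetically (an assumption of this sketch).
topN :: Int -> Counts -> [String]
topN n = map fst . take n . sortBy (comparing rank) . Map.toList
  where
    rank (label, count) = (Down count, label)

main :: IO ()
main = do
  let stream = [["fun", "weird"], ["fun", "misc"], ["news"], ["fun", "news"]]
      -- Process examples one at a time, as an online learner would.
      counts = foldl' (flip observe) Map.empty stream
  print (topN 2 counts)
```

Because the counts are updated one example at a time, the top-N set can change as the stream goes on: a label that is rare early on may enter the set later.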

## Installation

The package provides the executable progressive, which requires the Vowpal Wabbit machine learning toolkit. You can compile and install Vowpal Wabbit from source or, on Ubuntu or Debian, install the vowpal-wabbit package. Either way, make sure the Vowpal Wabbit executable (called vw) is in your PATH.

To compile progressive, first install the Haskell Platform. Then run:

```
cabal update
cabal install --bindir=DIRECTORY
```

Replace DIRECTORY with the directory where you want to install the executable, and make sure that directory is in your PATH.

## Usage

progressive can run in learning mode, which interleaves learning and prediction:

```
progressive --size SIZE-IN-BITS --max-labels NUMBER-OF-LABELS MODEL-PATH
```

This runs in learning mode with the model size set to SIZE-IN-BITS bits, limits the label set to the NUMBER-OF-LABELS most frequent labels, and saves the model to MODEL-PATH. progressive can also run in pure prediction mode, where it simply uses a previously learned model to predict labels for new data:

```
progressive --no-learn MODEL-PATH
```

This uses the model in MODEL-PATH to predict labels and does not perform any learning.

For best results, use the maximum allowed model size, 29 bits. If you do not have enough RAM, use a lower value.
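A rough way to judge whether 29 bits fits in memory: assuming SIZE-IN-BITS is the number of hash bits of the underlying Vowpal Wabbit weight table (as with vw's -b option), the table has 2^SIZE-IN-BITS slots. With one 4-byte float per slot, the estimate is:

$$
2^{29} = 536\,870\,912 \ \text{slots}, \qquad 2^{29} \times 4 \ \text{bytes} = 2^{31} \ \text{bytes} = 2 \ \text{GiB}
$$

This is a lower bound under the stated assumption: Vowpal Wabbit may keep more than one float per slot depending on its update rule, so actual usage can be a multiple of this. Each bit fewer halves the table.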

## Input format

Each training example occupies one line and consists of a number of space-separated fields. The first field contains a comma-separated list of labels. The remaining fields contain features. A feature is either a string (containing no spaces or colons), or such a string followed by a colon and a number. If the feature is just a string, its value is implicitly 1.0. Example:

```
fun,weird,misc something this:3 or that:2.0
```
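To make the format concrete, here is a hypothetical Haskell sketch of a parser for one input line. It is not progressive's own parser. The names `Example`, `parseLine`, `parseFeature` and `splitOn` are illustrative. How progressive treats malformed lines is not documented here, so this sketch simply returns `Nothing` for an empty line or a non-numeric feature value.

```haskell
import Text.Read (readMaybe)

-- One input line: its labels and its (name, value) features.
data Example = Example
  { labels   :: [String]
  , features :: [(String, Double)]
  } deriving (Show, Eq)

-- Split a string on every occurrence of a delimiter character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- "name" gets the implicit value 1.0; "name:value" carries a number.
-- Values use Haskell's read syntax: "3" and "2.0" parse, ".5" does not.
parseFeature :: String -> Maybe (String, Double)
parseFeature tok = case break (== ':') tok of
  (name, [])      -> Just (name, 1.0)
  (name, _ : val) -> fmap (\v -> (name, v)) (readMaybe val)

-- First field: comma-separated labels; remaining fields: features.
parseLine :: String -> Maybe Example
parseLine line = case words line of
  []             -> Nothing
  (lbls : feats) -> Example (splitOn ',' lbls) <$> traverse parseFeature feats

main :: IO ()
main = print (parseLine "fun,weird,misc something this:3 or that:2.0")
```

On the example line above, the labels are fun, weird and misc. The features something and or get the implicit value 1.0, while this and that get 3.0 and 2.0.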