Porter2 Stemming Algorithm in C++

Background

This is an implementation of the Porter2 English stemming algorithm (the Snowball English stemmer). Usage is simple: add porter2_stemmer.h and porter2_stemmer.cpp to your project and compile them with it. This gives you access to the Porter2Stemmer namespace and its "stem" function.

Note that the code is written in C++11, so it requires a compiler with C++11 support.

Motivation

With the original ANSI C stemmer (Snowball's libstemmer), you would stem a C++ string like this:

string word = // ...
struct sb_stemmer* stemmer = sb_stemmer_new("english", NULL);
size_t length = word.size();
sb_symbol symb[length];
memcpy(symb, word.c_str(), length);
string stemmed = string((char*)sb_stemmer_stem(stemmer, symb, length));
sb_stemmer_delete(stemmer);

In the C++ version, the same operation is:

string word = // ...
Porter2Stemmer::stem(word);

Not only is the C++ version more concise, it also doesn't produce any invalid reads when run under valgrind or any other memory error checking tool.
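In practice a stemmer is usually applied to every word of a text rather than to a single string. The following is a minimal sketch of that pattern, not part of porter2_stemmer itself: stem_tokens is a hypothetical helper, and it takes the stemming routine as a callable so that it does not assume a particular signature for Porter2Stemmer::stem. To use it with this library, you would pass a small lambda that forwards to Porter2Stemmer::stem and returns the stemmed word.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not provided by porter2_stemmer): splits `text` on
// whitespace and applies the supplied stemming callable to each token.
// `stem_fn` is an assumption; wrap Porter2Stemmer::stem in a lambda that
// matches this signature.
std::vector<std::string> stem_tokens(
    const std::string& text,
    const std::function<std::string(const std::string&)>& stem_fn)
{
    std::vector<std::string> stems;
    std::istringstream tokens(text);
    std::string token;
    // operator>> skips any run of whitespace between tokens.
    while (tokens >> token)
        stems.push_back(stem_fn(token));
    return stems;
}
```

The helper does no lowercasing or punctuation stripping, so any normalization the stemmer expects would have to happen before or inside the callable.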
