Porter2 Stemming Algorithm in C++
Background
This is an implementation of the Porter2 English stemming algorithm as detailed here. Usage is simple: add porter2_stemmer.h and porter2_stemmer.cpp to your project and compile them with it. This gives you access to the Porter2Stemmer namespace and its function "stem".
Note that the code is written in C++11, so it requires a C++11-capable compiler.
Motivation
In the original ANSI C stemmer, you would stem a C++ string like this:
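string word = // ...
// Create an English stemmer; NULL selects the default encoding (UTF-8).
struct sb_stemmer* stemmer = sb_stemmer_new("english", NULL);
size_t length = word.size();
// Copy the bytes into an sb_symbol buffer. A variable-length array is a compiler extension in C++.
sb_symbol symb[length];
memcpy(symb, word.c_str(), length);
// The returned buffer is owned by the stemmer, so copy it into a string before deleting the stemmer.
string stemmed = string((char*)sb_stemmer_stem(stemmer, symb, length));
sb_stemmer_delete(stemmer);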
In the C++ version,
string word = // ...
Porter2Stemmer::stem(word);
Not only is the C++ version more concise, it also produces no invalid reads when run under valgrind or any other memory-error checker.
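As a rough illustration of how a one-line stemming call might be used in practice, here is a minimal sketch that splits a line of text on whitespace and stems each token. The names StemFn and stem_tokens are hypothetical helpers invented for this example and are not part of porter2_stemmer. The stemmer is passed in as a callable, so the sketch makes no assumption about the exact signature of Porter2Stemmer::stem.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical adapter type (not part of porter2_stemmer):
// any callable that maps a word to its stem.
using StemFn = std::function<std::string(const std::string&)>;

// Splits `text` on whitespace and stems each token with `stemmer`.
// No lowercasing or punctuation stripping is done here; tokens are
// handed to the stemmer exactly as they appear in the input.
std::vector<std::string> stem_tokens(const std::string& text,
                                     const StemFn& stemmer) {
    assert(stemmer && "stemmer callable must not be empty");
    std::vector<std::string> stems;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        stems.push_back(stemmer(token));
    }
    return stems;
}
```

To plug in the real stemmer, wrap the call in a lambda. Assuming "stem" modifies its argument in place, as the snippet above suggests, the wrapper could be [](const std::string& w) { std::string s = w; Porter2Stemmer::stem(s); return s; }. If "stem" returns the stemmed word instead, return that value directly.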