Issue #3 invalid

Porter2 Stemmer: "stayed" --> "stai"

speedplane created an issue

I think this is a bug in the porter2 stemmer: the word "stayed" stems to "stai", with an "i" at the end.

I do not think this is the correct behavior; it should stem to "stay".

Comments (4)

  1. speedplane reporter

    Sorry... I take this back. It appears that this is how Lucene stems the word "stayed" using the porter2 filter, not how your code does. Please close this.

  2. speedplane reporter

    FYI, I think I see the confusion. The original porter stemmer stems "stayed" to "stai". This stemming occurs both with your library and with the Lucene analyzers.

    In your implementation, the porter2 stemmer stems "stayed" to "stay". However, in the Lucene/ElasticSearch implementation, the porter2 stemmer also stems "stayed" to "stai".

  3. Matt Chaput repo owner

    Yes, stemming often does not produce an actual word (it's best thought of as a code) and sometimes mistakenly removes things that it shouldn't (e.g. render -> rend). Use with caution ;)

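
For anyone landing here with the same question: the "stai" vs. "stay" difference discussed above is consistent with the published rules for step 1c of the two algorithms. The original Porter rule turns a trailing "y" into "i" whenever the rest of the word contains a vowel, while the Porter2 (English Snowball) rule only does so when the "y" follows a non-vowel that is not the first letter. The sketch below isolates just that one step, applied to "stay" (what is left of "stayed" once "-ed" is removed). It is a simplified illustration, not this library's code or Lucene's; the function names are made up for the example.

```python
VOWELS = "aeiou"


def _is_vowel_porter(word, i):
    """Original Porter: a, e, i, o, u are vowels; 'y' is one only after a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return True
    return ch == "y" and i > 0 and not _is_vowel_porter(word, i - 1)


def porter_step_1c(word):
    """Original Porter step 1c: (*v*) Y -> I."""
    if word.endswith("y"):
        stem = word[:-1]
        # Replace the 'y' if the remaining stem contains any vowel.
        if any(_is_vowel_porter(stem, i) for i in range(len(stem))):
            return stem + "i"
    return word


def porter2_step_1c(word):
    """Porter2 step 1c: y -> i only after a non-vowel that is not the first letter."""
    # Porter2 treats 'y' itself as a vowel for this check.
    if len(word) > 2 and word.endswith("y") and word[-2] not in "aeiouy":
        return word[:-1] + "i"
    return word


for w in ("stay", "cry", "happy"):
    print(w, porter_step_1c(w), porter2_step_1c(w))
```

Under these rules "stay" becomes "stai" with the original rule and stays "stay" with the Porter2 rule, while "cry" goes the other way ("cry" vs. "cri"). Either way the result is a code for matching, so index-time and query-time text need to go through the same stemmer.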