SpanishStemmer raises IndexError: string index out of range

Issue #493 new
Stephane Boisson created an issue

Some words cause an exception.

Not sure if the issue is in the original algorithm or in the Python implementation.

Example reproducing the issue using Whoosh 2.7.4:

# -*- coding: utf-8 -*-
from whoosh.lang.snowball.spanish import SpanishStemmer

stemmer = SpanishStemmer()
print stemmer.stem(u"B\xe8gue")

Results:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sboisson/Documents/venv/lib/python2.7/site-packages/whoosh/lang/snowball/spanish.py", line 239, in stem
    if len(word) >= 2 and word[-2:] == "gu" and rv[-1] == "u":
IndexError: string index out of range

Comments (4)

  1. lara brian

    In Python, a string is a sequence of characters. "string index out of range" means that the index you are trying to access does not exist: you are asking for a character at a position that is not inside the string. Indexes in Python start at 0, so the maximum valid index for a non-empty string is always len(s) - 1. An empty string has no valid index at all, so even s[-1] raises IndexError. There are several ways to account for this. Checking the length of the string with len() before indexing helps you avoid going past the end.
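
To make the point above concrete against the line in the traceback: slicing (word[-2:]) never raises, so the only sub-expression that can fail there is rv[-1], which raises when rv is the empty string. The following is a minimal sketch of a guarded version of that condition. The helper name ends_gu_with_u_in_rv is hypothetical and not part of Whoosh, and this is not necessarily how the library does or should fix it.

```python
# Hypothetical helper (not part of Whoosh): mirrors the condition shown in
# the traceback, but tolerates an empty rv.
def ends_gu_with_u_in_rv(word, rv):
    # str.endswith() never raises on short or empty strings, unlike rv[-1],
    # which raises IndexError when rv == "".
    return word.endswith("gu") and rv.endswith("u")


# Slicing an empty string is safe; indexing it is not.
empty = ""
sliced = empty[-2:]  # evaluates to "" without raising

try:
    empty[-1]
    index_raised = False
except IndexError:
    index_raised = True

# The guarded condition returns False instead of raising when rv is empty.
guarded_result = ends_gu_with_u_in_rv("begu", "")
```

Using endswith() keeps the meaning of the original test for non-empty inputs while removing the need for a separate len() check.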
