I have found that the English Snowball stemmer is meant to return special words like 'news', 'atlas', etc. untouched. However, instead of returning the original word, it returns the corresponding entry from its special-words table, which is a <str> rather than the original (presumably <unicode>) word.
Original code in whoosh/lang/snowball/english.py:
if word in self.__special_words: return self.__special_words[word]
Should be changed to:
if word in self.__special_words: return word
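A minimal, self-contained illustration of why the one-word change matters (the names here are hypothetical, and bytes values stand in for Python 2 <str> entries in the special-words table): returning the table value hands back whatever type the table stores, while returning the key preserves the caller's original text type.

```python
# Hypothetical stand-in for the stemmer's special-words table; the bytes
# values mimic the Python 2 <str> entries that caused the type mismatch.
special_words = {"news": b"news", "atlas": b"atlas"}

def stem_buggy(word):
    # Original behaviour: returns the table entry, i.e. bytes here.
    if word in special_words:
        return special_words[word]
    return word

def stem_fixed(word):
    # Proposed fix: return the caller's word unchanged, preserving its type.
    if word in special_words:
        return word
    return word

buggy_result = stem_buggy("news")   # bytes, not the caller's text type
fixed_result = stem_fixed("news")   # same type the caller passed in
```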
Found this while trying to combine the LanguageAnalyzer with a CharsetFilter. The latter raises an error for special words, because it expects <unicode> rather than <str> text:
TypeError: expected a string or other character buffer object
I temporarily worked around the issue by inserting a custom UnicodeFilter between the LanguageAnalyzer and the CharsetFilter, so that tokens are forced back to <unicode> before reaching the CharsetFilter.
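A sketch of such a workaround (assumptions: a real version would subclass whoosh.analysis.Filter and receive Whoosh Token objects; the Token class below is a stand-in, and decoding bytes plays the role of the Python 2 unicode() call):

```python
# Stand-in for a Whoosh token; a real filter would operate on
# whoosh.analysis Token objects flowing through the analyzer chain.
class Token:
    def __init__(self, text):
        self.text = text

def unicode_filter(tokens):
    # Force each token's text back to unicode before downstream filters.
    # Under Python 2 this would be unicode(t.text); decoding bytes is the
    # closest Python 3 equivalent for this sketch.
    for t in tokens:
        if isinstance(t.text, bytes):
            t.text = t.text.decode("utf-8")
        yield t

# Mixed token types, as produced by the stemmer bug described above.
tokens = [Token("hello"), Token(b"news")]
cleaned = [t.text for t in unicode_filter(tokens)]
```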