Issue #372 resolved

A field with a list-type value crashed add_document+StemmingAnalyzer

Helmut Jarausch
created an issue

The following code

from whoosh import fields, analysis
from whoosh.filedb.filestore import RamStorage

# Works: the same TEXT field without StemmingAnalyzer.
# schema = fields.Schema(Location=fields.STORED, Lang=fields.STORED,
#                        Title=fields.TEXT(spelling=True))
# Crashes: the TEXT field with StemmingAnalyzer.
schema = fields.Schema(Location=fields.STORED, Lang=fields.STORED,
                       Title=fields.TEXT(spelling=True,
                                         analyzer=analysis.StemmingAnalyzer()))
ix = RamStorage().create_index(schema)

# Title is a list of strings rather than a single string.
Fields = {'Location': '1000/123', 'Lang': 'E',
          'Title': ['Introduction', 'Numerical', 'Analysis']}

with ix.writer() as w:
    w.add_document(**Fields)

dies with

  File "Whoosh_ListLoadBug.py", line 27, in <module>
    w.add_document(**Fields)
  File "/usr/lib64/python3.3/site-packages/whoosh/writing.py", line 759, in add_document
    for word in field.spellable_words(value):
  File "/usr/lib64/python3.3/site-packages/whoosh/fields.py", line 296, in spellable_words
    in self.analyzer(value, no_morph=True)))
  File "/usr/lib64/python3.3/site-packages/whoosh/fields.py", line 295, in <genexpr>
    wordset = sorted(set(token.text for token
  File "/usr/lib64/python3.3/site-packages/whoosh/analysis/filters.py", line 296, in __call__
    for t in tokens:
  File "/usr/lib64/python3.3/site-packages/whoosh/analysis/filters.py", line 220, in __call__
    for t in tokens:
  File "/usr/lib64/python3.3/site-packages/whoosh/analysis/tokenizers.py", line 118, in __call__
    assert isinstance(value, text_type), "%s is not unicode" % repr(value)
AssertionError: ['Introduction', 'Numerical', 'Analysis'] is not unicode

I have successfully used the same code without StemmingAnalyzer in the TEXT field. Thanks for looking into it, Helmut
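
A possible stopgap until this is fixed, sketched under assumptions: since the assertion complains that the list is not unicode, joining list-valued fields into a single string before calling add_document should avoid that code path. The helper name join_list_fields is hypothetical (it is not part of Whoosh), and a joined string may be analyzed differently from a list value, so this may not index the title the same way.

```python
def join_list_fields(doc, sep=" "):
    """Return a copy of doc with list/tuple values joined into one string.

    Hypothetical helper, not part of Whoosh. Other values are copied as-is.
    """
    joined = {}
    for name, value in doc.items():
        if isinstance(value, (list, tuple)):
            joined[name] = sep.join(value)
        else:
            joined[name] = value
    return joined


Fields = {'Location': '1000/123', 'Lang': 'E',
          'Title': ['Introduction', 'Numerical', 'Analysis']}

# The original dict is left untouched; flat holds the joined values.
flat = join_list_fields(Fields)
```

The result would then be passed as w.add_document(**flat) in place of w.add_document(**Fields).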
