Remove "and" from the author field in Lucene
The word "and" in the author field (used as a delimiter between names) is currently indexed in Lucene, so a search for "and" returns almost all posts. Please remove the delimiter from the author field.
Comments (7)
-
Account Deleted -
reporter I don't know the specifics of DiacriticsLowerCaseFilteringAnalyzer, but please make sure that no parts of the names are removed. Only the delimiter "and" should be removed. For example, an author named Zu should not be removed just because "zu" is a German stop word.
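To illustrate the request above, here is a minimal sketch that drops only the delimiter and keeps every name part. It assumes the author field holds names joined by the lowercase word "and" surrounded by whitespace (BibTeX style). The class and method names (`AuthorFieldSplitter`, `splitAuthors`) are hypothetical and are not part of BibSonomy or Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not BibSonomy code: splits an author field on the
// delimiter "and" without applying any stop word list to the names.
final class AuthorFieldSplitter {

    /**
     * Splits on the word "and" only when it is surrounded by whitespace,
     * so names that merely contain "and" (e.g. "Anderson") stay intact.
     */
    static List<String> splitAuthors(String authorField) {
        List<String> names = new ArrayList<>();
        if (authorField == null) {
            return names;
        }
        for (String part : authorField.trim().split("\\s+and\\s+")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

With this approach a name such as "Zu" is kept as a term of its own, because no stop word filtering is involved in the author field at all.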
-
Account Deleted Commented by telekoma: That would be the case, but only when you do a full text search. We want to apply the stop words to the full text search only, since it often returns a large number of results. A specific search for an author, e.g. via http://www.bibsonomy.org/author/zu, won't be affected. Admittedly, this is a compromise.
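The compromise described above can be sketched as two separate paths: stop words are dropped for full text terms, while the author lookup keeps the name untouched. This is an illustration under assumptions, not the actual analyzer: the class `StopWordScope`, its methods, and the two-entry stop word set are hypothetical, and the real stop word list is not shown in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: stop words apply to full text terms only.
final class StopWordScope {

    // Illustrative entries only; the real list is an assumption here.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "zu"));

    /** Full text path: lower-case, split on whitespace, drop stop words. */
    static List<String> fullTextTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Author path: normalise case only, never consult the stop word list. */
    static String authorTerm(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }
}
```

Under these assumptions, a full text query containing "Zu" loses that token, while the dedicated author lookup for "Zu" still has a term to match.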
-
not fixed; needs to be discussed
-
- changed status to open
-
- changed status to invalid
Migrated search to Elasticsearch => the post representation in Lucene/Elasticsearch has changed.
-
-
assigned issue to
- edited description
- changed milestone to 3.7.0
- changed component to search
-
assigned issue to
Commented by telekoma: I changed the DiacriticsLowerCaseFilteringAnalyzer, which is responsible for indexing the data for the full text search, so that it uses a multi-language stop word list.