Words containing uppercase letters are not found in search when html_search_language='ja'.

tomo saito created an issue

I found that words containing a capital letter are not found by search when the language setting is Japanese.

This is because words are not lowercased at index creation when the setting is Japanese. The word stored in the index keeps its capital letters, but the query is lowercased, so the query never matches it.

The lower function is always executed in the SearchEnglish#stem method (in search/en.py), but it is not executed in the SearchJapanese#stem method (in search/ja.py).
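The mismatch can be reproduced in isolation with a minimal sketch. The functions below are hypothetical stand-ins, not Sphinx's actual classes: `stem_en` only imitates the lowercasing step described for SearchEnglish#stem (the real method also stems the word), `stem_ja_unpatched` imitates leaving the word unchanged, and `search` imitates the lowercasing done on the JavaScript side.

```python
# Hypothetical stand-ins for illustration only; these are not Sphinx APIs.

def stem_en(word):
    # Imitates the lowercasing described for SearchEnglish#stem
    # (real stemming is omitted here).
    return word.lower()


def stem_ja_unpatched(word):
    # Imitates the Japanese case before the patch: the word is stored as-is.
    return word


def build_index(words, stem):
    # The index is just the set of stemmed words in this sketch.
    return {stem(w) for w in words}


def search(index, query):
    # The query side always lowercases the term before the lookup.
    return query.lower() in index


en_index = build_index(["FooBar"], stem_en)
ja_index = build_index(["FooBar"], stem_ja_unpatched)

found_en = search(en_index, "FooBar")  # index holds "foobar", so this matches
found_ja = search(ja_index, "FooBar")  # index holds "FooBar", so this misses
```

In this sketch the index and the query only agree when both sides apply the same case conversion.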

On the search side, every query is lowercased (searchtools.js line 160):

   var word = stemmer.stemWord(tmp[i]).toLowerCase();

I think this problem will be solved if the following code is added.

diff -r 7c437d4e4f10 sphinx/search/ja.py
--- a/sphinx/search/ja.py   Sun Feb 10 16:39:44 2013 +0400
+++ b/sphinx/search/ja.py   Sun Feb 17 21:43:09 2013 +0900
@@ -271,3 +271,6 @@

     def word_filter(self, stemmed_word):
         return len(stemmed_word) > 1
+
+    def stem(self, word):
+        return word.lower()

The conversions done by Python's lower and JavaScript's toLowerCase are not completely the same. However, among the letters often used in Japan, both convert the half-width alphabet and the full-width alphabet to lowercase. I think the above patch is not perfect, but it is a satisfactory solution in practice.
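As a quick check of the Python side of that claim, the sketch below lowercases a half-width word, a full-width word, and a Japanese word. It assumes Python 3, where `str` is Unicode text; the sample strings are illustrative and not taken from the attached project.

```python
# Assumes Python 3 (str is Unicode). Sample words are illustrative only.

half_width = "FooBar"
full_width = "ＦｏｏＢａｒ"  # full-width Latin letters (U+FF21 and following)
japanese = "検索"           # kanji have no case, so lower() leaves them alone

half_lowered = half_width.lower()
full_lowered = full_width.lower()
japanese_lowered = japanese.lower()
```

Here `half_lowered` is "foobar", `full_lowered` is "ｆｏｏｂａｒ" (still full-width), and `japanese_lowered` is unchanged.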

I attach an example project. The attachments contain HTML built in English and in Japanese. search_ja_fix was built by a Sphinx with the above patch applied. Each of these contains the word "FooBar", but only search_en and search_ja_fix can find it.
