Issue #148 on hold

A GUI-initiated search containing common words fails.

David Koppelman
created an issue

A GUI search containing common words such as "arm" or "space" turns up zero matches. Searching for those words from the command line succeeds. The GUI term explorer can also find those words. The advanced search can find the words in an abstract, but not elsewhere in the document.

I'm using Recoll 1.19.4 and Xapian 1.2.7.

Comments (14)

  1. medoc repo owner

    Hi,

    Could you please post the query that recoll performs in both cases? This is the first line printed by the command line interface, and you can get it in the GUI by clicking the (show query) link (you can copy/paste the resulting text).

  2. David Koppelman reporter

    Here are the queries:

    Result count (est.): -1 Query details: (ARM:(wqf=11))

    Result count (est.): -1 Query details: ((space:(wqf=11) OR spacing OR Space OR spaces OR SPACE OR Spacings OR SPACES OR spaced OR Spacing OR Spaces OR spacings OR SPACING OR Spaced OR space's OR Space's OR SPACINGS OR SPACEs OR spacees OR space͒))

  3. medoc repo owner

    I guess that these are two different queries from the GUI? Or from the command line? What I need is to compare the query performed by the GUI with the query performed by the command line for the same search.

    Also, are you using multiple indexes by any chance?

  4. David Koppelman reporter

    The queries that I posted above were both from the GUI.

    Here is the result of a command-line query, cut to just show the first result. Six were actually displayed, though from my understanding of the help text it should have displayed 200 results. Even a query such as "recoll -t -n 100 -q 'ARM'" shows just 6 results.

    [sky.ece.lsu.edu] % recoll -t -q 'ARM'
    Recoll query: (ARM:(wqf=11))
    568 results
    text/html   [file:///home/faculty/koppel/pub/sroot-off/info/as.info]    [as.info / Machine Dependencies / ARM-Dependent / ARM Syntax / ARM-Relocations] 1201    bytes
    

    As far as I know I'm using one index.
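
The mismatch described above (568 results reported, only a handful listed) can be checked mechanically. The following is a minimal Python sketch, not part of Recoll, that summarizes text captured from `recoll -t -q`. The function name `summarize_recoll_output` is hypothetical, and the assumed output layout (a `Recoll query:` line, an `N results` line, then one line per result starting with a MIME type) is inferred only from the excerpt in this comment.

```python
import re


def summarize_recoll_output(text):
    """Summarize text captured from `recoll -t -q ...`.

    Returns (query, reported, displayed):
      query     - text after "Recoll query:", or None if absent
      reported  - integer from the "N results" line, or None if absent
      displayed - number of lines that look like a result entry
    The layout is an assumption based on the excerpt in this thread.
    """
    query = None
    reported = None
    displayed = 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Recoll query:"):
            query = line[len("Recoll query:"):].strip()
        elif re.fullmatch(r"\d+ results?", line):
            reported = int(line.split()[0])
        elif re.match(r"[\w.+-]+/[\w.+-]+\s+\[", line):
            # A result line: MIME type, then a bracketed URL.
            displayed += 1
    return query, reported, displayed


# Sample taken from the command-line excerpt in this comment.
sample = """\
Recoll query: (ARM:(wqf=11))
568 results
text/html   [file:///home/faculty/koppel/pub/sroot-off/info/as.info]    [as.info / Machine Dependencies / ARM-Dependent / ARM Syntax / ARM-Relocations] 1201    bytes
"""

print(summarize_recoll_output(sample))
```

The returned query string can be compared directly with the text shown by the GUI's (show query) link, and a large gap between the reported and displayed counts is the symptom discussed in this thread.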

  5. medoc repo owner

    Hi,

    I would need to see the debug log for the GUI case.

    Setting up the log (at level 6) is described here: https://bitbucket.org/medoc/recoll/wiki/ProblemSolvingData in the "Obtaining information from the log file" paragraph.

    Please either send the log by email to jf@dockes.org or attach it here. Don't try to include it in a comment, as that usually does not work well. Actually, email might be the best approach, as a few more exchanges may be needed, if you will bear with me...

    Thanks.

  6. David Koppelman reporter

    FWIW, I'm attaching a log file for a query that does work correctly. Given what is shown in the first log file I'm tempted to rebuild my index, but I won't until the cause of the flaw is found.

  7. medoc repo owner

    I have looked at the logs, and I have no idea what causes the Xapian get_mset() exceptions. This might be a bug, but I think that the chances of finding it are slight, and the most probable cause is a corrupted index.

    I think that the best thing to do would be to just delete the Xapian index directory ($HOME/.recoll/xapiandb by default) and reindex.

    Hopefully the problem will just go away, else we'll know that there is something to debug.

    This seems to be a case-sensitive index. Did you rebuild it from scratch recently?
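
The delete-and-reindex step suggested above could be scripted roughly as follows. This is a hedged Python sketch, not an official Recoll tool: the helper names `remove_xapian_index` and `reindex` are hypothetical, the default path is the `$HOME/.recoll/xapiandb` location mentioned in this comment, and `reindex` assumes that the `recollindex` command is on the PATH and that running it without arguments rebuilds the missing index.

```python
import shutil
import subprocess
from pathlib import Path


def remove_xapian_index(confdir=Path.home() / ".recoll"):
    """Delete the xapiandb directory under confdir.

    Returns True if the directory existed and was removed,
    False if there was nothing to delete.
    """
    dbdir = Path(confdir) / "xapiandb"
    if not dbdir.is_dir():
        return False
    shutil.rmtree(dbdir)
    return True


def reindex():
    """Run recollindex and return its exit status.

    Assumption: recollindex is installed and, with the index
    directory gone, recreates the index from scratch.
    """
    return subprocess.run(["recollindex"], check=False).returncode
```

Calling `remove_xapian_index()` and then `reindex()` mirrors the manual procedure. Nothing runs at import time, so the index is only deleted when the first function is called explicitly.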

  8. David Koppelman reporter

    > This seems to be a case-sensitive index. Did you rebuild it from scratch recently?

    Yes, with a fresh Xapian directory. I stopped and restarted it several times during indexing to play with the parallelization parameters.

    I'll rebuild the index and post back whether it works. That might be on Monday, depending on how fast it reindexes.

  9. medoc repo owner

    I'm glad that it now works.

    The problem is going to be mentioned in the release notes, but as you are the first to report it, I'd have had to be quite a medium to do it earlier.

    Recoll is not Firefox or LibreOffice. I do as much testing as I can, but it does not go through as much beta-testing as these widely distributed packages, so the probability that a given user discovers an original problem is higher, as you just experienced.

    Also, I still have no idea what happened. From what you wrote, I'd guess that a multithreaded indexer can sometimes get into trouble when interrupted, but this is certainly not part of the design and is unexpected: I never saw it happen before this occurrence.

    Did you notice any indexing speed improvement while experimenting with the multithreading?

  10. David Koppelman reporter

    > The problem is going to be mentioned in the release notes, but as you are the first to report it, I'd have had to be quite a medium to do it earlier.

    I didn't intend any criticism! I meant that it should be, not that it should have been!

    > Recoll is not Firefox or LibreOffice. I do as much testing as I can, but it does not go through as much beta-testing as these widely distributed packages, so the probability that a given user discovers an original problem is higher, as you just experienced.

    I appreciate the effort!

    > Also, I still have no idea what happened. From what you wrote, I'd guess that a multithreaded indexer can sometimes get into trouble when interrupted, but this is certainly not part of the design and is unexpected: I never saw it happen before this occurrence.

    I'll be alert to similar problems in the future. If I find one I'll open a new bug on it.

    > Did you notice any indexing speed improvement while experimenting with the multithreading?

    It seemed to go faster, but I didn't do any actual measurement. There certainly was higher CPU utilization.
