Using the log file to investigate indexing issues
All Recoll processes print trace messages. By default these go to the standard error output, and you may never see them (for example, when the recoll GUI is started from the desktop interface).
There are a number of potential issues with indexing that may need investigation, such as:
- A file can't be found by searching even though it seems that it should have been indexed (this can happen because the file is not selected for indexing at all, or because a filter program crashes).
- The indexing process gets stuck and never finishes.
- The indexing process ends up with an error.
- The indexing process seems to be using too much system capacity.
The right way to approach these problems is to use the recollindex command line tool (instead of the recoll GUI), and to set up the trace log to show what indexing is actually doing.
Trace log parameters can be set either from the GUI Preferences->Indexing Configuration->Global Parameters panel, or by editing the configuration file ~/.recoll/recoll.conf. You should set the following parameters:
loglevel = 6
logfilename = stderr
thrQSizes = -1 -1 -1
We use stderr instead of an actual file in order to capture direct filter messages (such as a python stack trace) along with the normal Recoll trace messages.
The last line sets recollindex for single-threaded operation, which makes the log much more readable.
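If you prefer to edit recoll.conf from a script rather than from the GUI, the following Python sketch sets the three parameters above. It assumes the file uses simple "name = value" lines, with any [...] section headers (per-directory overrides) placed after the global settings; set_conf_params and DEBUG_SETTINGS are hypothetical names, not part of Recoll. Keep a copy of the original file so that you can restore it afterwards.

```python
import os
import re

# Trace settings recommended above for debugging recollindex.
DEBUG_SETTINGS = {
    "loglevel": "6",
    "logfilename": "stderr",
    "thrQSizes": "-1 -1 -1",
}

def set_conf_params(conf_path, settings):
    """Set global 'name = value' entries in a recoll.conf-style file.

    Assignments found before the first [section] header are rewritten in
    place; names not found there are appended just before that header, so
    they stay global. Comments and per-directory sections are untouched.
    """
    lines = []
    if os.path.exists(conf_path):
        with open(conf_path, encoding="utf-8") as f:
            lines = f.read().splitlines()

    # Global settings are the ones before the first "[section]" line.
    first_section = next(
        (i for i, line in enumerate(lines) if line.lstrip().startswith("[")),
        len(lines),
    )
    head, tail = lines[:first_section], lines[first_section:]

    seen = set()
    new_head = []
    for line in head:
        m = re.match(r"\s*(\w+)\s*=", line)  # comments start with '#', no match
        if m and m.group(1) in settings:
            name = m.group(1)
            new_head.append(f"{name} = {settings[name]}")
            seen.add(name)
        else:
            new_head.append(line)
    new_head += [f"{n} = {v}" for n, v in settings.items() if n not in seen]

    with open(conf_path, "w", encoding="utf-8") as f:
        f.write("\n".join(new_head + tail) + "\n")
```

For example, set_conf_params(os.path.expanduser("~/.recoll/recoll.conf"), DEBUG_SETTINGS) applies the debugging settings to your default configuration.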
You should then check that no recollindex process is currently running, and kill any you find.
Then, if the issue concerns an identified file, try indexing only that file:
recollindex -i myunfindablefile.xxx > /tmp/myindexlog 2>&1
If this is a general issue with indexing (process not finishing properly), just start it:
recollindex > /tmp/myindexlog 2>&1
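The shell redirections above send both standard output and standard error to the same file. If you drive recollindex from Python instead, a minimal equivalent could look like the following sketch (run_logged is a hypothetical helper name):

```python
import subprocess

def run_logged(cmd, log_path):
    """Run cmd with stdout and stderr both written to log_path.

    Equivalent to the shell redirection: cmd > log_path 2>&1
    Returns the process exit status.
    """
    with open(log_path, "wb") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, run_logged(["recollindex", "-i", "myunfindablefile.xxx"], "/tmp/myindexlog") mirrors the single-file command above, and run_logged(["recollindex"], "/tmp/myindexlog") mirrors the full indexing command.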
Usually, looking at the trace will show what is wrong (e.g. a configuration issue or a missing filter), and allow you to solve the problem.
In case of indexer misbehaviour (e.g. using too much memory), you should run tail -f on the log to see what is going on.
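With loglevel set to 6 the log gets large. As an illustration, this Python sketch keeps only the trace lines up to a given level, plus any line that does not look like a trace (such as a filter's python stack trace). The layout it assumes, :LEVEL:source file:line:function: message, is inferred from the single sample message shown in the path note below and may not match every trace line; filter_trace and TRACE_RE are hypothetical names.

```python
import re

# Assumed trace line layout, inferred from a sample message such as:
#   :4:../index/fsindexer.cpp:149:FsIndexer::indexFiles: skipping [...] (ntd)
TRACE_RE = re.compile(r"^:(\d+):([^:]+):(\d+):(.*)$")

def filter_trace(lines, max_level):
    """Yield trace lines whose level is <= max_level.

    Lines that do not look like Recoll traces (for example a Python stack
    trace printed by a filter on stderr) are always yielded, since they
    are often the interesting part.
    """
    for line in lines:
        m = TRACE_RE.match(line)
        if m is None or int(m.group(1)) <= max_level:
            yield line.rstrip("\n")
```

For example, looping over filter_trace(open("/tmp/myindexlog"), 4) and printing each line shows basic traces and errors while hiding the most verbose messages.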
If this is not enough, please open a tracker issue and attach or link to the log data, or just email me (jfd at recoll.org).
A full recollindex run and recollindex -i usually apply the same criteria to decide whether a file is included (but see the path gotcha note below). They may occasionally behave differently, so it can be useful to run a full recollindex even for a specific file, but this will produce a big log file.
When you are done, it is better to reset the verbosity to a reasonable level (e.g. 2 for errors only, or 4 for basic traces).
Note: the path gotcha
recollindex -i will only index files under the directories defined by the topdirs configuration variable (your home directory by default). Unfortunately, the test is done on the file path text, ignoring possible symbolic links. If you give a simple file name as a parameter to recollindex -i and there are symbolic links inside the topdirs entries, the comparison may fail. For example, if your home directory is /home/me and /home is a link to /usr/home, running recollindex -i somefile from your home directory will actually try to index /usr/home/me/somefile, and fail (because /usr/home/me is not a subdirectory of /home/me). This will show up in the log as a message like the following:
:4:../index/fsindexer.cpp:149:FsIndexer::indexFiles: skipping [/usr/home/me/somefile] (ntd)
If this happens, give a full path consistent with what is found in the configuration file (e.g. recollindex -i /home/me/somefile).
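To find a spelling of a file path that passes this textual test, you can resolve symbolic links on both the file and each topdirs entry, then rebuild the path under the entry as it is written in the configuration. The Python sketch below does this; it only models the comparison described above (Recoll's actual code may differ), and is_under and path_for_recollindex are hypothetical names.

```python
import os

def is_under(path, topdir):
    """Textual test: is path inside topdir? Symbolic links are not
    resolved, mirroring the comparison described above."""
    path, topdir = os.path.normpath(path), os.path.normpath(topdir)
    return path == topdir or path.startswith(topdir.rstrip(os.sep) + os.sep)

def path_for_recollindex(filename, topdirs):
    """Return a spelling of filename that textually lies under one of
    topdirs, or None if the file is not inside any of them even after
    resolving symbolic links."""
    real_file = os.path.realpath(filename)
    for top in topdirs:
        top = os.path.expanduser(top)
        real_top = os.path.realpath(top)
        if is_under(real_file, real_top):
            # Re-express the file relative to the topdirs entry as written.
            rel = os.path.relpath(real_file, real_top)
            return os.path.normpath(os.path.join(top, rel))
    return None
```

With the example layout above, path_for_recollindex("/usr/home/me/somefile", ["/home/me"]) would return /home/me/somefile, which you can then pass to recollindex -i.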
File system occupation
One of the possible reasons for failed indexing is a maxfsoccup parameter set too low. This is the file system occupation percentage (not free space) above which indexing stops. It is set from the GUI indexing configuration or by editing recoll.conf. A value of 0 disables the check, but a very low non-zero value will simply prevent indexing.
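To check whether maxfsoccup could be the culprit, you can compare the current occupation of the file system with the configured value. The Python sketch below assumes that the relevant file system is the one holding the index (by default under ~/.recoll) and that indexing stops once occupation goes over the threshold; both functions are hypothetical helpers. The percentage is computed the way df computes its Use% column, which may differ slightly from Recoll's own computation.

```python
import shutil

def fs_occupation_percent(path):
    """Percentage of the file system holding path that is in use,
    computed like df's Use% column: used / (used + available)."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:  # some pseudo file systems report no blocks
        return 0.0
    return 100.0 * usage.used / denominator

def indexing_would_stop(occupation_percent, maxfsoccup):
    """Model of the maxfsoccup rule: 0 disables the check; otherwise
    indexing stops once occupation goes over the threshold (assumed >)."""
    if maxfsoccup == 0:
        return False
    return occupation_percent > maxfsoccup
```

For example, indexing_would_stop(fs_occupation_percent(os.path.expanduser("~/.recoll")), 90) tells you whether a maxfsoccup of 90 would currently block indexing.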