No search results on the subsequent pages of large PDF documents.

Issue #1256 resolved
Valerio W. created an issue

Hello Janos.

I have noticed that large PDF attachments are not fully indexed, or at least the search stops finding results after a certain number of pages (for example, no search results after page 120 of a 300-page document). The page at which this happens is not fixed; it varies from document to document.

Do you know what could be the cause of this? I am currently using version 1.3.12 build 1001 of piler.

Thank you and best regards, Valerio

Comments (6)

  1. Janos SUTO repo owner

    The parser uses a buffer to store the text of the message body, including the attachments, and that buffer has a finite size of ~132 kB. So if you have a fairly large PDF file with text, only the first ~130 kB of it is indexed.

  2. Valerio W. reporter

    Thank you for your explanation. Are there reasons against increasing the buffer size? Or would it be conceivable to increase the buffer to, e.g., 512 kB?

  3. Janos SUTO repo owner

    I think 132 kB should be fine; however, you may increase it. To do that, edit src/config.h, change the value of the BIGBUFSIZE #define, then recompile piler.
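
The behaviour described in this thread can be illustrated with a minimal sketch. This is not piler's actual parser code: the `text_buffer` struct and the `text_buffer_append` function are hypothetical names made up for illustration, and the value given to `BIGBUFSIZE` here is simply the 512 kB figure suggested above, not the value shipped in src/config.h. The sketch only shows why a fixed-size text buffer silently drops everything past its capacity, and why raising the constant and recompiling moves that cut-off point.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative value only: 512 kB, the size suggested in the thread.
   The real definition lives in src/config.h and may differ. */
#define BIGBUFSIZE (512 * 1024)

/* Hypothetical text accumulator, not piler's actual parser state. */
struct text_buffer {
    char data[BIGBUFSIZE];
    size_t len;   /* bytes currently stored, excluding the terminating NUL */
};

/* Append up to n bytes of src and return the number of bytes actually stored.
   Whatever does not fit is silently dropped, so text near the end of a
   large document never reaches the index. */
static size_t text_buffer_append(struct text_buffer *b, const char *src, size_t n)
{
    size_t room = sizeof(b->data) - 1 - b->len;  /* keep one byte for the NUL */
    size_t take = n < room ? n : room;

    memcpy(b->data + b->len, src, take);
    b->len += take;
    b->data[b->len] = '\0';
    return take;
}
```

Under these assumptions, feeding the extracted text of a long PDF into `text_buffer_append` page by page returns the full chunk length until the buffer fills up, then a short count, then 0 for every later page. Since the amount of text per page differs between documents, the page at which results stop differs too, which matches the symptom reported above.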
