Attachment search issue
Searching in the attachment body fails with mailpiler. Any suggestion would be great.
Comments (31)
-
repo owner -
reporter Yes, we are using piler-v. Thanks for the quick response.
-
reporter I am trying to index and search PDF, Word, PowerPoint, Excel, text and zip files.
-
repo owner I meant, please show me the output of the "piler -v" command to see which version exactly you are running.
Please also show me the contents of piler-config.h file.
-
reporter Content of piler-config.h file:
/*
 * config.h, SJ
 */
#ifndef _CONFIG_H
#define _CONFIG_H
#include <syslog.h>
#include "piler-config.h"
#include "params.h"
#define PROGNAME "piler"
#define VERSION "0.1.21"
#define BUILD 705
#define HOSTID "mailarchiver"
#define CONFIG_FILE CONFDIR "/piler.conf"
#define WORK_DIR DATADIR "/piler/tmp"
#define QUEUE_DIR DATADIR "/piler/store"
#define CLAMD_SOCKET "/tmp/clamd"
#define PIDFILE "/var/run/piler/piler.pid"
#define QUARANTINELEN 255
#define TIMEOUT 60
#define TIMEOUT_USEC 500000
#define SESSION_TIMEOUT 420
#define MAXBUFSIZE 8192
#define SMALLBUFSIZE 512
#define BIGBUFSIZE 131072
#define REALLYBIGBUFSIZE 524288
#define TINYBUFSIZE 128
#define MAXVAL 256
#define RANDOM_POOL "/dev/urandom"
#define RND_STR_LEN 36
#define BUFLEN 32
#define IPLEN 16+1
#define KEYLEN 56
#define CRLF "\n"
#define MEMCACHED_CLAPF_PREFIX "_piler:"
#define MAX_MEMCACHED_KEY_LEN 250
#define MEMCACHED_SUCCESS 0
#define MEMCACHED_FAILURE 1
#define MEMCACHED_COUNTERS_LAST_UPDATE MEMCACHED_CLAPF_PREFIX "counters_last_update"
#define MEMCACHED_MSGS_RCVD MEMCACHED_CLAPF_PREFIX "rcvd"
#define MEMCACHED_MSGS_VIRUS MEMCACHED_CLAPF_PREFIX "virus"
#define MEMCACHED_MSGS_DUPLICATE MEMCACHED_CLAPF_PREFIX "duplicate"
#define MEMCACHED_MSGS_IGNORE MEMCACHED_CLAPF_PREFIX "ignore"
#define MEMCACHED_MSGS_SIZE MEMCACHED_CLAPF_PREFIX "size"
#define LOG_PRIORITY LOG_INFO
#define _LOG_INFO 3
#define _LOG_DEBUG 5
#define MAX_RCPT_TO 128
#define MIN_WORD_LEN 3
#define MAX_WORD_LEN 25
#define MAX_TOKEN_LEN 4*MAX_WORD_LEN
#define DELIMITER ' '
#define BOUNDARY_LEN 255
#define MAX_ATTACHMENTS 16
#define MAX_ZIP_RECURSION_LEVEL 2
/* SQL stuff */
#define SQL_SPHINX_TABLE "sph_index"
#define SQL_METADATA_TABLE "metadata"
#define SQL_ATTACHMENT_TABLE "attachment"
#define SQL_FOLDER_TABLE "folder"
#define SQL_RECIPIENT_TABLE "rcpt"
#define SQL_ARCHIVING_RULE_TABLE "archiving_rule"
#define SQL_RETENTION_RULE_TABLE "retention_rule"
#define SQL_COUNTER_TABLE "counter"
#define SQL_OPTION_TABLE "option"
#define SQL_MESSAGES_VIEW "v_messages"
#define SQL_ATTACHMENTS_VIEW "v_attachment"
/* Error codes */
#define OK 0
#define ERR 1
#define ERR_EXISTS 2
#define AVIR_OK 0
#define AVIR_VIRUS 1
#define DIRECTION_INCOMING 0
#define DIRECTION_INTERNAL 1
#define DIRECTION_OUTGOING 2
#define DIRECTION_INTERNAL_AND_OUTGOING 3
#define WRITE_TO_STDOUT 0
#define WRITE_TO_BUFFER 1
#endif /* _CONFIG_H */
-
reporter Output of the piler -v command:
piler 0.1.21, build 705, Janos SUTO <sj@acts.hu>
Build Date: Mon Sep 17 15:10:26 CEST 2012
Configure command: ./configure --localstatedir=/var
-
repo owner OK, thanks for the version. However, you pasted $sourcedir/src/config.h and not $sourcedir/piler-config.h. In order to support attachment indexing, you need a few utilities, like pdftotext, catdoc, libzip, and unrtf. If you show me piler-config.h, then I can tell you whether piler knows about these or not.
-
reporter
#define CONFDIR "/usr/local/etc"
#define DATADIR "/usr/local/var"
#define KEYFILE CONFDIR "/piler.key"
#define HAVE_DAEMON 1
#undef HAVE_PDFTOTEXT
#undef HAVE_CATDOC
#undef HAVE_CATPPT
#undef HAVE_XLS2CSV
#undef HAVE_UNRTF
#undef HAVE_ZIP
-
reporter Do I need to change "#undef HAVE_PDFTOTEXT" to "#define HAVE_PDFTOTEXT 1"?
-
repo owner No, don't change it by hand, but rather install libzip, pdftotext, catdoc and unrar utilities, then run configure again, and recompile, then reinstall the binaries.
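Before re-running configure, it can help to confirm that the helper programs are actually on the PATH. The following is only a minimal sketch, assuming Python 3 is available on the archive host; it is not what configure itself does. The program names are taken from the HAVE_* macros shown in this thread, and `find_helpers` is a hypothetical helper name. libzip is a library rather than a program, so it is not covered here.

```python
import shutil

# Helper programs named by the HAVE_* macros in this thread's piler-config.h.
HELPERS = ["pdftotext", "catdoc", "catppt", "xls2csv", "unrtf"]


def find_helpers(names, which=shutil.which):
    """Map each program name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_helpers(HELPERS).items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```

Any program reported as NOT FOUND would need to be installed before configure can be expected to pick it up.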
-
reporter We installed libzip, pdftotext, catdoc and unrar. We downloaded the VMware image from your website.
-
repo owner OK, let me know if it works for you
-
reporter No, it doesn't work for us.
-
repo owner OK, so you have installed these utilities, recompiled and reinstalled piler. Then please take an EML format message that has a PDF attachment, and run:
./src/pilertest the-message.eml
and it should display the contents of the pdf file, too.
-
reporter Yes, it displays the contents of the PDF file. Then I imported that mail and reindexed it. But when I search for "attchment:pdf,abody:GB2127264", it returns all the mails. Can you please tell me what I am doing wrong?
-
reporter If I search for just "GB2127264", it works absolutely fine. Thank you so much :)))
-
repo owner - changed status to resolved
-
repo owner You are welcome ;-)
-
Hello.
I have problems searching inside attachments.
My version: 1.2.0 build 952
piler-config.h contents:
#define HAVE_PDFTOTEXT "/usr/bin/pdftotext"
#define HAVE_CATDOC "/usr/bin/catdoc"
#define HAVE_CATPPT "/usr/bin/catppt"
#define HAVE_XLS2CSV "/usr/bin/xls2csv"
#define HAVE_PPTHTML "/usr/bin/ppthtml"
#define HAVE_UNRTF "/usr/bin/unrtf"
#define HAVE_TNEF "/usr/bin/tnef"
#define HAVE_ZIP 1
#define HAVE_LIBWRAP 1
When I run pilertest against an eml file containing a PDF, I can see the text inside the PDF file. But when I search in the web UI (abody:sometext), no result is displayed.
-
repo owner Try omitting 'abody', just type sometext.
-
No, that does not work either ):
I'm using Debian 8.7.
The Apache error log shows:
PHP Notice: Undefined variable: text_download_selected_hits_as_pdf in /var/www/html/piler/view/theme/default/templates/search/helper.tpl on line 157, referer: https://mailpiler.mydomain.com/search.php
-
repo owner Then the email is not indexed yet or the searched text doesn't exist.
-
The text exists. I can see it when I run pilertest against the eml message file.
The e-mail was archived 7 days ago and I can find it in a search using "from" or "to".
-
repo owner OK, then I need to see the pilerget output and your search query.
-
pilerget [piler_id of the PDF attachment]
Shows me entire message, like I see in the eml file.
The PDF file contains the string "P1911701". My search is for that string:
abody:P1911701
or just
P1911701
Note:
The message has two attachments: a PNG file and a PDF file.
Executing pilerget separately:
pilerget 400000005880cf730beac60c0055af6e0e7e 1
zpipe: invalid or incomplete deflate data
pilerget 400000005880cf730beac60c0055af6e0e7e 2
zpipe: invalid or incomplete deflate data
-
repo owner "Shows me entire message, like I see in the eml file."
Yes, that's what I need. Btw. why do you keep using 'abody'? It's an invalid keyword.
-
I saw "abody" here, on this page:
"Dimple Mehta: Yes it display the contents of the pdf file.Then i import that mail.Reindex it.But when i do search like "attchment:pdf,abody:GB2127264" then it pulls all the mails.Can you please guide me where i am wrong? - 2012-09-24"
OK, abody is invalid, understood. But even if I do not use "abody" and search only for the string, the results do not include the message and its attachments.
When I run pilerget for the attachments separately, I get an error:
pilerget 400000005880cf730beac60c0055af6e0e7e 1
zpipe: invalid or incomplete deflate data
pilerget 400000005880cf730beac60c0055af6e0e7e 2
zpipe: invalid or incomplete deflate data
Is that a problem?
-
repo owner Yes, that's a problem, because pilerget is used to get the attachment data separately. Anyway, getting the attachment returns only the base64 encoded data. What I need, however, is, let me quote myself, "I need to see the pilerget output and your search query". I can't proceed without it.
-
I have the same.
pilerget Output:
locale: de_DE.UTF-8
build: 952
parsing...
post parsing...
message-id: <1461147002.12.1485514214565@my.ser.ver.com> / e75c5333a49e4170b6bf84afc7b25291cfe7e515a375bbf93398ac3c31769d31
from: *test2 betauser test2@domain.com test2 domain com (domain.com)*
to: *test2 betauser test2@domain.com test2 domain com (domain.com )*
reference: **
subject: *Mail with pdf*
body: *here is a nice text Textfrompdf *
sent: 1485514214, delivered-date: 0
hdr len: 771
body digest: d88cf0733106a22bd94af6ec54af5d43474576b314391b1fe7a9657e9b75b2cc
rules check: (null)
folder: 0
retention period: 1706445385
i:1, name=*test.pdf*, type: *application/pdf*, size: 15834, int.name: test.eml.a1, digest: c91e9c448f4b8412af378136d9fdecf0d973e8ed2745f3fe274cae41274e98d7
attachments:pdf, direction: 0
spam: 0
and my search query:
sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE MATCH('@(subject,body) Textfrompdf') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.00 s, 0 hits, 0 total found
And yes, I have created a PDF with the text "Textfrompdf" ;-)
And my piler-config.h
/* piler-config.h. Generated from piler-config.h.in by configure. */
/*
 * piler-config.h.in, SJ
 */
#define CONFDIR "/etc"
#define DATADIR "/var"
#define DATAROOTDIR "/usr/local/share"
#define KEYFILE CONFDIR "/piler/piler.key"
#define LICENCE_SIGNATURE_FILE CONFDIR "/piler/piler.lic"
#define MESSAGE_ID_DEDUP_FILE DATAROOTDIR "/piler/deduphelper"
#define HAVE_DAEMON 1
#define TIMEOUT_BINARY "/usr/bin/timeout"
#define HAVE_PDFTOTEXT "/usr/bin/pdftotext"
#define HAVE_CATDOC "/usr/bin/catdoc"
#define HAVE_CATPPT "/usr/bin/catppt"
#define HAVE_XLS2CSV "/usr/bin/xls2csv"
/* #undef HAVE_PPTHTML */
#define HAVE_UNRTF "/usr/bin/unrtf"
#define HAVE_TNEF "/usr/bin/tnef"
/* #undef HAVE_ZIP */
#define HAVE_LIBWRAP 1
/* #undef HAVE_TWEAK_SENT_TIME */
/* #undef HAVE_SUPPORT_FOR_COMPAT_STORAGE_LAYOUT */
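To make a header like the one above easier to read at a glance, a small script can list which HAVE_* macros are defined and which are left as #undef. This is only a rough sketch, assuming Python 3 and the macro layout seen in this thread; `helper_status` is a hypothetical name, not part of piler. For the header pasted here it would report HAVE_PPTHTML and HAVE_ZIP as not defined.

```python
import re
import sys

# Both patterns also match inside comments such as "/* #undef HAVE_ZIP */".
UNDEF_RE = re.compile(r"#\s*undef\s+(HAVE_\w+)")
DEFINE_RE = re.compile(r"#\s*define\s+(HAVE_\w+)")


def helper_status(header_text):
    """Return {macro_name: True if #define'd, False if only #undef'd}."""
    status = {}
    for name in UNDEF_RE.findall(header_text):
        status[name] = False
    for name in DEFINE_RE.findall(header_text):
        status[name] = True
    return status


if __name__ == "__main__":
    # Usage: python3 check_header.py /path/to/piler-config.h
    with open(sys.argv[1], encoding="utf-8") as fh:
        for macro, defined in sorted(helper_status(fh.read()).items()):
            print(f"{macro:45s} {'defined' if defined else 'NOT defined'}")
```

The regular expressions work whether the header is on separate lines or pasted as a single line, which is how it often ends up in a comment.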
-
repo owner Dear Lord! Please help me with putting this to a file clicking on More -> Attach file, and remove the email from the comment itself. Anyway thanks for the data, I'll try to reproduce your search.
-
- attached message.txt
The search string is "takaoka".
sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE MATCH('@(subject,body) takaoka') ORDER BY `sent` DESC LIMIT 0,100 OPTION max_matches=5000' in 0.00 s, 0 hits, 0 total found
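When comparing several of these log lines, it can be handy to pull out which fields the query was restricted to and which term was searched. The following is a minimal sketch, assuming Python 3 and the log format quoted in this thread; `parse_match` is a hypothetical helper and the regular expression only covers the simple `@(field,field) term` form seen here.

```python
import re

# Matches e.g.  MATCH('@(subject,body) takaoka')
MATCH_RE = re.compile(r"MATCH\('@\(([^)]*)\)\s+(.*?)'\)")


def parse_match(log_line):
    """Return (list_of_fields, search_term), or None if no MATCH() is found."""
    m = MATCH_RE.search(log_line)
    if m is None:
        return None
    fields = [f.strip() for f in m.group(1).split(",")]
    return fields, m.group(2)


if __name__ == "__main__":
    line = (
        "sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 "
        "WHERE MATCH('@(subject,body) takaoka') ORDER BY `sent` DESC "
        "LIMIT 0,100 OPTION max_matches=5000' in 0.00 s, 0 hits, 0 total found"
    )
    print(parse_match(line))
```

For the queries in this thread, the fields are subject and body, so the search term has to appear in one of those indexed fields for the message to be found.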
-
what version do you use (piler -v)? What kind of attachment do you have to index and search?