.xlsx attachment issue
Indexing and search work fine on .xls files but fail for .xlsx files.
Comments (12)
-
repo owner -
reporter
/* piler-config.h. Generated from piler-config.h.in by configure. */
/*
 * piler-config.h.in, SJ
 */
#define CONFDIR "/usr/local/etc"
#define DATADIR "/var"
#define KEYFILE CONFDIR "/piler.key"
#define HAVE_DAEMON 1
#define HAVE_PDFTOTEXT "/usr/bin/pdftotext"
#define HAVE_CATDOC "/usr/bin/catdoc"
#define HAVE_CATPPT "/usr/bin/catppt"
#define HAVE_XLS2CSV "/usr/bin/xls2csv"
#define HAVE_UNRTF "/usr/bin/unrtf"
#define HAVE_ZIP 1
-
repo owner OK, it looks good. Now please take a message with an xlsx attachment and run:
./src/pilertest /path/to/message.eml
and check whether it displays the contents of the xlsx file, too.
-
reporter It's not showing the right contents. It produces output like "123 124 125 126 5 2 0 6 7 8 0 9 10 2 0 11 12 8 0 13 14 8 0 15 16 2 0 17 18 1 0 19 20 1 0 21 22 8 4 23 24 2 0 25 26 2 0 27 28 2 0 29 30 3 0 31 32 2 0 33 34 1 0 35 36 3 0 37 38 1 4 39 40 3 0 41 42 1 0 43 44 3 4 45 46 3 0 47 48 2 4 49 50 3 0 51 52 2 4 53 54 2 0 55 56 1 0 57 58 1 0 59 60 3 0 61 62 1 0 63 64 3 4 65 66 3 0 67 68 1 69 70 71 1 0 72 73 1 4 74 75 8 4 76 77 1 0 78 79 2 4 80 81 2 0 82 83 8 0 84 85 1 0 86 87 1 0 88 89 8 69 90 91 3 0 9293 8 0 94 95 1 0 96 97 8 0 98 99 2 0 100 101 3 4 102 103 3 69 104 105 1 0 106 107 3 4 108 109 3 69 110 111 2 69 112 113 1 69 114 115 1 69 116 117 8 0 118 119 8 4 120 121 8 0 122 123 124 125 126 123 124 125 126"
-
reporter My .xlsx file contains data like "USER 2011-11-29T11:01:28.027"
-
repo owner Please show me the contents of the zip(!) file (an xlsx is basically a zip file) by running:
unzip -l /path/to/file.xlsx
-
reporter
Archive:  test.xlsx
  Length      Date    Time    Name
---------  ---------- -----   ----
     1556  1980-01-01 00:00   [Content_Types].xml
      588  1980-01-01 00:00   _rels/.rels
      980  1980-01-01 00:00   xl/_rels/workbook.xml.rels
      637  1980-01-01 00:00   xl/workbook.xml
     7079  1980-01-01 00:00   xl/theme/theme1.xml
      322  1980-01-01 00:00   xl/worksheets/_rels/sheet1.xml.rels
    12580  1980-01-01 00:00   xl/worksheets/sheet2.xml
     1082  1980-01-01 00:00   xl/worksheets/sheet3.xml
      689  1980-01-01 00:00   xl/worksheets/sheet1.xml
     3844  1980-01-01 00:00   xl/sharedStrings.xml
     4602  1980-01-01 00:00   xl/styles.xml
      865  1980-01-01 00:00   docProps/app.xml
     7824  1980-01-01 00:00   xl/printerSettings/printerSettings1.bin
      616  1980-01-01 00:00   docProps/core.xml
---------                     -------
    43264                     14 files
-
reporter I am still struggling to get search working on the .xlsx file. Any help would be great.
-
repo owner Is it possible to send me that xlsx file? I'll treat it confidentially, and wipe it off after fixing the problem.
By the way, the extraction utility searches the xl/worksheets/sheet* files, and perhaps it has a bug in stripping the XML markup and returning the raw text.
-
reporter - attached test.xlsx
-
repo owner Please apply the following patch, recompile piler, then replace the binaries, and try again. I was searching in the wrong files...
--- a/src/extract.c
+++ b/src/extract.c
@@ -212,7 +212,7 @@ void extract_attachment_content(struct session_data *sdata, struct _state *state
    }
    if(strcmp(type, "xlsx") == 0){
-      extract_opendocument(sdata, state, filename, "xl/worksheets/sheet");
+      extract_opendocument(sdata, state, filename, "xl/sharedStrings.xml");
       return;
    }
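A note on why the patch likely helps: in the Office Open XML spreadsheet format, a string cell in xl/worksheets/sheet*.xml normally stores only an index into the table in xl/sharedStrings.xml, so stripping the XML from a sheet file leaves numbers rather than the text. The following is a minimal C sketch of that effect, not piler's actual code. strip_xml_tags is a hypothetical helper, and it makes simplifying assumptions: it does not decode entities and does not handle a '>' inside an attribute value.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (NOT piler's extract_opendocument): copy the text
 * found between XML tags in `xml` into `out`, putting a single space
 * between separate text runs. The output is always NUL-terminated and is
 * truncated if `out` is too small. Returns the number of bytes written,
 * not counting the terminating NUL.
 */
size_t strip_xml_tags(const char *xml, char *out, size_t out_size)
{
    size_t n = 0;
    int in_tag = 0;      /* currently between '<' and '>' */
    int need_space = 0;  /* a tag ended since the last text byte */

    assert(xml != NULL && out != NULL && out_size > 0);

    for (const char *p = xml; *p != '\0'; p++) {
        if (*p == '<') {
            in_tag = 1;
            continue;
        }
        if (*p == '>') {
            in_tag = 0;
            need_space = (n > 0);  /* no leading space before first text */
            continue;
        }
        if (in_tag)
            continue;              /* skip tag names and attributes */

        if (need_space) {
            if (n + 1 >= out_size)
                break;             /* keep room for the NUL */
            out[n++] = ' ';
            need_space = 0;
        }
        if (n + 1 >= out_size)
            break;
        out[n++] = *p;
    }

    out[n] = '\0';
    return n;
}
```

With a made-up sheet-style fragment such as `<c r="A1" t="s"><v>5</v></c>` the helper yields `5`, which is only an index, while a shared-strings-style fragment such as `<si><t>USER</t></si>` yields `USER`. That matches the symptom above: extracting from the sheet files produced numbers, and the reporter's text lives in xl/sharedStrings.xml.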
-
repo owner - changed status to resolved
Do you have libzip support compiled in? Please show me piler-config.h. You should have
#define HAVE_ZIP 1
and NOT
#undef HAVE_ZIP