mysql_stmt_execute error: *Incorrect string value: '...' for column 'body' at row 1* (errno: 1366)
Piler works great, but only archives some of the incoming emails. The following message can be found in /var/log/mail.log frequently:
mysql_stmt_execute error: *Incorrect string value: '\xAD57 Di...' for column 'body' at row 1* (errno: 1366)
I think it's some kind of related to UTF-8 charset but I don't even know where to begin.
There are similar errors in #471 and #571, but both issues are almost one year old and I wonder if someone has even tested piler with MySQL 5.7
My System: Ubuntu 16.04.1 LTS, all updates installed mysqld Ver 5.7.13-0ubuntu0.16.04.2 for Linux on x86_64 ((Ubuntu)) Apache/2.4.18 (Ubuntu) PHP 7.0.8-0ubuntu0.16.04.2 (cli) ( NTS )
Piler version: Build Date: Thu Aug 4 14:39:06 UTC 2016 ldd version: ldd (Ubuntu GLIBC 2.23-0ubuntu3) 2.23 gcc version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.1) Configure command: ./configure --localstatedir=/var --with-database=mysql --enable-starttls --enable-tcpwrappers
The MTA is Zimbra 8.6
Comments (19)
-
repo owner -
reporter After a bit of research I found out that this happens because of the incomplete utf8-implementation of mysql, which can just handle 3-byte characters. Thanks to another guy struggling with the same problems I tried converting the whole database to utf8mb4 which seemed to work in my case. piler now accepts the messages it refused to process before. I'll keep an eye on these error messages...
I had to alter the unique key of the retention_rule table because of length restrictions:
UNIQUE KEY `domain` (`domain`(120),`from`,`to`,`subject`,`_size`,`size`,`attachment_name`,`attachment_type`,`_attachment_size`,`attachment_size`,`spam`)
Additional links:
-
Oh, shit!
And why this is working in previous versions of MySQL? Do you have the change to be done in piler database (which table, which column, which ALTER sentence?
-
repo owner I still need such a message to test with mysql 5.7. As with any software, mysql also changes from version to version. Also it's worth to check with mariadb 10.x or even with percona mysql as well.
-
Hi,
I'll try to send you a problematic message. For the moment, I have solved following the instrucitons from the previous link. Exactly, for this incident, I have executed:
ALTER DATABASE piler CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci; ALTER TABLE piler.sph_index CONVERT TO CHARACTER SET utf8mb4, COLLATE = utf8mb4_unicode_ci; ALTER TABLE piler.sph_index CHANGE body body TEXT CHARACTER SET utf8mb4, COLLATE = utf8mb4_unicode_ci;
-
Hi, Janos
Try to archive this message in MySQL 5.7. I contains Emojis:
πππ€ππ½πΎ
-
repo owner I need the message itself to test with. Can you send it to my address?
-
Hi,
I have sent a sample message to your address.
-
repo owner Thanks, I'll test it with 5.7. I've just tried it with mariadb 10.1, and it just got to the archive without any problem.
-
Hi,
Looking forward your comments. I'm totally lost with this, because MariaDB manage UTF in the same way than MySQL (utf8 uses until 3 bytes and doesn't support emojis, utf8mb4 uses until 4 bytes, and can store emojis). By the way, have you checked how is the body column in sphinx_index table previous to index? Had it the emoji values?
As commented, looking forward your comments (I'm going to reload again everything, but I want to have this clarified)
-
And, I don't know if it's relevant, but during compilations I have these 2 warnings (see utf8_encode_char)
../src/hash.h:18:12: warning: inline function βhashβ declared but never defined inline int hash(unsigned int key); ^ In file included from ../src/piler.h:12:0, from smtp.c:20: ../src/decoder.h:16:13: warning: inline function βutf8_encode_charβ declared but never defined inline void utf8_encode_char(unsigned char c, unsigned char buf, int buflen, int len);
-
repo owner Emojis don't need to be indexed (and put to the sph_index table), since nobody searches for :-) or those other funky stuff.
-
Agree!!
But the problem is not that. My incident is that those emails are not indexed at all, because the insert into sphinx_index.body fails!
-
repo owner Yes, I'm aware of the problem, and gonna fix it. Btw. it seems that gcc 5.x is a bit picky, you may get rid of the warning by removing the inline definitions from hash.h and decoder.h
-
Ok, the inline warning doesn't worry me. It's only if they were a clue about the charset failure in MySQL 5.7
-
repo owner Well, by reading what others also experienced, I believe that it's a mysql issue. You have several options, though:
- apply the sql schema fix you mentioned
- downgrade to mysql 5.5
- switch to mariadb. 10.1.14 (and probably newer versions as well) are known to work.
I'd recommend switching to mariadb, but it's your call.
-
Hi
I'm doing a full reinstall, changing the full database charset to utf8mb4, and the sphinx.conf accordly (SET NAMES to utf8mb4). I will report here my progress. In this moment, my problems are with the unique indexes in the *_rules tables. But, as I guess, this index are only to avoid duplicated rules, so I have removed them.
-
repo owner That's one possible workaround. The other is to fix only the sph_index table.
-
repo owner - changed status to resolved
Added the possible workarounds to the FAQ.
- Log in to comment
OK, I'll test it with mysql 5.7. Can you provide such a problematic message you can't import with 5.7?