mysql_stmt_execute error: *Incorrect string value: '...' for column 'body' at row 1* (errno: 1366)

Issue #709 resolved
seteq created an issue

Piler works great, but only archives some of the incoming emails. The following message can be found in /var/log/mail.log frequently:

mysql_stmt_execute error: *Incorrect string value: '\xAD57 Di...' for column 'body' at row 1* (errno: 1366)

I think it's some kind of related to UTF-8 charset but I don't even know where to begin. There are similar errors in #471 and #571, but both issues are almost one year old and I wonder if someone has even tested piler with MySQL 5.7

My System: Ubuntu 16.04.1 LTS, all updates installed mysqld Ver 5.7.13-0ubuntu0.16.04.2 for Linux on x86_64 ((Ubuntu)) Apache/2.4.18 (Ubuntu) PHP 7.0.8-0ubuntu0.16.04.2 (cli) ( NTS )

Piler version: Build Date: Thu Aug 4 14:39:06 UTC 2016 ldd version: ldd (Ubuntu GLIBC 2.23-0ubuntu3) 2.23 gcc version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.1) Configure command: ./configure --localstatedir=/var --with-database=mysql --enable-starttls --enable-tcpwrappers

The MTA is Zimbra 8.6

Comments (19)

  1. Janos SUTO repo owner

    OK, I'll test it with mysql 5.7. Can you provide such a problematic message you can't import with 5.7?

  2. seteq reporter

    After a bit of research I found out that this happens because of the incomplete utf8-implementation of mysql, which can just handle 3-byte characters. Thanks to another guy struggling with the same problems I tried converting the whole database to utf8mb4 which seemed to work in my case. piler now accepts the messages it refused to process before. I'll keep an eye on these error messages...

    I had to alter the unique key of the retention_rule table because of length restrictions:

    UNIQUE KEY `domain` (`domain`(120),`from`,`to`,`subject`,`_size`,`size`,`attachment_name`,`attachment_type`,`_attachment_size`,`attachment_size`,`spam`)
    

    Additional links:

    http://stackoverflow.com/a/24559308

    https://mathiasbynens.be/notes/mysql-utf8mb4

  3. Jose M. Albarran

    Oh, shit!

    And why this is working in previous versions of MySQL? Do you have the change to be done in piler database (which table, which column, which ALTER sentence?

  4. Janos SUTO repo owner

    I still need such a message to test with mysql 5.7. As with any software, mysql also changes from version to version. Also it's worth to check with mariadb 10.x or even with percona mysql as well.

  5. Jose M. Albarran

    Hi,

    I'll try to send you a problematic message. For the moment, I have solved following the instrucitons from the previous link. Exactly, for this incident, I have executed:

    ALTER DATABASE piler CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
    ALTER TABLE piler.sph_index CONVERT TO CHARACTER SET utf8mb4, COLLATE = utf8mb4_unicode_ci;
    ALTER TABLE piler.sph_index CHANGE body body TEXT CHARACTER SET utf8mb4, COLLATE = utf8mb4_unicode_ci;
    
  6. Jose M. Albarran

    Hi, Janos

    Try to archive this message in MySQL 5.7. I contains Emojis:

    πŸ˜€πŸ˜‡πŸ€—πŸ˜ˆπŸ‘½πŸ‘Ύ

  7. Janos SUTO repo owner

    Thanks, I'll test it with 5.7. I've just tried it with mariadb 10.1, and it just got to the archive without any problem.

  8. Jose M. Albarran

    Hi,

    Looking forward your comments. I'm totally lost with this, because MariaDB manage UTF in the same way than MySQL (utf8 uses until 3 bytes and doesn't support emojis, utf8mb4 uses until 4 bytes, and can store emojis). By the way, have you checked how is the body column in sphinx_index table previous to index? Had it the emoji values?

    As commented, looking forward your comments (I'm going to reload again everything, but I want to have this clarified)

  9. Jose M. Albarran

    And, I don't know if it's relevant, but during compilations I have these 2 warnings (see utf8_encode_char)

    ../src/hash.h:18:12: warning: inline function β€˜hash’ declared but never defined inline int hash(unsigned int key); ^ In file included from ../src/piler.h:12:0, from smtp.c:20: ../src/decoder.h:16:13: warning: inline function β€˜utf8_encode_char’ declared but never defined inline void utf8_encode_char(unsigned char c, unsigned char buf, int buflen, int len);

  10. Janos SUTO repo owner

    Emojis don't need to be indexed (and put to the sph_index table), since nobody searches for :-) or those other funky stuff.

  11. Jose M. Albarran

    Agree!!

    But the problem is not that. My incident is that those emails are not indexed at all, because the insert into sphinx_index.body fails!

  12. Janos SUTO repo owner

    Yes, I'm aware of the problem, and gonna fix it. Btw. it seems that gcc 5.x is a bit picky, you may get rid of the warning by removing the inline definitions from hash.h and decoder.h

  13. Jose M. Albarran

    Ok, the inline warning doesn't worry me. It's only if they were a clue about the charset failure in MySQL 5.7

  14. Janos SUTO repo owner

    Well, by reading what others also experienced, I believe that it's a mysql issue. You have several options, though:

    • apply the sql schema fix you mentioned
    • downgrade to mysql 5.5
    • switch to mariadb. 10.1.14 (and probably newer versions as well) are known to work.

    I'd recommend switching to mariadb, but it's your call.

  15. Jose M. Albarran

    Hi

    I'm doing a full reinstall, changing the full database charset to utf8mb4, and the sphinx.conf accordly (SET NAMES to utf8mb4). I will report here my progress. In this moment, my problems are with the unique indexes in the *_rules tables. But, as I guess, this index are only to avoid duplicated rules, so I have removed them.

  16. Log in to comment