Ruby 2.2.0 Byte Encoding Issue

Issue #197 duplicate
Jason Waldrip created an issue

I am seeing the following error on the new ruby 2.2.0. I haven't seen this with previous versions.

ActiveRecord::StatementInvalid:
       PG::CharacterNotInRepertoire: ERROR:  invalid byte sequence for encoding "UTF8": 0x84
       : INSERT INTO "users" ("created_at", "email", "first_name", "last_name", "onboarded", "password_digest", "remember_token", "updated_at", "user_key") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) RETURNING "id"

Comments (11)

  1. Michael Granger repo owner

    Thanks for the report!

    Does this happen for every query, or if not, do you have data we can use to reproduce it?

  2. Jason Waldrip reporter

    Using the pre-release seems to have fixed the issue. Any idea when there will be a stable release?

  3. Lars Kanis

    @Michael Granger what do you think about a stable release? I do have two small issues with it.

    The first is this: https://groups.google.com/d/msg/ruby-pg/kXRjo6cGD30/CeyJsr9CZ2QJ Fortunately this should be solved with the commit I just pushed: 592e29c.

    The other issue is this: https://github.com/rails/rails/pull/17680 The Rails folks haven't responded to it; maybe my wording was not the best. An alternative to fixing Rails would be to work around this Rails bug in ruby-pg, or to just release pg and wait...

  4. Michael Granger repo owner

    Yeah, I'm excited to get 0.18 out, so let's try to get it done by the new year. I've been using the pre-releases in my own company's app with no problems whatsoever.

    Since the Rails issue is a month old, and should really only affect people using bytea fields (if I'm understanding it correctly), I say let's ping them one more time and release with a note describing the issue.

  5. Ed Ropple

    I'm seeing something similar to this with MRI 2.3.0 and 0.18.4.

    /usr/local/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/sequel-4.29.0/lib/sequel/adapters/postgres.rb:184:in `async_exec': PG::CharacterNotInRepertoire: ERROR: invalid byte sequence for encoding "UTF8": 0x89 (Sequel::DatabaseError)
        from /usr/local/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/sequel-4.29.0/lib/sequel/adapters/postgres.rb:184:in `block in execute_query'

    Seems to be happening consistently across a wide set of queries.

  6. Lars Kanis

    The description of this issue is not very detailed, so it doesn't show the root cause. At that time there was a specific issue with ruby-2.2, which was solved in 2014. The remaining issues are all plain encoding issues with the strings sent to the database.

    If the database connection is configured for UTF-8 character encoding (which is the default), then all strings (SQL commands, parameters, etc.) are expected to be encoded as valid UTF-8. If this is not the case, then the PostgreSQL server will complain about it, as posted above.

    Please note: If you get strings in a different encoding (say iso-8859-x from some file or external source), they must be converted to UTF-8 first in order to be sent to the database. This will probably change with pg-0.19.0, which will do some conversions automatically: https://github.com/ged/ruby-pg/pull/11
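
As a rough illustration of the conversion described above, here is a minimal sketch using only Ruby's core String methods. The helper name `to_utf8` is hypothetical and not part of pg, and the source encoding (ISO-8859-1 by default here) is an assumption the caller has to know or guess; Ruby cannot reliably detect it. The sketch also assumes that any string whose bytes already form valid UTF-8 is meant to be UTF-8.

```ruby
# Hypothetical helper (not part of the pg gem): return a valid UTF-8 copy
# of +str+ that is safe to send over a UTF-8 database connection.
#
# Assumptions:
# * bytes that already form valid UTF-8 are left unchanged
# * anything else is interpreted as +source_encoding+, which the
#   caller must supply (ISO-8859-1 is only a default for illustration)
def to_utf8(str, source_encoding = Encoding::ISO_8859_1)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Re-tag the raw bytes with their real encoding, then transcode.
  str.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# A byte string as it might arrive from a latin-1 file or socket.
raw = "caf\xE9".b

fixed = to_utf8(raw)
puts fixed.encoding         # UTF-8
puts fixed.valid_encoding?  # true
```

Calling such a helper on every parameter before it reaches the database would avoid the `PG::CharacterNotInRepertoire` error for mis-tagged input, provided the assumed source encoding is correct. With a wrong guess the query succeeds but stores the wrong characters, so it is better to fix the encoding where the data enters the application.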
