PG::TextEncoder::Array doesn't properly handle UTF-8

Issue #230 invalid
Sean Griffin
created an issue

This has caused the issue https://github.com/rails/rails/issues/22730 in Rails.

Steps to reproduce:

require 'pg'

encoder = PG::TextEncoder::Array.new(name: "string[]", delimiter: ",")
decoder = PG::TextDecoder::Array.new(name: "string[]", delimiter: ",")
p ['nový'] == decoder.decode(encoder.encode(['nový']))

Comments (4)

  1. Lars Kanis

    Thanks for reporting this issue!

    Actually, this is more a feature than a bug. PG currently doesn't respect the character encoding of strings sent to the database. The encoding tag is simply ignored and the string is sent to the database in its internal binary representation. Consequently, PG::Coder#encode returns a string in ASCII-8BIT encoding.

    Your example is easy to fix:

    p ['nový'] == decoder.decode(encoder.encode(['nový']).force_encoding(__ENCODING__))
    

    This changes the character encoding to the encoding of the surrounding Ruby file, and so to the encoding of the left-hand 'nový'. Actually, this is also how the previous Rails array encoder worked: it encoded the array string in the character encoding of the surrounding Ruby file (which is still not correct if the connection encoding is something other than UTF-8).

    May I open a pull request to Rails that fixes this?

    I've opened a new proposal for respecting the character encoding of strings sent to the database: #231. This would add the possibility to request a particular character encoding for strings returned by PG::Coder#encode.
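
The encoding-tag behaviour described in the comment above can be reproduced without the pg gem. The following is only a minimal sketch: it assumes that a coder's output behaves like `String#b` (same bytes, ASCII-8BIT tag), and the variable names are illustrative, not part of PG's API.

```ruby
# Stand-in for the text being round-tripped; "\u00FD" is "ý", so the
# literal is UTF-8 regardless of the source file's encoding.
original = "nov\u00FD"

# Assumption: the coder hands back the same bytes tagged ASCII-8BIT.
# String#b simulates that here.
binary = original.b

# Same bytes, different encoding tag: Ruby does not treat them as equal
# because the content is not pure ASCII.
mismatch = (original == binary)

# force_encoding only re-tags the bytes; nothing is transcoded.
retagged = binary.dup.force_encoding(Encoding::UTF_8)
match = (original == retagged)

# encode attempts a real conversion from ASCII-8BIT, which has no
# mapping for bytes above 0x7F, so it raises.
transcode_error =
  begin
    binary.encode(Encoding::UTF_8)
    nil
  rescue Encoding::UndefinedConversionError => e
    e.class
  end

p mismatch, match, transcode_error
```

Re-tagging is only safe when the bytes really are in the target encoding; if the connection encoding differs, a genuine transcoding step would be needed instead.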
