PG::TextEncoder::Array: wrong encoding of UTF-8 text arrays

Issue #252 duplicate
Ilia Shavrin created an issue

Hi,

I've noticed an incorrect serialization result for UTF-8 string arrays. It could lead to incorrect behavior elsewhere.

2.3.1 :009 > PG::TextEncoder::Array.new(name: "text[]", delimiter: ',').encode(["a", "š"] )
=> "{a,\xC5\xA1}"

#!/usr/bin/env ruby

require 'pg'
Encoding.default_external = Encoding::UTF_8

decoder = PG::TextDecoder::Array.new(name: "text[]", delimiter: ',')
encoder = PG::TextEncoder::Array.new(name: "text[]", delimiter: ',')

arr = ["a", "š"]
arr2 = decoder.decode encoder.encode(arr)

p "Should be equal: #{arr == arr2}"
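
The script above prints `false` even though both arrays hold the same bytes. Below is a minimal sketch in plain Ruby (no `pg` required) of the likely cause, assuming the round-tripped elements come back tagged as binary, as the irb output suggests. The variable names are illustrative only.

```ruby
# "\u0161" is "š"; the \u escape always produces a UTF-8 string.
utf8 = "\u0161"
# String#b returns a copy with the same bytes, tagged as ASCII-8BIT (BINARY).
binary = utf8.b

# The byte sequences are identical.
same_bytes = utf8.bytes == binary.bytes
# String#== also requires the two encodings to be comparable.
equal_as_is = utf8 == binary
# Re-tagging the bytes as UTF-8 (no conversion) makes them comparable again.
equal_retagged = utf8 == binary.dup.force_encoding(Encoding::UTF_8)
# Pure-ASCII strings compare equal regardless of the encoding tag.
ascii_equal = "a" == "a".b

puts "same bytes: #{same_bytes}"
puts "equal as-is: #{equal_as_is}"
puts "equal after force_encoding: #{equal_retagged}"
puts "ASCII-only equal: #{ascii_equal}"
```

This would explain why "a" survives the round trip unchanged while "š" does not compare equal.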

Comments (3)

  1. Lars Kanis

    Ruby stores the encoding on each individual string object, whereas the client encoding is set per connection. By default, PG::TextEncoder::Array#encode encodes each string as its raw bytes and returns the result as Encoding::BINARY:

    decoder.decode encoder.encode(["ä".encode("ibm850"), "ä".encode("iso-8859-1")])  # => ["\x84", "\xE4"]
    

    Alternatively, you can request a specific output encoding (typically the client encoding of the connection) as the second argument to #encode. All input strings are then converted to that encoding:

    decoder.decode encoder.encode(["ä".encode("ibm850"), "ä".encode("iso-8859-1")], Encoding::UTF_8) # => ["ä", "ä"] 
    

    I hope this helps.
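
The conversion described in the comment above can be approximated in plain Ruby with `String#encode`. This is only a sketch of the idea, using the same two legacy encodings as the comment. It does not show how `pg` performs the conversion internally, and the variable names are illustrative.

```ruby
# "\u00E4" is "ä" as a UTF-8 string.
a_umlaut = "\u00E4"

# The same character has different byte values in the two legacy encodings.
cp850  = a_umlaut.encode("IBM850")      # single byte 0x84
latin1 = a_umlaut.encode("ISO-8859-1")  # single byte 0xE4

# Transcoding each string to one common target encoding yields equal strings.
normalized = [cp850, latin1].map { |s| s.encode(Encoding::UTF_8) }

puts "raw bytes differ: #{cp850.bytes != latin1.bytes}"
puts "normalized strings equal: #{normalized[0] == normalized[1]}"
```

Without a common target encoding the raw bytes are kept as they are, which matches the `["\x84", "\xE4"]` result shown in the comment.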
