Wrong string encoding for COPY TO STDOUT FORMAT CSV

Issue #259 wontfix
flop
created an issue

While trying to output some CSV using COPY TO STDOUT FORMAT CSV, I get a string with the wrong encoding. At first I thought it was coming from Sequel, but Jeremy Evans found out the problem was in pg (https://github.com/jeremyevans/sequel/issues/1325).

Here is a script that reproduces the problem:

#!/usr/bin/env ruby

require 'pg'

conn = PG.connect( :dbname => 'template1' )
$stderr.puts '---',
    RUBY_DESCRIPTION,
    PG.version_string( true ),
    "Server version: #{conn.server_version}",
    "Client version: #{PG.respond_to?( :library_version ) ? PG.library_version : 'unknown'}",
    '---'

conn.exec("COPY (SELECT '\u2714' AS checked) TO STDOUT (FORMAT csv, ENCODING UTF8)")

while s = conn.get_copy_data
  p s.encoding
end

$stderr.puts %Q{Expected this to return: #<Encoding:UTF-8>}

And the output:

---
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
PG 0.20.0 (build 838985377b48)
Server version: 90502
Client version: 90502
---
#<Encoding:ASCII-8BIT>
Expected this to return: #<Encoding:UTF-8>

Comments (2)

  1. Lars Kanis

    Sorry for the late response! However, this behavior is expected. The server just sends a binary blob to the client with no information about its meaning. It can be pure binary data, or text in UTF-8 or any other character encoding. Since we don't get any information about the meaning, it is the user's responsibility to make sure the data is used properly.

    I clarified this fact in commit https://github.com/ged/ruby-pg/commit/a187d5c6459bc27bbb983cf7e761870a75341db7
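
For readers who hit the same thing: since pg hands back the COPY data untagged, one way to deal with it on the caller's side is to relabel each chunk with the encoding that was requested in the COPY statement. The following is only a minimal sketch. `tag_copy_chunk` is a hypothetical helper, not part of pg, and the `raw` string stands in for a chunk returned by `conn.get_copy_data` so the snippet runs without a database. Note that `force_encoding` only changes the label on the bytes and does not transcode them, so it is only correct when the bytes really are in that encoding.

```ruby
# Hypothetical helper (not part of pg): relabel a raw COPY chunk with the
# encoding that was requested in the COPY statement (assumed UTF-8 here).
def tag_copy_chunk(chunk, encoding = Encoding::UTF_8)
  # dup so the caller's string is left untouched; force_encoding only
  # changes the encoding label, the bytes stay exactly the same.
  text = chunk.dup.force_encoding(encoding)
  raise ArgumentError, "chunk is not valid #{encoding}" unless text.valid_encoding?
  text
end

# Stand-in for a chunk from conn.get_copy_data: the UTF-8 bytes of the
# check mark row from the report, tagged as binary (ASCII-8BIT).
raw = "\u2714\n".b

text = tag_copy_chunk(raw)
p raw.encoding   # binary, as in the report
p text.encoding  # the encoding we asked for
```

In the reproduction script this would amount to calling the helper (or `force_encoding` directly) on each `s` inside the `while s = conn.get_copy_data` loop.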
