Issue #57 wontfix

Shortcuts to encode/decode between unicode and utf-8

Armin Rigo
created an issue

The following helpers would be nice to have:

ffi.encode_utf8(unicode_string) -> new_char*_cdata
ffi.encode_utf8(unicode_string, target_char*_cdata, maximum_length)
ffi.decode_utf8(char*_cdata, [length]) -> unicode_string

The idea is that UTF-8 is a bit special in that it is the most widely used encoding in new programs [citation needed]. The helpers above would give a direct way to encode/decode without making two copies of the data.

Comments (2)

  1. Armin Rigo reporter

    ffi.encode_utf8(u) = ffi.new("char[]", u.encode("utf-8"))

    ffi.decode_utf8(c) = ffi.string(c).decode("utf-8")

    These equivalences are correct in both Python 2 and Python 3, so I think the answer "not worth it" is good enough.
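
For readers who want the shortcuts anyway, the two equivalences above can be wrapped in small helpers. This is only a sketch: the module-level names `encode_utf8` and `decode_utf8` are hypothetical and not part of cffi, and it assumes a `char[]` allocation so that the whole encoded string plus a trailing NUL fits in the new cdata.

```python
from cffi import FFI

ffi = FFI()


def encode_utf8(text):
    """Return a new NUL-terminated char[] cdata holding text as UTF-8.

    Hypothetical helper, not a cffi API.
    """
    # ffi.new("char[]", data) copies the bytes and appends a trailing NUL.
    return ffi.new("char[]", text.encode("utf-8"))


def decode_utf8(cdata, length=None):
    """Decode UTF-8 data from a char cdata into a unicode string.

    Hypothetical helper, not a cffi API.
    """
    if length is None:
        # ffi.string() stops at the first NUL byte.
        raw = ffi.string(cdata)
    else:
        # ffi.buffer() exposes exactly `length` bytes, embedded NULs included.
        raw = ffi.buffer(cdata, length)[:]
    return raw.decode("utf-8")
```

Note that `encode_utf8` still builds an intermediate bytes object before copying it into the cdata, so it does not avoid the second copy that the original proposal was aiming to remove.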
