Issue #8 resolved

Concerns on encodings


I see that irc/ assumes all packets are encoded in UTF-8.

In practice, however, non-UTF-8 text does occur. Servers truncate PRIVMSGs by byte count, which can split a multibyte character and leave the line invalid as UTF-8. In addition, some servers and channels still use local encodings other than UTF-8.
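To make the truncation problem concrete, here is a small standalone sketch. It does not use the irc library, and the 10-byte limit is an arbitrary stand-in for a server's byte-based length limit. It shows how cutting a UTF-8 message at a byte boundary can split a multibyte character so that strict decoding fails:

```python
def truncate_bytes(message: str, limit: int) -> bytes:
    """Encode as UTF-8 and cut at a byte limit, as a server might."""
    return message.encode("utf-8")[:limit]


# Each of these 5 Hangul syllables takes 3 bytes in UTF-8 (15 bytes total).
raw = truncate_bytes("안녕하세요", 10)  # 3 whole characters plus 1 stray byte

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# A lenient decode keeps the intact prefix and substitutes U+FFFD for the stray byte.
lenient = raw.decode("utf-8", errors="replace")
print(lenient)
```

A library that always decodes strictly would have to either raise or drop such a line, which is why a configurable decoding mode matters.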

So I think the library should have an option for non-UTF-8 modes.

Comments

  1. Jason R. Coombs repo owner

    By default, the IRC library attempts to decode all incoming streams as UTF-8, but I acknowledge that there are cases where decoding is undesirable or where a custom decoding is desirable. To support these cases, since irc 3.4.2 the ServerConnection class may be customized. The 'buffer_class' attribute on ServerConnection determines which class is used to buffer lines from the input stream. By default it is DecodingLineBuffer, but it may be reassigned to another class, such as irc.client.LineBuffer, which does not decode the lines and passes them through as byte strings. The 'buffer_class' attribute may be assigned for all instances of ServerConnection by overriding the class attribute::

    import irc.client
    irc.client.ServerConnection.buffer_class = irc.client.LineBuffer

    or it may be overridden on a per-instance basis (as long as it's overridden before the connection is established)::

    server = irc.client.IRC().server()
    server.buffer_class = irc.client.LineBuffer
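For channels that use a legacy encoding, a middle ground between full UTF-8 decoding and raw bytes is a buffer that tries UTF-8 first and falls back to another encoding. The class below is a hypothetical, self-contained sketch: FallbackDecodingLineBuffer is not part of irc.client, and its feed()/lines() method names are an assumption about what a buffer_class needs, so check them against the LineBuffer in your installed irc version before assigning it. The cp949 default is only an example of a local encoding. Trying UTF-8 first is a heuristic that works because text in legacy multibyte encodings is rarely valid UTF-8 by accident, not a guarantee.

```python
import re


class FallbackDecodingLineBuffer:
    """Hypothetical line buffer: split raw bytes into lines, then decode
    each line as UTF-8, falling back to a legacy encoding on failure."""

    line_sep = re.compile(rb"\r?\n")

    def __init__(self, fallback_encoding: str = "cp949") -> None:
        self.fallback_encoding = fallback_encoding
        self.buffer = b""

    def feed(self, data: bytes) -> None:
        # Accumulate raw bytes; one line may arrive split across several reads.
        self.buffer += data

    def lines(self) -> list:
        # Everything before the last separator is complete; keep the remainder.
        *complete, self.buffer = self.line_sep.split(self.buffer)
        # Skip empty lines and decode the rest.
        return [self._decode(raw) for raw in complete if raw]

    def _decode(self, raw: bytes) -> str:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: use the legacy encoding, replacing bad bytes.
            return raw.decode(self.fallback_encoding, errors="replace")
```

Because the constructor argument has a default, the class can be instantiated with no arguments, which is how a library would most likely create it if the interface does match.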

    I've added a section to the README that documents these options.

    Does this interface provide the option you seek? If not, please re-open.

  2. puzzlet reporter

    Thank you for the reply. It helped me a lot, but I've run into another problem, mainly because I'm using Python 3.

    The library mixes bytes and str somewhat inconsistently, and when bytes are implicitly converted to str (for example through str() or %-formatting), the result is "b'this'" rather than "this".
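A minimal illustration of the implicit conversion, independent of the irc library (the channel name is just an example value):

```python
channel = b"#python"  # a channel name held as bytes

# %-formatting converts bytes with str(), so the b'...' repr leaks into the command.
command = "JOIN %s" % channel
print(command)

# Decoding explicitly (or keeping everything as bytes) avoids the problem.
fixed = "JOIN %s" % channel.decode("utf-8")
print(fixed)
```

Running Python with the -b flag makes such conversions emit a BytesWarning, which can help locate these spots.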

    We should explicitly choose one of the two string types, and I would recommend bytes. For example, RFC 1459 allows channel names to contain almost any sequence of bytes, so bytes seem a natural fit. But doing so breaks code throughout the library, for example:

    • In irc.client.is_channel(): string[0] in "#&+!"
    • In irc.client.ServerConnection.join(): "JOIN %s%s" % (channel, (key and (" " + key)))
    • NickMask(prefix) when a PRIVMSG event occurs
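The following sketch shows why the first expression fails on bytes and what bytes-safe counterparts to the three cases might look like. The helper names (is_channel_bytes, join_command_bytes, split_nickmask_bytes) are hypothetical and not part of irc.client; they only illustrate the kind of change a bytes-based fork would need.

```python
from typing import Tuple

# Indexing bytes yields an int in Python 3, so a str membership test raises.
try:
    b"#chan"[0] in "#&+!"  # b"#chan"[0] == 35
except TypeError as exc:
    print("str-based check fails:", exc)


def is_channel_bytes(name: bytes) -> bool:
    # Slice instead of index: name[:1] is bytes. The emptiness check matters
    # because b"" counts as "in" every bytes object.
    return bool(name) and name[:1] in b"#&+!"


def join_command_bytes(channel: bytes, key: bytes = b"") -> bytes:
    # Build the JOIN line entirely from bytes; mixing in str raises TypeError.
    return b"JOIN " + channel + (b" " + key if key else b"")


def split_nickmask_bytes(prefix: bytes) -> Tuple[bytes, bytes, bytes]:
    # Split b"nick!user@host" into its parts; missing pieces come back as b"".
    nick, _, userhost = prefix.partition(b"!")
    user, _, host = userhost.partition(b"@")
    return nick, user, host
```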

    So I'm converting all the internal strings to bytes in my fork, much as I did for irclib:
