Audit `open` calls for py2/py3-compatibility

Issue #21 resolved
Min RK created an issue

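When reading from or writing to files, Python 2 and 3 differ in their default behavior. Builtin open handles the native str type by default on both, but native str is bytes on Python 2 and unicode on Python 3, and Python 2's builtin open doesn't handle unicode well.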

Python 2.7 has io in the standard library, which implements Python 3's unicode-aware behavior. io.open only talks unicode (Python 3 str) unless bytes-mode (e.g. rb) is given explicitly.

  • reading/writing bytes: use io.open(file, 'rb') (or wb)
  • reading/writing text: use io.open(file, 'r', encoding='utf8') (or 'w'). Pass encoding explicitly: without it, io.open and Python 3's builtin open both fall back to the locale's preferred encoding, which is not always UTF-8.
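As a minimal sketch of the two modes listed above (the scratch path and the sample string are made up for illustration), the same file can be written as text and read back either as text or as raw bytes. It should behave the same on Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

# Text mode: write unicode (Python 3 str) with an explicit encoding.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(u'caf\xe9')

# Text mode: reading gives unicode back on both Python 2 and 3.
with io.open(path, 'r', encoding='utf8') as f:
    text = f.read()

# Bytes mode: nothing is decoded, so the raw UTF-8 bytes come back.
with io.open(path, 'rb') as f:
    raw = f.read()
```

Here text is the four-character unicode string, while raw is five bytes because the accented character takes two bytes in UTF-8.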

The most common error I've seen when adopting io.open is that, on Python 2, a file opened in text mode is pedantic and raises a TypeError if you write str to it. So you need to either decode str->unicode on py2 before writing (more explicit, probably best) or open files in b mode when bytes are given (more permissive, if you want to minimize errors).
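The two options could be sketched roughly like this. The helper names to_text and write_any are hypothetical, not part of io or of this project, and UTF-8 is an assumed encoding for the incoming bytes:

```python
import io
import os
import tempfile


def to_text(value, encoding='utf8'):
    """Explicit option: decode bytes (Python 2 str) to unicode text."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


def write_any(path, value, encoding='utf8'):
    """Permissive option: choose the file mode from the type of value."""
    if isinstance(value, bytes):
        with io.open(path, 'wb') as f:
            f.write(value)
    else:
        with io.open(path, 'w', encoding=encoding) as f:
            f.write(value)


# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# Explicit: always decode first, then write in text mode.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(to_text(b'hello'))

# Permissive: either type is accepted without a TypeError.
write_any(path, b'raw bytes')
write_any(path, u'text')
```

The explicit version keeps everything inside the program as unicode, so encoding mistakes surface at the point of decoding. The permissive version avoids the TypeError but leaves the caller responsible for knowing what encoding the bytes are in.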

Comments (3)

  1. Min RK reporter

    I should also clarify: it's fine to use builtin open with rb / wb; it's just the native-str r/w modes that cause trouble.
