Instantiating a freetype Font with a bytes path incorrectly decodes it

Issue #302 closed
Thomas Kluyver
created an issue

Ping @Lenard Lindstrom - this is related to but different from #196.

If pygame.freetype.Font is instantiated with a bytes path, which is common on Python 2, this line attempts to decode it with the raw_unicode_escape codec. That means any \U sequence in the path is treated as the start of a unicode escape, and if it isn't a valid one, it is replaced with the unicode replacement character, U+FFFD.

>>> pygame.freetype.Font('C:\\Users\\...').path
u'C:\ufffdsers\\...'
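The same decoding behaviour can be reproduced without pygame. This is only a minimal sketch, assuming Python 3; the path in `raw` is made up for illustration and is not the one from the report above.

```python
# Hypothetical Windows-style path as bytes; only the "\U" after "C:" matters.
raw = b"C:\\Users\\someone\\font.ttf"

# Strict decoding rejects "\U" because it is not followed by 8 hex digits.
try:
    raw.decode("raw_unicode_escape")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With the 'replace' handler the bad escape becomes U+FFFD instead.
replaced = raw.decode("raw_unicode_escape", "replace")

# Decoding with an ordinary text encoding leaves the backslashes alone.
plain = raw.decode("utf-8")

print(strict_failed, "\ufffd" in replaced, plain)
```

The point is that the codec interprets the path contents as escape syntax, which an ordinary text encoding never does.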

As I mentioned on #107, this causes a test failure on Python 2, when the installation path includes \U:

======================================================================
FAIL: test_freetype_Font_path (pygame.tests.freetype_test.FreeTypeFontTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\Thomas\Miniconda3\envs\pygame-py2\lib\site-packages\pygame\tests\freetype_test.py", line 1127, in test_freetype_Font_path
    self.assertEqual(self._TEST_FONTS['sans'].path, self._sans_path)
AssertionError: u'C:\ufffdsers\\Thomas\\Miniconda3\\envs\\pygame-py2\\lib\\site-packages\\pygame\\tests\\fixtures\\fonts\\test_sans.ttf' != 'C:\\Users\\Thomas\\Miniconda3\\envs\\pygame-py2\\lib\\site-packages\\pygame\\tests\\fixtures\\fonts\\test_sans.ttf'

Resolving this isn't entirely simple, because on POSIX platforms paths are passed around as bytes, and on Windows paths are (preferably) passed around as unicode strings. But I'm pretty sure that decoding with 'raw_unicode_escape' is unexpected in either case.

I'd propose that Font objects have attributes path (unicode) and pathb (bytes) on both platforms. The implementation would use path to open the file on Windows and pathb on POSIX.

On Windows, the argument should preferably be unicode.

  • If passed unicode: encode with fs encoding and the 'strict' handler - if it can't be encoded, then pathb is None.

  • If passed bytes: decode with fs encoding, fail if it can't be decoded (though I think that can't occur in standard locales).

On POSIX:

  • If passed unicode: encode with fs encoding and the 'strict' handler, fail if it can't be encoded.
  • If passed bytes: decode with fs encoding, and if it can't be decoded, then pathb is set and path is None.
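
The rules above could be sketched roughly as follows. This is an illustration of the proposal only, not pygame code: `split_font_path`, its `windows` flag and its `encoding` parameter are hypothetical names, and the sketch assumes Python 3 `str`/`bytes` rather than the Python 2 types.

```python
import sys


def split_font_path(arg, windows, encoding=None):
    """Return (path, pathb) for a font path given as str or bytes.

    Hypothetical helper, not part of pygame. `windows` selects which set
    of rules applies; `encoding` defaults to the filesystem encoding.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()

    if isinstance(arg, bytes):
        pathb = arg
        try:
            path = arg.decode(encoding, "strict")
        except UnicodeDecodeError:
            if windows:
                raise  # Windows would open the file via path, so fail
            path = None  # POSIX can still open the file via pathb
    else:
        path = arg
        try:
            pathb = arg.encode(encoding, "strict")
        except UnicodeEncodeError:
            if not windows:
                raise  # POSIX would open the file via pathb, so fail
            pathb = None  # Windows can still open the file via path
    return path, pathb


# Example calls with made-up paths and an explicit encoding.
print(split_font_path(b"C:\\Users\\x.ttf", windows=True, encoding="utf-8"))
print(split_font_path(b"\xff.ttf", windows=False, encoding="utf-8"))
```

In each case the attribute the platform would use to open the file is always set, and only the secondary one may be None.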
