PyPy3-5.10 incorrectly decodes astral plane JSON characters

Issue #2729 (new), reported by Ned Batchelder

This is a regression from PyPy3-5.9.

import json
note = u"a\xa0\u266b\U0001d157"
j = json.dumps({"note": note})
round_tripped = json.loads(j)['note']
print(ascii(note))
print(ascii(j))
print(ascii(round_tripped))

PyPy3-5.9 printed:

'a\xa0\u266b\U0001d157'
'{"note": "a\\u00a0\\u266b\\ud834\\udd57"}'
'a\xa0\u266b\U0001d157'

(as does every CPython 3.x, and CPython 2.x once an ascii() function is defined)
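The JSON text escapes U+1D157, which lies outside the Basic Multilingual Plane, as the UTF-16 surrogate pair \ud834\udd57, so json.loads has to recombine the two halves into a single code point to produce the output above. A minimal sketch of that recombination, using the standard UTF-16 formula; the helper name combine_surrogates is illustrative and is not taken from the json module or PyPy:

```python
def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits of payload; the result is offset by 0x10000.
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


cp = combine_surrogates(0xD834, 0xDD57)
print(hex(cp))          # 0x1d157
print(ascii(chr(cp)))   # '\U0001d157'
```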

PyPy3-5.10 prints:

'a\xa0\u266b\U0001d157'
'{"note": "a\\u00a0\\u266b\\ud834\\udd57"}'
'a\xa0\u266b\ud834\udd57'

This is on a Mac. The PyPy2-5.10 binary didn't work at all for me, so I'm not sure what it produces.
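On an affected build the decoded string holds the two halves of the pair as separate characters. One possible stopgap, until a fixed build is available, is to post-process whatever json.loads returns. The sketch below is an assumption on my part, not a PyPy fix, and has not been tried on PyPy3-5.10 itself: it merges each adjacent high/low surrogate pair into one code point and leaves lone surrogates untouched. The helper names repair_surrogates and repair are hypothetical.

```python
import json
import re

# A high surrogate (U+D800..U+DBFF) immediately followed by a low one (U+DC00..U+DFFF).
_PAIR = re.compile("[\ud800-\udbff][\udc00-\udfff]")


def _merge(match):
    hi, lo = map(ord, match.group())
    return chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))


def repair_surrogates(s):
    """Merge surrogate pairs left in a decoded str; lone surrogates are kept."""
    return _PAIR.sub(_merge, s)


def repair(obj):
    """Apply repair_surrogates to every str (dict keys included) in a json.loads result."""
    if isinstance(obj, str):
        return repair_surrogates(obj)
    if isinstance(obj, list):
        return [repair(x) for x in obj]
    if isinstance(obj, dict):
        return {repair(k): repair(v) for k, v in obj.items()}
    return obj


j = json.dumps({"note": "a\xa0\u266b\U0001d157"})
fixed = repair(json.loads(j))
print(ascii(fixed["note"]))   # expected: 'a\xa0\u266b\U0001d157'
```

On an interpreter that already decodes correctly, repair() is a no-op, so the call can stay in place after upgrading.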

Comments

  1. Armin Rigo

    I wouldn't call it a regression, because we never released pypy3-5.9 on OS X. It's certainly a bug, though. I have no clue why it would be OS X-specific: it's not an issue of the width of wchar_t, because it works as expected both on Linux and on Windows.

    Maybe something like this: it was compiled with a narrow wchar_t, but the Python/PyPy used as the translation host itself had a wide wchar_t? On Windows the host Python/PyPy is always narrow; on OS X I think it depends.
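The hypothesis above rests on the difference between a narrow (16-bit) and a wide (32-bit) wchar_t: with a narrow one, any code point above U+FFFF must be stored as two UTF-16 code units, which resembles the two-character result 5.10 produces. A small sketch that shows the split and reports the wchar_t size and sys.maxunicode of the interpreter it runs under. Note the assumption: these values describe the running interpreter, not the host that translated the PyPy binary, so at most they help compare candidate host interpreters. The helper name to_utf16_units is hypothetical.

```python
import ctypes
import sys


def to_utf16_units(cp):
    """Split a code point into the UTF-16 code units a 16-bit wchar_t would hold."""
    if cp < 0x10000:
        return [cp]                      # BMP character: a single unit
    v = cp - 0x10000                     # 20 bits remain after the offset
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


# Size of wchar_t on this platform, in bytes (2 = narrow, 4 = wide).
print("sizeof(wchar_t):", ctypes.sizeof(ctypes.c_wchar))
# Largest code point this interpreter's str can hold as one character.
print("sys.maxunicode:", hex(sys.maxunicode))
print([hex(u) for u in to_utf16_units(0x1D157)])   # ['0xd834', '0xdd57']
```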

  2. Ned Batchelder (reporter)

    I have these versions on my Mac, and they all produce the correct output:

    $ pypy3-2.4 foo.py
    3.2.5 (b2091e973da6, Oct 19 2014, 18:30:58)
    [PyPy 2.4.0 with GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)]
    'a\xa0\u266b\U0001d157'
    '{"note": "a\\u00a0\\u266b\\ud834\\udd57"}'
    'a\xa0\u266b\U0001d157'
    $ pypy3-5.2 foo.py
    3.3.5 (40497617ae91, May 30 2016, 04:49:21)
    [PyPy 5.2.0-alpha0 with GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
    'a\xa0\u266b\U0001d157'
    '{"note": "a\\u00a0\\u266b\\ud834\\udd57"}'
    'a\xa0\u266b\U0001d157'
    $ pypy3-5.5 foo.py
    3.3.5 (619c0d5af0e5, Oct 08 2016, 22:08:19)
    [PyPy 5.5.0-alpha0 with GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
    'a\xa0\u266b\U0001d157'
    '{"note": "a\\u00a0\\u266b\\ud834\\udd57"}'
    'a\xa0\u266b\U0001d157'
    $ pypy3-5.9 foo.py
    3.5.3 (d72f9800a42b46a8056951b1da2426d2c2d8d502, Oct 07 2017, 08:21:16)
    [PyPy 5.9.0-beta0 with GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]
    'a\xa0\u266b\U0001d157'
    '{"note": "a\\u00a0\\u266b\\ud834\\udd57"}'
    'a\xa0\u266b\U0001d157'
    

    Something has changed:

    $ pypy3-5.10 foo.py
    3.5.3 (7a22aa3bd5bf, Dec 25 2017, 17:11:18)
    [PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)]
    'a\xa0\u266b\U0001d157'
    '{"note": "a\\u00a0\\u266b\\ud834\\udd57"}'
    'a\xa0\u266b\ud834\udd57'
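To repeat this comparison across interpreters, the reproduction can be reduced to a pass/fail check that also prints sys.version, which is presumably what produced the banner lines above. A sketch, assuming it is saved under a hypothetical name such as check_astral.py and run with each interpreter in turn (e.g. pypy3-5.10 check_astral.py):

```python
import json
import sys


def astral_roundtrip_ok():
    """Return True if json round-trips a non-BMP character unchanged."""
    note = u"a\xa0\u266b\U0001d157"
    return json.loads(json.dumps({"note": note}))["note"] == note


if __name__ == "__main__":
    print(sys.version)
    print("astral round-trip:", "OK" if astral_roundtrip_ok() else "BROKEN")
```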
    