Kirill Simonov / pyyaml / Issue #26 (new)

Python Reader incorrectly rejects unicode characters above 0xffff as unprintable

Anonymous created an issue

The pure-Python loader rejects the input below, while the C version (`yaml.CLoader`) loads it fine:

```
In [98]: yaml.load(u"hello: 'Jack ⛄😁'")

ReaderError                               Traceback (most recent call last)
<ipython-input-98-93005720ac55> in <module>()
----> 1 yaml.load(u"hello: 'Jack ⛄😁'")

/usr/lib64/python2.7/site-packages/yaml/__init__.pyc in load(stream, Loader)
     67     and produce the corresponding Python object.
     68     """
---> 69     loader = Loader(stream)
     70     try:
     71         return loader.get_single_data()

/usr/lib64/python2.7/site-packages/yaml/loader.pyc in __init__(self, stream)
     32 
     33     def __init__(self, stream):
---> 34         Reader.__init__(self, stream)
     35         Scanner.__init__(self)
     36         Parser.__init__(self)

/usr/lib64/python2.7/site-packages/yaml/reader.pyc in __init__(self, stream)
     72         if isinstance(stream, unicode):
     73             self.name = "<unicode string>"
---> 74             self.check_printable(stream)
     75             self.buffer = stream+u'\0'
     76         elif isinstance(stream, str):

/usr/lib64/python2.7/site-packages/yaml/reader.pyc in check_printable(self, data)
    142             position = self.index+(len(self.buffer)-self.pointer)+match.start()
    143             raise ReaderError(self.name, position, ord(character),
--> 144                     'unicode', "special characters are not allowed")
    145 
    146     def update(self, length):

ReaderError: unacceptable character #x1f601: special characters are not allowed
  in "<unicode string>", position 14

In [99]: yaml.load(u"hello: 'Jack ⛄😁'", Loader=yaml.CLoader)
Out[99]: {'hello': u'Jack \u26c4\U0001f601'}
```
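A minimal Python 3 sketch of why the pure-Python path fails. It assumes `Reader.NON_PRINTABLE` is a whitelist pattern like the one below (my reading of PyYAML 3.x's reader.py; check the installed version). The allowed ranges stop at U+FFFD, so any code point above 0xFFFF, such as U+1F601, is flagged. The helper `first_unprintable` is hypothetical and only mimics the search done in `check_printable`. `CLoader` goes through libyaml rather than this regex, which is consistent with it accepting the same input.

```python
import re

# Assumed whitelist modelled on the pure-Python Reader's NON_PRINTABLE:
# tab, LF, CR, printable ASCII, NEL, and BMP ranges up to U+FFFD.
# The negated class matches anything outside that set; nothing above
# U+FFFF is in the allowed set, so astral characters always match.
NON_PRINTABLE_OLD = re.compile(
    r'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]')


def first_unprintable(data, pattern=NON_PRINTABLE_OLD):
    """Return (index, code point) of the first rejected character, or None."""
    match = pattern.search(data)
    if match is None:
        return None
    return match.start(), ord(match.group())


text = "hello: 'Jack \u26c4\U0001f601'"
print(first_unprintable(text))  # (14, 128513); 128513 == 0x1f601
```

The snowman (U+26C4) passes because it lies inside `\xA0-\uD7FF`; only the emoji at index 14 is rejected, matching "position 14" in the traceback.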

Perhaps a better value for `yaml.reader.Reader.NON_PRINTABLE` would be `re.compile(ur'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f\ud800-\udfff\ufffe\uffff]')`. This would reject the same set of characters below 0x10000 as the current regex but allow all other characters by default.
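A sketch that checks the claim above, in Python 3 syntax (the `ur''` prefix is Python 2 only, so the proposed pattern is written as a plain raw string). It compares the proposed blacklist against the assumed old whitelist for every code point below 0x10000. It also shows an alternative that is not proposed in the issue: keeping the whitelist and appending the YAML 1.1 printable range `[#x10000-#x10FFFF]`. The names `OLD_WHITELIST`, `PROPOSED_BLACKLIST`, `EXTENDED_WHITELIST` and `rejects` are hypothetical.

```python
import re

# Assumed old whitelist: the negated class matches any rejected character.
OLD_WHITELIST = re.compile(
    r'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]')

# The blacklist proposed in the issue, rewritten without the ur'' prefix.
PROPOSED_BLACKLIST = re.compile(
    r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f\ud800-\udfff\ufffe\uffff]')

# Alternative (not from the issue): keep the whitelist, add the astral range.
EXTENDED_WHITELIST = re.compile(
    r'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]')


def rejects(pattern, cp):
    """True if `pattern` flags the single code point `cp` as unprintable."""
    return pattern.search(chr(cp)) is not None


# Below 0x10000 the proposal should reject exactly what the old pattern rejects.
bmp_mismatches = [cp for cp in range(0x10000)
                  if rejects(OLD_WHITELIST, cp) != rejects(PROPOSED_BLACKLIST, cp)]
print(len(bmp_mismatches))  # 0

# Above 0xFFFF only the old pattern rejects, e.g. U+1F601.
for name, pat in [("old", OLD_WHITELIST),
                  ("proposed", PROPOSED_BLACKLIST),
                  ("extended", EXTENDED_WHITELIST)]:
    print(name, rejects(pat, 0x1F601))  # old True, proposed False, extended False
```

Over the full code space the proposed blacklist and the extended whitelist accept the same characters. The whitelist form states the allowed set directly, which makes it easier to compare against the YAML spec's definition of printable characters.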
