json string UTF8 encoding

Issue #182 closed
dd1 created an issue

The current JSON RFC specifies that JSON strings should be valid UTF8 (see the RFC for explanation). mjson.cxx:MakeString() does not verify that the JSON string it produces is valid UTF8. This causes trouble with ODB (ODB names and ODB strings are not necessarily valid UTF8, see the bug about this filed by Thomas L. and the UTF8 checks in odb.c) and in cm_msg_retrieve() & co, which read midas.log, which may contain text that is not valid UTF8 (e.g. garbage strings left over from memory corruption).

The JSON encoder (MakeString()) needs to be changed to ensure it produces valid UTF8 strings. How to do this? Not sure. Truncate at the location of invalid UTF8? Replace invalid UTF8 with “X”es? Something else?

K.O.
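
For illustration only, the two options raised above (truncate at the first invalid byte, or replace invalid bytes with "X") could be sketched roughly as follows. This is not code from mjson.cxx: the names utf8_sequence_length(), utf8_truncate() and utf8_replace() are hypothetical, and the byte ranges follow the well-formed UTF-8 sequences of RFC 3629.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of mjson: returns the byte length (1..4) of
// the well-formed UTF-8 sequence starting at s[i], or 0 if the bytes there
// are not well-formed UTF-8 per RFC 3629.
static size_t utf8_sequence_length(const std::string& s, size_t i)
{
   const unsigned char c0 = static_cast<unsigned char>(s[i]);
   size_t len;
   unsigned char lo = 0x80, hi = 0xBF; // allowed range of the second byte

   if (c0 <= 0x7F) {
      return 1;                        // plain ASCII
   } else if (c0 >= 0xC2 && c0 <= 0xDF) {
      len = 2;
   } else if (c0 >= 0xE0 && c0 <= 0xEF) {
      len = 3;
      if (c0 == 0xE0) lo = 0xA0;       // reject overlong encodings
      if (c0 == 0xED) hi = 0x9F;       // reject UTF-16 surrogates
   } else if (c0 >= 0xF0 && c0 <= 0xF4) {
      len = 4;
      if (c0 == 0xF0) lo = 0x90;       // reject overlong encodings
      if (c0 == 0xF4) hi = 0x8F;       // reject code points above U+10FFFF
   } else {
      return 0;                        // 0x80..0xC1 and 0xF5..0xFF never start a sequence
   }

   if (i + len > s.size()) return 0;   // sequence is cut off by end of string
   for (size_t k = 1; k < len; k++) {
      const unsigned char c = static_cast<unsigned char>(s[i + k]);
      if (c < lo || c > hi) return 0;
      lo = 0x80; hi = 0xBF;            // later bytes are plain continuation bytes
   }
   return len;
}

// Option 1 (hypothetical): truncate at the location of the first invalid byte.
std::string utf8_truncate(const std::string& s)
{
   size_t i = 0;
   while (i < s.size()) {
      const size_t n = utf8_sequence_length(s, i);
      if (n == 0) break;
      i += n;
   }
   return s.substr(0, i);
}

// Option 2 (hypothetical): replace every invalid byte with 'X'.
std::string utf8_replace(const std::string& s, char replacement = 'X')
{
   std::string out;
   out.reserve(s.size());
   size_t i = 0;
   while (i < s.size()) {
      const size_t n = utf8_sequence_length(s, i);
      if (n == 0) { out += replacement; i += 1; }
      else        { out.append(s, i, n); i += n; }
   }
   return out;
}
```

Truncation silently drops everything after the first bad byte, while replacement keeps the string length and makes the damage visible. Either one would run before the usual JSON escaping in the encoder.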

Comments (3)

  1. dd1 reporter

    I think I will keep all this UTF-8 business out of the mjson.{h,cxx} class and trust the json users to generate valid UTF-8 strings. K.O.

  2. dd1 reporter

    Closing this bug. The JSON RFC requires UTF-8 encoding only for JSON that is used for interchange/interoperability. For MIDAS this is:

    • JSON ODB save files: we are UTF-8 as long as everything in ODB is UTF-8. This is checked by ODB validation and the user is warned. If their ODB is not UTF-8 clean, it’s their problem.
    • JSON ODB dumps inside MIDAS data files: same as above
    • JSON generated by mjsonrpc, see bug #242.

    K.O.
