HTTPS SSH

µjson

About

µjson is a a small, C++11, UTF-8, JSON library.

Its highlights are:

  • Small library with very simple API
  • Outputs nicely formatted JSON
  • Fast UTF-8 conformant parser
  • Liberal license

Dependencies

The library uses the double-conversion library from the
V8 JavaScript engine
for portable conversion between ASCII and floating point numbers. An
amalgamation of v1.1.5 of this library is included in the source
distribution.

Unit tests are written using
Catch. Catch is also included
in the source distribution.

The scanner is generated using re2c. The source
distribution includes the generated file, so this tool is only needed
if you intend to modify the scanner.

Licenses

µjson is licensed under the MIT license. See the LICENSE.md file in the
source distribution.

The dependencies all have liberal software licenses. See
LICENSE-3RD-PARTY.md for the details.

Installation

The library, examples, and unit tests can be built using
CMake. The CMake scripts will automatically
download re2c.

When using the library in another project, rather than using CMake, it
may be easier to simply include the four source files,

ujson.hpp
ujson.cpp
double-conversion.h
double-conversion.cc,

directly in the project.

Tutorial

Consider representing books defined using this simple struct as JSON:

struct book_t {
    std::string title;
    std::vector<std::string> authors;
    int year;
};

The first step is to write a small function for converting a book into a
ujson::value:

ujson::value to_json(book_t const &b) {
    return ujson::object{ { "title", b.title },
                          { "authors", b.authors },
                          { "year", b.year } };
}

Using the above function an array of books can be converted to JSON as
follows:

book_t book1{ "Elements of Programming",
              2009,
              { "Alexander A. Stepanov", "Paul McJones" } };
book_t book2{ "The C++ Programming Language, 4th Edition",
              2013,
              { "Bjarne Stroustrup" } };
std::vector<book_t> book_list{ book1, book2 };

ujson::value value{ book_list };
std::string json = to_string(value);
std::cout << json << std::endl;

The last line will print:

[
    {
        "authors" : [
            "Alexander A. Stepanov",
            "Paul McJones"
        ],
        "title" : "Elements of Programming",
        "year" : 2009
    },
    {
        "authors" : [
            "Bjarne Stroustrup"
        ],
        "title" : "The C++ Programming Language, 4th Edition",
        "year" : 2013
    }
]

Reconstructing the list of books is done by first parsing the JSON
string into a ujson::value:

ujson::value new_value = ujson::parse(json);
assert(new_value == value);

Each element in this array is then converted to a book_t:

std::vector<ujson::value> array = array_cast(std::move(new_value));
std::vector<book_t> new_book_list;
new_book_list.reserve(array.size());
for (auto it = array.begin(); it != array.end(); ++it)
    new_book_list.push_back(make_book(std::move(*it)));
assert(new_book_list == book_list);

The helper function make_book is implemented as follows:

book_t make_book(ujson::value v) {

    if (!v.is_object())
        throw std::invalid_argument("object expected for make_book");

    book_t book;
    std::vector<std::pair<std::string, ujson::value>> object =
        object_cast(std::move(v));

    auto it = find(object, "title");
    if (it == object.end() || !it->second.is_string())
        throw std::invalid_argument("'title' with type string not found");
    book.title = string_cast(std::move(it->second));

    it = find(object, "authors");
    if (it == object.end() || !it->second.is_array())
        throw std::invalid_argument("'authors' with type array not found");
    std::vector<ujson::value> array = array_cast(std::move(it->second));
    book.authors.reserve(array.size());
    for (auto it = array.begin(); it != array.end(); ++it) {
        if (!it->is_string())
            throw std::invalid_argument("'authors' must be array of strings");
        book.authors.push_back(string_cast(std::move(*it)));
    }

    it = find(object, "year");
    if (it == object.end() || !it->second.is_number())
        throw std::invalid_argument("'year' with type number not found");
    book.year = int32_cast(it->second);

    return book;
}

Reference

A JSON value must be null, a boolean, a number, a string, an array, or
an object (see RFC7159). In
µjson the class ujson::value is used to represent all of these six
types.

The actual type of a value can queried using ujson::value::type or
using one of the convenience methods, such as ujson::value::is_null.
Values always contain one of the six possible types (ujson::value
does not have a special uninitialized state).

The class ujson::value is a proper immutable value. Therefore, once
a value has been created, it cannot be changed, though of course it
can be assigned a new value. Values can be compared for equality and
inequality.

Casts are used to extract the embedded type again. For instance
bool_cast is used to extract the bool from values with boolean
types. If the value is cast to a wrong type a bad_cast exception is
thrown.

Null

Default constructed values are null:

ujson::value null_value; // null

A constant null value is defined in the ujson namespace.

assert(ujson::null == null_value);

Values support stream i/o:

std::cout << null_value << std::endl; // prints 'null'

Booleans

Values can be initialized with and assigned bools:

ujson::value boolean(true);
assert(bool_cast(boolean) == true);
std::cout << boolean << std::endl; // prints 'true'
boolean = false;
assert(bool_cast(boolean) == false);
std::cout << boolean << std::endl; // prints 'false'

Numbers

Inside ujson::values numbers are represented as 64-bit doubles:

ujson::value number = M_PI;
std::cout << number << std::endl; // prints '3.141592653589793'

The double value can be extracted using a double_cast:

double d = double_cast(number); // d == M_PI

The double-conversion library is used instead of the platform specific C
runtime library to ensure lossless and portable roundtripping of doubles from
ASCII to binary.

Beware that only finite numbers are valid in JSON. Infinities and NaNs
are not allowed:

number = std::numeric_limits<double>::infinity(); // throws bad_number

Numbers can also represent signed 32-bit integers:

number = 1024;
std::cout << number << std::endl; // prints '1024'

The integer value can be extracted using an int32_cast:

std::int32_t i = int32_cast(number); // i == 1024

Unsigned 32-bit integers are also supported.

Strings

Strings are stored internally as UTF-8:

ujson::value value = "\xC2\xA9 ujson 2014"; // copyright symbol

If the string is not zero-terminated or contains embedded zeros, the
length must be passed too:

char title[]= { 0xC2, 0xB5, 'j', 's', 'o', 'n' }; // micro sign + json
value = ujson::value(title, 6);

Strings passed to to µjson must be valid UTF-8:

value = "\xF5"; // invalid utf-8; throws bad_string

If the string is known to be valid UTF-8, the validation step can be
skipped by passing no in the last argument of the constructor:

value = ujson::value("valid", 5, ujson::validate_utf8::no);

Strings can also be constructed from std::strings:

std::string string("ujson");
value = string; // copy into value

Alternatively, if the original string is no longer needed, the
std::string can be moved into the value and the copy avoided:

value = std::move(string); // move into value

Strings can be accessed using the two string_cast methods. The first
accepts l-values and returns a ujson::string_view object:

auto view = string_cast(value);
std::cout << view.c_str() << std::endl; // prints 'ujson'

The returned string view object provides read-only access to the
contained string.

The second string cast method accepts r-values and can be used to move
a string out of a value:

string = string_cast(std::move(value)); // move string out of value
assert(value.is_null());

Moved from values are always null.

See the "Implementation Details" section for more information on how
µjson handles std::strings implemented using reference counting
versus short string optimization.

Arrays

Arrays are represented using ujson::array, which is simply a typedef for
std::vector<ujson::value>:

auto array = ujson::array{ true, M_PI, "a string" };
ujson::value value(array);

Copying the array can be avoided by moving it into the value:

value = std::move(array);

Read-only access to the contained array is possible using array_cast:

ujson::array const &ref = array_cast(value);

The original array can be recovered by moving the array out of the value:

array = array_cast(std::move(value));

As shown in the tutorial it is also possible to use a std::vector<T>
of types T implicitly convertable to ujson::value or a vector of
types that supply a to_json function.

ujson::values are designed to be cheap to copy. Internally, strings,
arrays, and objects, are stored using std::shared_ptr<>s, so copying
only requires incrementing a reference count. However, this sharing
has implications for when it is possible to move:

ujson::value value1 = std::move(array);
ujson::value value2 = value1; // value2 shares immutable array with value1
auto tmp1 = array_cast(std::move(value1)); // note: copy!
auto tmp2 = array_cast(std::move(value2)); // move

In short, moves are only possible if the value has exclusive ownership
of the resource. Recall that moved from values are null, so therefore
the last move will succeed.

Objects

Objects are represented using ujson::object, which is simply a
typedef for std::vector<std::pair<std::string, ujson::value>>:

auto object =
    ujson::object{ { "a null", ujson::null },
                   { "a bool", true },
                   { "a number", M_LN2 },
                   { "a string", "Hello, world!" },
                   { "an array", ujson::array{ 1, 2, 3 } } };
ujson::value value(object);

As usual, copies can be avoided by moving:

value = std::move(object);

Read-only access to the contained object is possible using object_cast:

ujson::object const &ref = object_cast(value);

The original object can be recovered by moving the object out of the value:

object = object_cast(std::move(value));

For performance reasons objects are implemented using a simple
std::vector rather than a std::map. However, objects can still be
constructed using a std::map<std::string,T> of types T implicitly
convertable to ujson::value or a map of types that supply a
to_json function.

When an ujson::object is copied or moved into an ujson::value the vector
is sorted, so that lookups can be performed using a binary search:

auto it = find(object, "a number");
assert(it->second == M_LN2);

In addition to ujson::find, there is also a ujson::at function
which behaves like std::map::at.

Beware that names in objects must also be valid UTF-8:

object.push_back({ "invalid utf-8: \xFF", ujson::null });
value = object; // throws bad_string

Reading JSON

Call ujson::parse to parse a buffer with UTF-8 encoded JSON:

auto value = ujson::parse("[ 1.0, 2.0, 3.0 ]");

If the buffer is not zero-terminated, which is the case with e.g. memory mapped
files, the length must also be supplied:

const char *mapped_buffer = ..;
std::size_t mapped_length = ..;
auto value = ujson::parse(mapped_buffer, mapped_length);

Exceptions are thrown on syntax errors:

try {
    auto value = ujson::parse("[ 1.0, 2.0, 3.0 "); // invalid syntax
    ...
} catch (std::exception const &e) {
    std::cout << e.what() << std::endl; // prints 'Invalid syntax on line 1.'
}

Apart from syntax errors, the parser will also throw if a number is
too large to fit in a double, if a string contains invalid UTF-8, and if
the buffer contains trailing junk.

Writing JSON

ujson::values can be converted to JSON using ujson::to_string:

auto array = ujson::array{ true, 1.0, "Sk\xC3\xA5l! \xF0\x9F\x8D\xBB" };
auto object =
    ujson::object{ { "a null", ujson::null },
                   { "a bool", false },
                   { "a number", 1.61803398875 },
                   { "a string", "R\xC3\xB8""dgr\xC3\xB8""d med fl\xC3\xB8""de." },
                   { "an array", array } };
std::cout << to_string(object) << std::endl;

This produces:

{
    "a bool" : false,
    "a null" : null,
    "a number" : 1.61803398875,
    "a string" : "Rødgrød med fløde.",
    "an array" : [
        true,
        1,
        "Skål! 🍻"
    ]
}

By default µjson indents by four spaces. It's possible change this and
also control whether UTF-8 is allowed in the output:

ujson::to_string_options compact_ascii;
compact_ascii.indent_amount = 0;
compact_ascii.encoding = ujson::character_encoding::ascii;
std::cout << to_string(object, compact_ascii) << std::endl;

With ASCII output, all non-ASCII characters are escaped and with zero
indentation all insignificant white space is elided:

{"a bool":true,"a null":null,"a number":1.61803398875,"a string":"R\u00F8dgr\u00F8d med fl\u00F8de.","an array":[true,1,"Sk\u00E5l! \uD83C\uDF7B"]}

Implementation Details

ujson::value is implemented using small object optimiziation. This
avoids the need for expensive heap allocations for simple types, since
the value instead is stored directly inside the object.

type heap allocation
null no
boolean no
number no
string depends
array yes
object yes

Arrays and objects do require heap allocations, since they are stored
internally using a std::shared_ptr (usually just a single allocation
is required, since most STL implementations allocate the object and
control block together). While this does make construction more
expensive, it has the advantage that copying values containing arrays
or objects is cheap, since it only amounts to incrementing a reference
count.

Also, since the reference count used by std::shared_ptr is
thread-safe and the pointed to value immutable, passing a
ujson::value by value to another thread is free from race
conditions:

auto value = ujson::parse(...);
auto future = std::async(std::launch::async, [value] {
    /* do significant work */ });

Strings in the Standard Template Library are implemented using either
short string optimization (SSO) or reference counting. Clang's libc++
and Visual Studio uses the former approach while GCC's libstdc++ uses
the latter. Briefly, in an implementation using reference counting, a
std::string stores a pointer to the string data and a reference
count. Copy on write (COW) is used to ensure that a string gets it own
unique copy of the string data if modified. In an implementation using
SSO, the string object stores a pointer and a small buffer. Short
strings are stored in the buffer, thus avoiding the heap allocation,
whereas longer strings are stored on the heap. The size of the buffer
for short strings is implementation defined. See the 'sso buffer
size' column in the following table.

platform arch ujson::value std::string sso buffer size
clang 3.4 (Xcode 5.1.1) 32-bit 16 bytes 12 bytes 10 bytes
clang 3.4 (Xcode 5.1.1) 64-bit 32 bytes 24 bytes 22 bytes
gcc 4.8.3 (via brew) 32-bit 12 bytes 4 bytes N/A
gcc 4.8.3 (via brew) 64-bit 24 bytes 8 bytes N/A
vs2013 update 3 32 bit 24 bytes 28 bytes 15 bytes
vs2013 update 3 64 bit 32 bytes 40 bytes 15 bytes
vs2013 ctp1 32-bit 24 bytes 28 bytes 15 bytes
vs2013 ctp1 64-bit 32 bytes 32 bytes 15 bytes

With a COW std::string µjson simply stores the string object inside
inside the ujson::value without doing any allocations. Copying is
still inexpensive since copying COW strings is cheap.

With a SSO std::string short strings are stored directly in the
ujson::value object and therefore do not require any heap
allocations. Long strings are stored using a std::shared_ptr, so
they require a single allocation. Like arrays and objects, copying
long strings is therefore cheap.

In summary, copy constructing and copy assigning ujson::values is
always an inexpensive operation, requiring at most bumping a
reference count or copying a small buffer, but never any heap
allocations.