Parsing

Parsing is the process where a serialized JSON is validated and decomposed into elements. The library provides these functions and types to assist with parsing:

Table 1.8. Parsing Functions and Types

Name | Description
basic_parser | A SAX push parser implementation which converts a serialized JSON into a series of member function calls to a user-provided handler. This allows custom behaviors to be implemented for representing the document in memory.
parse_options | A structure used to select which extensions are enabled during parsing.
parse | Parse a string containing a complete serialized JSON, and return a value.
parser | A stateful DOM parser object which may be used to efficiently parse a series of JSONs each contained in a single contiguous character buffer, returning each result as a value.
stream_parser | A stateful DOM parser object which may be used to efficiently parse a series of JSONs incrementally, returning each result as a value.
value_stack | A low-level building block used for efficiently building a value. The parsers use this internally, and users may use it to adapt foreign parsers to use this library's containers.


The parse function offers a simple interface for converting a serialized JSON text to a value in a single function call. This overload uses exceptions to indicate errors:

value jv = parse( "[1,2,3,4,5]" );

Alternatively, an error_code can be used:

error_code ec;
value jv = parse( "[1,2,3,4,5]", ec );
if( ec )
    std::cout << "Parsing failed: " << ec.message() << "\n";

Even when using error codes, exceptions thrown from the underlying memory_resource are still possible:

try
{
    error_code ec;
    value jv = parse( "[1,2,3,4,5]", ec );
    if( ec )
        std::cout << "Parsing failed: " << ec.message() << "\n";
}
catch( std::bad_alloc const& e)
{
    std::cout << "Parsing failed: " << e.what() << "\n";
}

The values returned in the preceding examples use the default memory resource. The following code uses a monotonic_resource, which results in faster parsing. jv is marked const to prevent subsequent modification, because containers using a monotonic resource waste memory when mutated.

{
    monotonic_resource mr;

    value const jv = parse( "[1,2,3,4,5]", &mr );
}
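
The remark about mutation can be made concrete with a small experiment. The sketch below is an analogy only: it uses the C++17 standard library's std::pmr::monotonic_buffer_resource rather than this library's monotonic_resource, on the assumption that both follow the same allocate-only design in which deallocation does nothing. The function name blocks_until_exhausted and the 256-byte buffer are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>

// Illustrative helper: counts how many blocks of `block_size` bytes can be
// obtained from a fixed 256-byte monotonic buffer when every block is
// handed back immediately after it is allocated.
inline std::size_t blocks_until_exhausted( std::size_t block_size )
{
    if( block_size == 0 )
        block_size = 1;                     // avoid a zero-sized request

    alignas( std::max_align_t ) unsigned char buffer[ 256 ];

    // Allocate only from `buffer`. The null upstream resource throws
    // std::bad_alloc instead of falling back to the heap.
    std::pmr::monotonic_buffer_resource mr(
        buffer, sizeof( buffer ), std::pmr::null_memory_resource() );

    std::size_t count = 0;
    try
    {
        for(;;)
        {
            void* p = mr.allocate( block_size, 1 );
            mr.deallocate( p, block_size, 1 );  // no-op for a monotonic resource
            ++count;
        }
    }
    catch( std::bad_alloc const& )
    {
        // The buffer ran out even though every block was "freed".
    }

    // Sanity check: never more blocks than the buffer can hold.
    assert( count <= sizeof( buffer ) / block_size );
    return count;
}
```

With a general-purpose resource this loop would never run out of memory, because each block is released before the next one is requested. With the monotonic resource the loop stops after a handful of 64-byte blocks, which is the waste that marking jv const is meant to avoid.
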

Non-Standard JSON

Unless otherwise specified, the parser in this library is strict. It recognizes only valid, standard JSON. The parser can be configured to allow certain non-standard extensions by filling in a parse_options structure and passing it by value. By default all extensions are disabled:

parse_options opt;                                  // all extensions default to off
opt.allow_comments = true;                          // permit C and C++ style comments to appear in whitespace
opt.allow_trailing_commas = true;                   // allow an additional trailing comma in object and array element lists
opt.allow_invalid_utf8 = true;                      // skip utf-8 validation of keys and strings

value jv = parse( "[1,2,3,] // comment ", storage_ptr(), opt );

When building with C++20 or later, the use of designated initializers with parse_options is possible:

value jv = parse( "[1,2,3,] // comment ", storage_ptr(),
    {
        .allow_comments = true,             // permit C and C++ style comments to appear in whitespace
        .allow_trailing_commas = true,      // allow a trailing comma in object and array lists
        .allow_invalid_utf8 = true          // skip utf-8 validation of keys and strings
    });

Parser Instance

Instances of parser and stream_parser offer functionality beyond what is available when using the parse free functions, as the rest of this section describes.

The parser implementation uses temporary storage space to accumulate values during parsing. When using the parse free functions, this storage is allocated and freed in each call. However, by declaring an instance of parser or stream_parser, this temporary storage can be reused when parsing more than one JSON, reducing the total number of dynamic memory allocations.

To use the stream_parser, declare an instance. Then call write zero or more times with successive buffers representing the input JSON. When there are no more buffers, call finish. The function done returns true after a successful call to write or finish if parsing is complete. This example persists a parser instance in a class member to reuse across calls:

class connection
{
    parser p_;                                      // persistent data member

public:
    void do_read( string_view s )                   // called for each complete message from the network
    {
        p_.reset();                                 // start parsing a new JSON
        p_.write( s );                              // parse the buffer, using exceptions to indicate error
        do_rpc( p_.release() );                     // process the command
    }

    void do_rpc( value jv );
};

The stream_parser interface allows a streaming algorithm; it is possible to parse a JSON incrementally, in pieces. The entire input JSON does not need to be loaded into memory first. This interface requires more function calls than the parse free functions. A network server can use the streaming interface to process incoming JSON in fixed-size amounts, so that a message never has to be buffered in full before parsing begins.

In the following example a JSON is parsed from an input stream a line at a time. Error codes are used instead of exceptions. The function finish is used to indicate the end of the input:

value read_json( std::istream& is, error_code& ec )
{
    stream_parser p;
    std::string line;
    while( std::getline( is, line ) )
    {
        p.write( line, ec );
        if( ec )
            return nullptr;
    }
    p.finish( ec );
    if( ec )
        return nullptr;
    return p.release();
}

Sometimes a protocol may have a JSON text followed by data that is in a different format or specification. The JSON portion can still be parsed by using the function write_some. Upon success, the return value will indicate the number of characters consumed from the input, which will exclude the non-JSON characters:

stream_parser p;
error_code ec;
string_view s = "[1,2,3] %HOME%";
std::size_t n = p.write_some( s, ec );
assert( ! ec && p.done() && n == 8 );
s = s.substr( n );
value jv = p.release();
assert( s == "%HOME%" );

The parser instance may be constructed with parse options which allow some non-standard JSON extensions to be recognized:

parse_options opt;                                  // All extensions default to off
opt.allow_comments = true;                          // Permit C and C++ style comments to appear in whitespace
opt.allow_trailing_commas = true;                   // Allow an additional trailing comma in object and array element lists
opt.allow_invalid_utf8 = true;                      // Skip utf-8 validation of keys and strings
stream_parser p( storage_ptr(), opt );                     // The stream_parser will use the options

Controlling Memory

After default construction, or after reset is called with no arguments, the value produced after a successful parse operation uses the default memory resource. To use a different memory resource, call reset with the resource to use. Here we use a monotonic_resource, which is optimized for parsing but not subsequent modification:

{
    monotonic_resource mr;

    stream_parser p;
    p.reset( &mr );                                 // Use mr for the resulting value
    p.write( "[1,2,3,4,5]" );                       // Parse the input JSON
    value const jv = p.release();                   // Retrieve the result
    assert( *jv.storage() == mr );                  // Same memory resource
}

To achieve performance and memory efficiency, the parser uses a temporary storage area to hold intermediate results. This storage is reused when parsing more than one JSON, reducing the total number of calls to allocate memory and thus improving performance. Upon construction, the memory resource used to perform allocations for this temporary storage area may be specified. Otherwise, the default memory resource is used.

In addition to a memory resource, the parser can make use of a caller-owned buffer for temporary storage. This can help avoid dynamic allocations for small inputs. The following example uses a 4 KB temporary buffer for the parser, and falls back to the default memory resource if needed:

unsigned char temp[ 4096 ];                                 // Declare our buffer
stream_parser p(
    storage_ptr(),                                          // Default memory resource
    parse_options{},                                        // Default parse options (strict parsing)
    temp);                                                  // Use our buffer for temporary storage

Through careful specification of buffers and memory resources, it is possible to eliminate all dynamic allocation completely when parsing JSON, for the case where the entire JSON is available in a single character buffer, as shown here:

/*  Parse JSON and invoke the handler

    This function parses the JSON specified in `s`
    and invokes the handler, whose signature must
    be equivalent to:

        void( value const& jv );

    The operation is guaranteed not to perform any
    dynamic memory allocations. However, some
    implementation-defined upper limits on the size
    of the input JSON and the size of the resulting
    value are imposed.

    Upon error, an exception is thrown.
*/
template< class Handler >
void do_rpc( string_view s, Handler&& handler )
{
    unsigned char temp[ 4096 ];                 // The stream_parser will use this storage for its temporary needs
    stream_parser p(                                   // Construct a strict stream_parser using the temp buffer and no dynamic memory
        get_null_resource(),                    // The null resource guarantees we will never dynamically allocate
        parse_options(),                        // Default constructed parse options allow only standard JSON
        temp );

    unsigned char buf[ 16384 ];                 // Now we need a buffer to hold the actual JSON values
    static_resource mr2( buf );                 // The static resource is monotonic, using only a caller-provided buffer
    p.reset( &mr2 );                            // Use the static resource for producing the value

    p.write( s );                               // Parse the entire string we received from the network client
    p.finish();                                 // Inform the stream_parser that the complete input has been provided

    // Retrieve the value and invoke the handler with it.
    // The value will use `buf` for storage. The handler
    // must not take ownership, since monotonic resources
    // are inefficient with mutation.
    handler( p.release() );
}

Custom Parsers

Users who wish to implement custom parsing strategies may create their own handler to use with an instance of basic_parser. The handler implements the function signatures required by the SAX event interface. In this example we define the "null" parser, which does nothing with the parsed results, and use it to implement a function that determines whether a JSON text is valid.

/*
    This example verifies that a file contains valid JSON.
*/

#include <boost/json.hpp>

// This file must be manually included when
// using basic_parser to implement a parser.
#include <boost/json/basic_parser_impl.hpp>

#include <cstdlib>      // EXIT_SUCCESS, EXIT_FAILURE
#include <iomanip>
#include <iostream>

#include "file.hpp"

using namespace boost::json;

// The null parser discards all the data
class null_parser
{
    struct handler
    {
        constexpr static std::size_t max_object_size = std::size_t(-1);
        constexpr static std::size_t max_array_size = std::size_t(-1);
        constexpr static std::size_t max_key_size = std::size_t(-1);
        constexpr static std::size_t max_string_size = std::size_t(-1);

        bool on_document_begin( error_code& ) { return true; }
        bool on_document_end( error_code& ) { return true; }
        bool on_object_begin( error_code& ) { return true; }
        bool on_object_end( std::size_t, error_code& ) { return true; }
        bool on_array_begin( error_code& ) { return true; }
        bool on_array_end( std::size_t, error_code& ) { return true; }
        bool on_key_part( string_view, std::size_t, error_code& ) { return true; }
        bool on_key( string_view, std::size_t, error_code& ) { return true; }
        bool on_string_part( string_view, std::size_t, error_code& ) { return true; }
        bool on_string( string_view, std::size_t, error_code& ) { return true; }
        bool on_number_part( string_view, error_code& ) { return true; }
        bool on_int64( std::int64_t, string_view, error_code& ) { return true; }
        bool on_uint64( std::uint64_t, string_view, error_code& ) { return true; }
        bool on_double( double, string_view, error_code& ) { return true; }
        bool on_bool( bool, error_code& ) { return true; }
        bool on_null( error_code& ) { return true; }
        bool on_comment_part(string_view, error_code&) { return true; }
        bool on_comment(string_view, error_code&) { return true; }
    };

    basic_parser<handler> p_;

public:
    null_parser()
        : p_(parse_options())
    {
    }

    ~null_parser()
    {
    }

    std::size_t
    write(
        char const* data,
        std::size_t size,
        error_code& ec)
    {
        auto const n = p_.write_some( false, data, size, ec );
        if(! ec && n < size)
            ec = error::extra_data;
        return n;
    }
};

bool
validate( string_view s )
{
    // Parse with the null parser and return false on error
    null_parser p;
    error_code ec;
    p.write( s.data(), s.size(), ec );
    if( ec )
        return false;

    // The string is valid JSON.
    return true;
}

int
main(int argc, char** argv)
{
    if(argc != 2)
    {
        std::cerr <<
            "Usage: validate <filename>"
            << std::endl;
        return EXIT_FAILURE;
    }

    try
    {
        // Read the file into a string
        auto const s = read_file( argv[1] );

        // See if the string is valid JSON
        auto const valid = validate( s );

        // Print the result
        if( valid )
            std::cout << argv[1] << " contains a valid JSON\n";
        else
            std::cout << argv[1] << " does not contain a valid JSON\n";
    }
    catch(std::exception const& e)
    {
        std::cerr <<
            "Caught exception: "
            << e.what() << std::endl;
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
