Incremental Parsing

Baffo 32 created an issue

This JSON library seems perfect. It lacks only the ability to process documents that are larger than available memory.

In my dream world, successive calls after JSMN_ERROR_PART would somehow continue where the parsing left off.

Comments (4)

  1. Baffo 32

    Here's one way. jsmn_shuffle fixes up the parser and token structures so that jsmn_parse can be called again on a null-terminated buffer whose unparsed tail has been moved to the start and whose remainder has been refilled from the stream. It returns the number of characters that can be shifted off the front of the buffer.

    /**             
     * Usage:       
     *   jsmntok_t tok[10];
     *   jsmnerr_t r;
     *   char buf[];
     *   const int buflen = sizeof(buf) - 1;
     *   int lastparsed = buflen;
     *   int overlap = 0;
     *   buf[buflen] = 0;
     *   for (;;) { 
     *              
     *      // pull some data from stream
     *      fill(buf + overlap, buflen - overlap);
     *      
     *      // parse buffer
     *      r = jsmn_parse(p, buf, tok, 10);
     *      
     *      // ... process tokens ...
     *      
     *      if (r == JSMN_ERROR_INVAL || r == JSMN_SUCCESS)
     *              break;
     *      
     *      // deallocate finished tokens
     *      lastparsed = jsmn_shuffle(p, tok);
     *          
     *      // preserve unparsed bits
     *      overlap = buflen - lastparsed;
     *      memmove(buf, buf + lastparsed, overlap);
     *      
     *   }
     */
    int jsmn_shuffle(jsmn_parser *parser, jsmntok_t *tokens) {
            int i;
            int lasttoken = parser->toknext;
            int lastchar = 0;
            parser->toknext = 0;
            parser->pos = 0;
            for (i = 0; i < lasttoken; i++) {
                    // note rightmost parsed character
                    if (tokens[i].end > lastchar) {
                            lastchar = tokens[i].end;
                            if (tokens[i].type != JSMN_PRIMITIVE) {
                                    lastchar++;
                            }
                    }
                    if (tokens[i].start >= lastchar && tokens[i].start > 0) {
                            lastchar = tokens[i].start + 1;
                    }
                    // shove unfinished tokens to start
                    if (tokens[i].start != -1 && tokens[i].end == -1) {
                            if (parser->toksuper == i) {
                                    parser->toksuper = parser->toknext;
                            }
                            tokens[i].start = 0;
                            tokens[parser->toknext++] = tokens[i];
                    }
            }               
            return lastchar;
    }                       
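
The buffer handling in the usage comment (keep the unparsed tail, slide it to the front, refill the rest) can be tried on its own, without jsmn. This is only a minimal sketch: shift_consumed is a hypothetical helper name, not part of jsmn, and it assumes the caller already knows how many bytes were consumed (the value jsmn_shuffle returns above).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of jsmn): drop the first `consumed` bytes
 * of a buffer holding `len` bytes of data and slide the rest to the front.
 * Returns the number of bytes kept, i.e. the offset at which the next read
 * from the stream should start writing.
 */
static size_t shift_consumed(char *buf, size_t len, size_t consumed) {
        assert(consumed <= len);
        size_t overlap = len - consumed;
        /* memmove, not memcpy: source and destination may overlap */
        memmove(buf, buf + consumed, overlap);
        return overlap;
}
```

The return value plays the role of overlap in the usage loop: the next fill would write at buf + overlap for at most buflen - overlap bytes.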
    
    
  2. Baffo 32

    Thought: an alternative to shuffling all the tokens down might be to add a flag to jsmn_parse so that it clears only finished tokens when initializing. The indices of working tokens would then be preserved, but the ordering invariant would no longer hold, i.e. super tokens would sometimes occur after their children in the token array.
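
A rough sketch of what that in-place clearing could look like, kept separate from jsmn itself. Everything here is an assumption for illustration: tok_t is a stand-in for jsmntok_t with only start and end, and clear_finished and first_free are hypothetical names, not existing jsmn functions or flags.

```c
#include <assert.h>

/* Hypothetical stand-in for jsmntok_t: only the fields this sketch needs. */
typedef struct {
        int start;  /* -1 while the slot is unused */
        int end;    /* -1 while the token is still open */
} tok_t;

/* Free every finished token in place; open tokens keep their index.
 * Returns the number of tokens still open. */
static int clear_finished(tok_t *tokens, int count) {
        int open = 0;
        assert(count >= 0);
        for (int i = 0; i < count; i++) {
                if (tokens[i].start != -1 && tokens[i].end == -1) {
                        open++;                 /* unfinished: leave in place */
                } else {
                        tokens[i].start = -1;   /* finished: mark slot free */
                        tokens[i].end = -1;
                }
        }
        return open;
}

/* Index of the lowest free slot, or -1 if the array is full. */
static int first_free(const tok_t *tokens, int count) {
        for (int i = 0; i < count; i++) {
                if (tokens[i].start == -1)
                        return i;
        }
        return -1;
}
```

With tokens { {1, 5}, {0, -1}, {7, 9}, {10, -1} }, the two open tokens stay at indices 1 and 3, and the lowest free slot is index 0. A child allocated there would sit before its still-open parent at index 1, which is the ordering problem described above.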
