Issue #5 wontfix

Incremental Parsing

Baffo 32
created an issue

This JSON library seems perfect. It lacks only the ability to process documents that are larger than available memory.

In my dream world, successive calls to jsmn_parse after it returns JSMN_ERROR_PART would somehow continue where the parsing left off.

Comments (6)

  1. Baffo 32 reporter

    Here's one way. jsmn_shuffle fixes up the parser and token structures so that jsmn_parse can be called again on the same null-terminated buffer once its unparsed tail has been moved to the start and the remainder refilled from the stream. It returns the number of characters that have been fully parsed and can be shifted off the front of the buffer.

    /**
     * Usage:
     *   jsmn_parser p;
     *   jsmntok_t tok[10];
     *   jsmnerr_t r;
     *   char buf[1024];                 // any fixed size works
     *   const int buflen = sizeof(buf) - 1;
     *   int lastparsed = buflen;
     *   int overlap = 0;
     *   buf[buflen] = 0;
     *   jsmn_init(&p);
     *   for (;;) {
     *
     *      // pull some data from stream; fill() stands for your own
     *      // reader and must leave buf null-terminated
     *      fill(buf + overlap, buflen - overlap);
     *
     *      // parse buffer
     *      r = jsmn_parse(&p, buf, tok, 10);
     *
     *      // ... process tokens ...
     *
     *      if (r == JSMN_ERROR_INVAL || r == JSMN_SUCCESS)
     *              break;
     *
     *      // deallocate finished tokens
     *      lastparsed = jsmn_shuffle(&p, tok);
     *
     *      // preserve unparsed bits
     *      overlap = buflen - lastparsed;
     *      memmove(buf, buf + lastparsed, overlap);
     *
     *   }
     */
    int jsmn_shuffle(jsmn_parser *parser, jsmntok_t *tokens) {
            int i;
            int lasttoken = parser->toknext;
            int lastchar = 0;
            // restart at the beginning of the refilled buffer
            parser->toknext = 0;
            parser->pos = 0;
            for (i = 0; i < lasttoken; i++) {
                    // note rightmost parsed character
                    if (tokens[i].end > lastchar) {
                            lastchar = tokens[i].end;
                            if (tokens[i].type != JSMN_PRIMITIVE) {
                                    lastchar++;
                            }
                    }
                    if (tokens[i].start >= lastchar && tokens[i].start > 0) {
                            lastchar = tokens[i].start + 1;
                    }
                    // shove unfinished tokens to start
                    if (tokens[i].start != -1 && tokens[i].end == -1) {
                            // keep toksuper pointing at the same token
                            if (parser->toksuper == i) {
                                    parser->toksuper = parser->toknext;
                            }
                            tokens[i].start = 0;
                            tokens[parser->toknext++] = tokens[i];
                    }
            }
            // characters before this offset can be shifted off the buffer
            return lastchar;
    }
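    The buffer-recycling half of the usage loop (parse, find how much was consumed, memmove the unparsed tail to the front, refill) can be tried without jsmn at all. This is only an illustrative sketch under stated assumptions: it swaps the JSON parser for a trivial consumer of newline-terminated records, and fill_from_string, consume and count_records are hypothetical names, not part of jsmn. Here consume plays the role of jsmn_shuffle by returning how many bytes can be shifted off.

```c
#include <stddef.h>
#include <string.h>

#define BUFLEN 16  /* deliberately tiny so that records straddle refills */

/* Hypothetical stream reader: copies up to n bytes from *src into dst,
 * advances *src, and returns the number of bytes copied (0 at EOF). */
static size_t fill_from_string(char *dst, size_t n, const char **src) {
    size_t avail = strlen(*src);
    size_t k = avail < n ? avail : n;
    memcpy(dst, *src, k);
    *src += k;
    return k;
}

/* Stand-in for "parse, then jsmn_shuffle": counts every complete
 * newline-terminated record in buf[0..len) and returns how many bytes
 * were fully consumed, i.e. how many can be shifted off the front. */
static size_t consume(const char *buf, size_t len, int *records) {
    size_t done = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            (*records)++;
            done = i + 1;
        }
    }
    return done;
}

/* Returns the number of complete records in stream, or -1 if a single
 * record does not fit in the buffer. */
int count_records(const char *stream) {
    char buf[BUFLEN + 1];
    size_t overlap = 0;
    int records = 0;
    for (;;) {
        size_t got = fill_from_string(buf + overlap, BUFLEN - overlap, &stream);
        size_t len = overlap + got;
        buf[len] = '\0';                     /* keep the buffer null-terminated */
        size_t done = consume(buf, len, &records);
        if (got == 0)
            break;                           /* stream exhausted */
        overlap = len - done;                /* unparsed tail */
        if (overlap == BUFLEN)
            return -1;                       /* buffer full, no progress possible */
        memmove(buf, buf + done, overlap);   /* preserve unparsed bits */
    }
    return records;
}
```

    count_records("alpha\nbravo charlie\ndelta\n") counts 3 records even though "bravo charlie" is split across two refills. A final record without a trailing newline is not counted, and a record longer than the buffer makes it return -1.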
    
    
  2. Baffo 32 reporter

    Thought: an alternative to shuffling all the tokens down might be to add a flag to jsmn_parse so that, when initializing, it clears only finished tokens. The indices of unfinished tokens would then be preserved, but the ordering guarantee would no longer hold: super tokens would sometimes appear after their children in the token array.
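    The alternative above keeps unfinished tokens in place and recycles the slots of finished ones. The sketch below models only that allocation policy; the slot_tok record and the helper names are hypothetical, not jsmn's internals, and no JSON is actually parsed. It illustrates the trade-off: a parent's index stays valid across refills, but a child can be handed a lower index than its parent.

```c
#include <stddef.h>

/* Hypothetical, simplified token record (not jsmn's jsmntok_t):
 * start == -1 marks a free slot, end == -1 an unfinished token. */
typedef struct {
    int start;
    int end;
} slot_tok;

/* Mark every slot free before the first parse. */
static void pool_init(slot_tok *t, size_t n) {
    for (size_t i = 0; i < n; i++)
        t[i].start = t[i].end = -1;
}

/* Instead of shuffling, release finished tokens where they are.
 * Unfinished tokens are left alone, so their indices stay valid. */
static void release_finished(slot_tok *t, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (t[i].start != -1 && t[i].end != -1)
            t[i].start = t[i].end = -1;
}

/* Hand out the lowest free slot; returns -1 when the pool is full. */
static int alloc_tok(slot_tok *t, size_t n, int start) {
    for (size_t i = 0; i < n; i++) {
        if (t[i].start == -1) {
            t[i].start = start;
            t[i].end = -1;
            return (int)i;
        }
    }
    return -1;
}
```

    For input beginning [1, {"k": ..., allocating the array, the finished primitive 1 and the open object gives slots 0, 1 and 2. After release_finished, the key "k" lands in slot 1, ahead of its parent object in slot 2, so array order alone no longer describes the tree.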

  3. Serge Zaitsev repo owner

    Thanks for your suggestion. Guessing the right number of tokens is now solved: pass NULL as the "tokens" parameter and the return value is the number of tokens to allocate. As for shuffling, I believe it should not be part of jsmn. With jsmn, half of the parser is written by hand in the application code anyway, and shuffling is an example of something that can easily be implemented there as well. Plus, you gave a very nice example. Perhaps we should move the example to the wiki for those who need it and keep the jsmn code as small as possible?
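    The count-then-allocate pattern from the reply boils down to two calls: one with NULL tokens to learn the count, then one into an exactly sized array. jsmn's real signature differs between versions and is not reproduced here; toy_parse, toy_tok and tokenize_exact are hypothetical stand-ins, and toy_parse is a deliberately lax tokenizer (no validation, only start offsets) that merely follows the same NULL-means-count convention.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token record: only the start offset is kept. */
typedef struct {
    int start;
} toy_tok;

/* Lax toy tokenizer: every object, array, string and primitive is one
 * token. With tokens == NULL it only counts; otherwise it also records
 * start offsets and returns -1 if num slots are not enough. It also
 * returns -1 for an unterminated string. */
int toy_parse(const char *js, toy_tok *tokens, size_t num) {
    size_t count = 0;
    for (size_t i = 0; js[i] != '\0'; i++) {
        size_t start = i;
        char c = js[i];
        if (c == '"') {
            for (i++; js[i] != '"'; i++) {
                if (js[i] == '\0')
                    return -1;                  /* unterminated string */
                if (js[i] == '\\' && js[i + 1] != '\0')
                    i++;                        /* skip the escaped char */
            }
        } else if (c == '-' || (c >= '0' && c <= '9') ||
                   c == 't' || c == 'f' || c == 'n') {
            /* primitive: runs until the next delimiter */
            while (js[i + 1] != '\0' && strchr(",]} \t\r\n", js[i + 1]) == NULL)
                i++;
        } else if (c != '{' && c != '[') {
            continue;   /* whitespace, ':', ',', '}' and ']' yield no token */
        }
        if (tokens != NULL) {
            if (count >= num)
                return -1;                      /* not enough tokens */
            tokens[count].start = (int)start;
        }
        count++;
    }
    return (int)count;
}

/* Pass 1 counts with NULL, pass 2 fills an exactly sized array.
 * Returns NULL on error; the caller frees the result. */
toy_tok *tokenize_exact(const char *js, int *ntok) {
    int n = toy_parse(js, NULL, 0);
    if (n < 0)
        return NULL;
    toy_tok *t = malloc((size_t)(n > 0 ? n : 1) * sizeof *t);
    if (t == NULL)
        return NULL;
    if (toy_parse(js, t, (size_t)n) != n) {
        free(t);
        return NULL;
    }
    *ntok = n;
    return t;
}
```

    For {"a": [1, true], "b": "x"} the counting pass returns 7 (the object, the array, three strings and two primitives), and the second pass fills exactly that many slots.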
