s2i:combine-milestone-chunk slow on large texts

Issue #71 resolved
Craig Berry created an issue

Page next on a large text like Gerard's Herbal or Purchas can take many seconds. The main culprit is s2i:combine-milestone-chunk in standoff2inline.xqm in the annotation service (or possibly something that it calls). It can take 3-5 seconds just to do the combine operation. Given that we already have the starting and ending milestone nodes before we get here, it should not take any longer to do this for a large text than for a small one, but it does, by at least an order of magnitude.

Comments (4)

  1. Craig Berry reporter

    There is no reason to believe there is an eXist bug here. It is just something in the merging of annotations into the main text that is usually so fast it goes unnoticed but gets slow with a large text. Or maybe the cause is a large number of annotations, and the recent addition of all the autocorrect annotations is what I'm noticing.

  2. Craig Berry reporter

    Page next speeds are reasonable (a few hundred milliseconds) early in one of these large texts but much slower (5 seconds or more) once you get a few hundred pages in. With indexes, later milestones should not be slower to retrieve than earlier ones. This needs more study.
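
One way to study this would be to time page fetches at increasing page offsets and compare the medians. The following is only a rough sketch in Python, not part of the application: `time_page_fetches` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that would have to be replaced with whatever call actually retrieves a page from the annotation service.

```python
import time
from statistics import median
from typing import Callable, Dict, Iterable


def time_page_fetches(
    fetch_page: Callable[[int], object],
    page_numbers: Iterable[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the median wall-clock seconds per fetch for each page number."""
    timings: Dict[int, float] = {}
    for page in page_numbers:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            fetch_page(page)
            samples.append(time.perf_counter() - start)
        timings[page] = median(samples)
    return timings


def fake_fetch(page: int) -> int:
    # Hypothetical stand-in for a real page request: it just does an amount
    # of work that grows with the page number, imitating the reported slowdown.
    return sum(range(page * 1000))


if __name__ == "__main__":
    for page, seconds in time_page_fetches(fake_fetch, [1, 50, 500]).items():
        print(f"page {page}: {seconds * 1000:.2f} ms")
```

If the medians climb steadily with the page number even when each page holds about the same amount of content, that would point at node access cost in a long document rather than at the number of annotations on the page.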

  3. Craig Berry reporter

    We have stopped using this slow function and sped up page turning by a factor of 3 or 4 as of:

    commit 1a3d6511db638887f2efcc072f3ccc555e34098f
    Author: Craig A. Berry <craigberry@mac.com>
    Date: Sat Aug 15 15:12:56 2020 -0500
    
    Ditch s2i:merge-milestone-chunk for faster page turning
    
    It's very slow and buggy, and it turns out not to be necessary at
    all.  The annotation client has all the annotation information, so
    with minor modifications it can make any appropriate display
    changes in the browser in a few milliseconds. It was already doing
    highlighting, but it can easily modify content on-the-fly as well.
    
    We go back to simply fetching the XML fragment containing the page
    or div of interest as the original TEI Simple application did.
    
    In the case of Hakluyt's Principal Voyages, paging back and forth
    between page 750-a and 750-b was taking over 12 seconds per page
    turn on average.  Now it's under three seconds.
    
    N.B.  There is still a bug in eXist where accessing any node in a
    long document is much slower than accessing a node in a shorter
    document, and the nearer that node is to the end of the long
    document, the worse it gets. Performance will never be great on
    long texts unless and until that gets fixed.
    
    N.B. #2. The Review page still uses s2i:wrap-recursive and may not
    really need to, but it doesn't seem to be the rate limiting
    component there.
    
    
    
