Get word and page counts from ep header, when available, for metadata indexing.

Martin Mueller

Is this an eXist bug we should report to them? You said in an earlier email that Java would handle this case, but eXist as a Java app does not.

2018-04-12T17:47:07+00:00

Philip Burns reporter

The word and page counts are our (my) implementation issue. The current code works fine; it's just slow because it has to scan the entirety of each document to count the <pb> and <w> elements. If the counts instead reside in the ep header, pulling the values out of there is much faster.

The issue with the number formatting may be a bug or a matter of
unclear documentation. Our case is simple -- we want integer values to display as integers -- and I have a fix for that. If we wanted a fancier display, we'd have to look into the issue further.
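
For anyone following along, here is a rough Python sketch of the idea, not the actual indexing code (which runs inside eXist). The header element names `header`, `pageCount` and `wordCount` are made up for illustration, and the real documents use namespaces that this sketch ignores. It reads precalculated counts when they are present, falls back to scanning for <pb> and <w> otherwise, and forces the result to integers so a value stored as 1.2E4 is not displayed in scientific notation.

```python
import xml.etree.ElementTree as ET


def page_and_word_counts(xml_text):
    """Return (pages, words) as plain integers.

    Assumption: precalculated counts, if any, live in hypothetical
    <header><pageCount> and <header><wordCount> elements.
    """
    root = ET.fromstring(xml_text)

    # Fast path: use the precalculated values from the header.
    pages = root.findtext(".//header/pageCount")
    words = root.findtext(".//header/wordCount")
    if pages is not None and words is not None:
        # Go through float first so "1.2E4" parses, then force an integer.
        return int(float(pages)), int(float(words))

    # Slow path: walk the whole document and count <pb> and <w> elements.
    pages = sum(1 for _ in root.iter("pb"))
    words = sum(1 for _ in root.iter("w"))
    return pages, words


with_header = (
    "<doc><header><pageCount>2</pageCount><wordCount>1.2E4</wordCount></header>"
    "<text><pb/><w>a</w></text></doc>"
)
without_header = "<doc><text><pb/><w>a</w><w>b</w><pb/></text></doc>"

print(page_and_word_counts(with_header))
print(page_and_word_counts(without_header))
```

The fast path trusts whatever the header says, so the counts there need to be kept in sync with the text.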

-- Philip R. "Pib" Burns Academic Software Development Northwestern University, Evanston, IL. USA pib@northwestern.edu

2018-04-12T17:55:14+00:00

Martin Mueller

Integers are good enough. We have lots of other things on our plate.

2018-04-12T17:56:17+00:00

Craig Berry

changed status to resolved

Resolved by:

commit bbe1bd29aa709d53861c070e2c5143ca013707d5 (HEAD -> master, origin/master, origin/HEAD)
Author: Philip R. Burns <pib@northwestern.edu>
Date:   Thu Apr 12 16:30:24 2018 -0500

    Improve indexing of page and word counts.

    If we have it precalculated in the xenodata, just use that instead
    of the much slower count done here.

    Also, force the counts to be integers so they don't get displayed
    in scientific notation.

2018-04-12T22:20:30+00:00
