Spurious "-00-" in some xml:id attributes

Issue #82 new
Craig Berry created an issue

These are the texts with at least one instance of "-00-", with the number of matching lines (grep -c counts lines, not occurrences) after the colon. Most are easy enough to fix, though each fix will need to be coordinated with any existing annotations:

$ find . -name '*.xml' -exec grep -H -c '\-00\-' {} \; | perl -ne 'print $_ unless $_ =~ m/:0$/;'
./A60/A60244.xml:1
./A02/A02923.xml:287526
./A33/A33998.xml:1
./A32/A32160.xml:1
./A04/A04923.xml:39125
./A43/A43846.xml:2
./A37/A37461.xml:19
./A52/A52120.xml:2
./A36/A36644.xml:1
./A36/A36791.xml:7
./A07/A07168.xml:40669
./A53/A53062.xml:2
./A91/A91584.xml:1
./A85/A85670.xml:21
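The pipeline above counts matching lines, and it matches "-00-" anywhere on a line, not just inside xml:id values. If you want a count of the malformed ids themselves, here is a minimal Python sketch. It assumes the ids are double-quoted and the files are UTF-8; the names BAD_ID, count_bad_ids and scan are made up for illustration.

```python
import re
from pathlib import Path
from typing import Dict

# Matches one xml:id value containing "-00-".
# Assumption: ids are double-quoted, as in the samples in this issue.
BAD_ID = re.compile(r'xml:id="([^"]*-00-[^"]*)"')


def count_bad_ids(text: str) -> int:
    """Count every malformed xml:id in text, even several on one line."""
    return len(BAD_ID.findall(text))


def scan(root: Path) -> Dict[str, int]:
    """Map each *.xml path under root (relative, "/"-separated) to its count, skipping zeros."""
    results: Dict[str, int] = {}
    for path in sorted(root.rglob("*.xml")):
        # errors="replace" keeps one badly encoded file from aborting the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        n = count_bad_ids(text)
        if n:
            results[path.relative_to(root).as_posix()] = n
    return results
```

To get output in the same shape as the pipeline, loop over scan(Path(".")).items() from the repository root and print f"./{name}:{n}". The numbers can differ from the grep -c figures wherever a line holds more than one bad id, or where "-00-" turns up outside an xml:id.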

When a file has only one or two instances, something else is probably going wrong, as in this case:

  <w lemma="Simon" pos="n1-nn" rendition="#hi-before-apostr" xml:id="A60244-136-a-0030">Simon</w>
  <w join="left" lemma="be" pos="vvz" xml:id="A60244-136-a-00-c">'s</w>
  <w lemma="book" pos="n1" xml:id="A60244-136-a-0050">Book</w>

where 00-c is clearly supposed to be 0040, since the ids on either side end in 0030 and 0050.
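For isolated cases like the one above, a neighbor-based guess could be scripted. This is a minimal Python sketch. It assumes, based on this one example only, that a word id ends in a four-digit counter that steps by 10 within a page. suggest_id is a hypothetical helper, and anything that doesn't fit that pattern is left for a human to check.

```python
import re
from typing import Optional

# Assumption (inferred from the single example in this issue): a well-formed
# word id looks like "A60244-136-a-0030", i.e. a prefix plus a 4-digit counter.
WORD_ID = re.compile(r"^(?P<prefix>.+)-(?P<num>\d{4})$")


def suggest_id(prev_id: str, next_id: str, step: int = 10) -> Optional[str]:
    """Propose a replacement for a malformed id sitting between prev_id and next_id.

    Returns None unless both neighbours are well formed, share a prefix,
    and leave exactly one gap of size `step` between their counters.
    """
    prev_m, next_m = WORD_ID.match(prev_id), WORD_ID.match(next_id)
    if not prev_m or not next_m:
        return None
    if prev_m["prefix"] != next_m["prefix"]:
        return None
    lo, hi = int(prev_m["num"]), int(next_m["num"])
    if hi - lo != 2 * step:
        return None
    return f"{prev_m['prefix']}-{lo + step:04d}"
```

With the neighbours from the example, suggest_id("A60244-136-a-0030", "A60244-136-a-0050") returns "A60244-136-a-0040". The sketch only proposes an id. Any annotations that point at the old one would still have to be updated to match.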

Comments (2)

  1. Craig Berry reporter

    Some of the worst offenders have been fixed. Here’s the current status:

    $ find . -name '*.xml' -exec grep -H -c '\-00\-' {} \; | perl -ne 'print $_ unless $_ =~ m/:0$/;'
    ./A33/A33998.xml:1
    ./A32/A32160.xml:1
    ./A43/A43846.xml:2
    ./A37/A37461.xml:19
    ./A52/A52120.xml:2
    ./A36/A36791.xml:7
    ./A53/A53062.xml:2
    ./A91/A91584.xml:1
    ./A85/A85670.xml:21
    
