XML Module not correctly handling NCR with leading 0

Issue #122 resolved
Richard Anderson created an issue

The Portico web page contains a number of numeric character references that look like this:

The character being represented is the ampersand (decimal code point 38).

The problem is that I am using the Integer.decode(String) method on all numeric character references, because the number is sometimes decimal and sometimes hexadecimal (preceded by an 'x').

In the Integer.decode method, the leading '0' causes the number to be treated as radix 8 (octal), which it most certainly is not.
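The behaviour can be reproduced in isolation. The following is a minimal sketch, not code from the module: the class name `LeadingZeroDemo` and the helper `decodeOrNull` are hypothetical, and the input "038" is an assumed example of a zero-padded reference to the ampersand, since the actual reference from the page is not shown above.

```java
// Sketch: how a leading zero changes what Integer.decode does with a string.
// LeadingZeroDemo and decodeOrNull are hypothetical names used for illustration.
final class LeadingZeroDemo {

    // Returns the decoded value, or null if Integer.decode rejects the input.
    static Integer decodeOrNull(String digits) {
        try {
            return Integer.decode(digits);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeOrNull("38"));           // 38: no prefix, read as decimal
        System.out.println(decodeOrNull("010"));          // 8: leading 0 selects octal
        System.out.println(decodeOrNull("038"));          // null: '8' is not an octal digit
        System.out.println(Integer.parseInt("038", 10));  // 38: explicit radix 10
    }
}
```

Depending on the digits, the leading zero therefore either yields a silently wrong value or raises a NumberFormatException.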

Comments (3)

  1. Richard Anderson reporter
    • changed status to open

    Fixed by changeset 8776bf715d03

    In org.jhove2.module.format.xml.NumericCharacterReferenceInformation#tally:

    Integer codePoint;
    if (code.substring(0,1).toLowerCase().equals("x")) {
        codePoint = Integer.decode(code.toLowerCase().replace("x", "0x"));
    } else {
        codePoint = new Integer(code);
    }

    I also wrapped this routine in a try/catch block so that any other NumberFormatException is trapped and recorded as a Message.
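For readers who want to see the whole approach in one place, here is a minimal standalone sketch of the same idea. It is not the code from the changeset: the class `NcrParserSketch`, its `parse` method, and the `messages` list (a stand-in for the module's Message mechanism) are all hypothetical, and it uses Integer.parseInt with an explicit radix instead of rewriting the string for Integer.decode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Sketch only: NcrParserSketch, parse and messages are hypothetical names.
final class NcrParserSketch {

    // Stand-in for recording a Message; here just a list of strings.
    final List<String> messages = new ArrayList<>();

    // code is the text between "&#" and ";", for example "038" or "x26".
    OptionalInt parse(String code) {
        try {
            if (code.startsWith("x") || code.startsWith("X")) {
                // Hexadecimal form: drop the 'x' and parse with radix 16.
                return OptionalInt.of(Integer.parseInt(code.substring(1), 16));
            }
            // Decimal form: explicit radix 10, so leading zeros are harmless.
            return OptionalInt.of(Integer.parseInt(code, 10));
        } catch (NumberFormatException e) {
            // Malformed input is recorded instead of propagating the exception.
            messages.add("Invalid numeric character reference: " + code);
            return OptionalInt.empty();
        }
    }
}
```

With an explicit radix the leading zero no longer matters, and malformed input such as "12z" ends up in `messages` rather than aborting the parse. Note that Integer.parseInt also accepts a leading sign, so a stricter check would be needed if signed input must be rejected.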
