HTML input with supplemental chars not supported

Issue #30 resolved
Former user created an issue

Original [issue 30](https://code.google.com/p/okapi/issues/detail?id=30) created by @ysavourel on 2009-03-21T02:47:25.000Z:

It seems supplemental characters are not supported by Jericho: I've added this test as well as a supplemental.html file in the test folder.

I'll look more closely at the skeleton writer, but output to XLIFF (which does not use the skeleton writer) is also losing the chars.

    public void testSupplementalSupport() {
        String snippet = "<p>[&#x20000;]=U+D840,U+DC00</p>";
        assertEquals("<p>[\uD840,\uDC00]=U+D840,U+DC00</p>",
            generateOutput(getEvents(snippet), snippet, "en"));
    }

Comments (6)

  1. Former user Account Deleted
    • changed status to open

    Comment [5.](https://code.google.com/p/okapi/issues/detail?id=30#c5) originally posted by @ysavourel on 2009-03-30T21:53:29.000Z:

    The snippet test case passes for me. I did have to update the test case as it had a comma that I didn't think should be there.

    [&\#x20000;] = [\uD840\uDC00]

    The entity should be converted to a surrogate pair, correct? The above test does pass, which looks correct to me.

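For reference, the expected mapping discussed in this thread (the character reference for U+20000 becoming the surrogate pair U+D840 U+DC00) can be checked with plain JDK calls. This is only a minimal sketch: the class `SupplementaryEntityDemo` and the helper `decodeHexEntity` are hypothetical names made up for illustration and are not part of Okapi or Jericho, and the helper handles just a single hexadecimal reference.

```java
public class SupplementaryEntityDemo {

    // Hypothetical helper (not Okapi or Jericho API): decodes one hexadecimal
    // numeric character reference such as "&#x20000;" into a Java String.
    public static String decodeHexEntity(String entity) {
        if (!entity.startsWith("&#x") || !entity.endsWith(";")) {
            throw new IllegalArgumentException("Not a hex character reference: " + entity);
        }
        // Parse the hex digits between "&#x" and ";" as a Unicode code point.
        int codePoint = Integer.parseInt(entity.substring(3, entity.length() - 1), 16);
        // Character.toChars yields one char for BMP code points and a
        // high/low surrogate pair for supplementary code points.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String decoded = decodeHexEntity("&#x20000;");
        // UTF-16 code units in the result: prints 2
        System.out.println(decoded.length());
        // Unicode code points in the result: prints 1
        System.out.println(decoded.codePointCount(0, decoded.length()));
        // The two code units in hex: prints D840 DC00
        System.out.printf("%04X %04X%n", (int) decoded.charAt(0), (int) decoded.charAt(1));
    }
}
```

Because a Java `String` stores UTF-16 code units, a test that compares against `"\uD840\uDC00"` is comparing against exactly this two-unit sequence, with no separator between the high and low surrogate.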