fix encoding issue with umlauts, etc. that get lost in the encoding process

Issue #573 resolved
Dave Tonge created an issue

No description provided.

Comments (10)

  1. Daniel Fett

    Thanks for the pointer! I’ll update the docker image. (I intentionally did not touch it, since some versions of mmark introduced new bugs, but it seems it’s time to finally update.)

  2. Daniel Fett

    Ok, so this is the branch where I ran my experiments: https://bitbucket.org/openid/fapi/branch/danielfett/fapi2/fix-unicode

    It seems that Unicode characters without the number expansion are not allowed at all in the main body of the text, with some exceptions listed here: https://authors.ietf.org/en/non-ascii-characters-in-rfcxml

    It seems that the contact element is what is normally used to render acknowledgements: https://github.com/mmarkdown/mmark/blob/master/Syntax.md

    However, this renders quite differently from what we have today and seems to ignore any text before or between author names.

    Does anybody have any other ideas how to solve this?
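
For readers unfamiliar with the term, the "number expansion" mentioned above appears to mean XML numeric character references. A minimal Python sketch of that conversion follows; the helper name `to_number_expansion` is made up for illustration and is not part of mmark or xml2rfc.

```python
def to_number_expansion(text: str) -> str:
    """Replace every non-ASCII character with an XML numeric character reference."""
    # "xmlcharrefreplace" is a standard Python codec error handler; it emits
    # decimal references such as &#252; for characters ASCII cannot encode.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_number_expansion("Tim Würtele"))  # Tim W&#252;rtele
```

Plain ASCII input passes through unchanged, so a helper like this could in principle be run over a whole source file.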

  3. Tim Würtele

    Actually, RFC 7997 does allow Unicode characters without the number expansion in the main body of the text, in particular for names; see Section 3.2.
    The issue, however, seems to be that the toolchain mmark + xml2rfc does not support these cases (neither names as per Section 3.2 of RFC 7997, nor other cases in which the number expansion is not needed).

    According to the RFCXML Vocabulary, references to contacts can be used inline within a <t>. I wasn’t able to reproduce Daniel’s build, but it may be worth checking the generated XML: Is there a <t> around the paragraph? If not, I’d say this is a bug in mmark. However, if there is a <t>, I’d say it is a bug in xml2rfc (because the RFCXML vocabulary says that inline use should “work” in this case).
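
The check suggested above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the XML fragment is invented rather than taken from the actual build output, `contact_parents` is a hypothetical helper, and real mmark output may carry a DOCTYPE or namespaces that need extra handling.

```python
import xml.etree.ElementTree as ET


def contact_parents(xml_text: str) -> list[str]:
    """Return the tag name of the parent of every <contact> element."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Skip a <contact> that is the root itself, since it has no parent.
    return [parent_of[c].tag for c in root.iter("contact") if c in parent_of]


# Hypothetical fragment, NOT the real generated XML.
sample = """<section>
  <t>Thanks to <contact fullname="Tim Würtele"/> for feedback.</t>
  <contact fullname="Dave Tonge"/>
</section>"""

print(contact_parents(sample))  # ['t', 'section']
```

A result containing anything other than `t` would suggest that mmark emitted a `<contact>` outside a paragraph; if every parent is `t` and the rendering is still wrong, the renderer would be the more likely culprit.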

  4. Daniel Fett

    After spending way too much time on this, it was pointed out to me that the latest xml2rfc version, released a few days ago, permits Unicode characters without the use of the <contact> element (see changelog at https://github.com/ietf-tools/xml2rfc/blob/main/CHANGELOG.md). The mmark version 2.2.31 now supports this as well.

    I updated the docker container danielfett/markdown2rfc to use updated versions and it seems to compile our main branch just fine now.

    We don’t need the pull request anymore and can close this issue.

  5. Joseph Heenan

    Nice, that’s a much better solution, thank you Daniel.

    I got Bitbucket to rerun the pipeline for master, and can confirm the generated working draft ( https://openid.bitbucket.io/fapi/fapi-2_0-security-profile.html ) now shows the problem characters correctly.

    I did a quick scan and didn’t notice any other significant differences in the output, though compared to the previous documents (e.g. https://openid.net/specs/fapi-2_0-security-profile-ID2.html ) the ‘Internet-Draft:’ and ‘Intended Status:’ fields are no longer there. That’s probably okay though.
