fix encoding issue with umlauts, etc. that get lost in the encoding process
No description provided.
Comments (10)
-
-
Thanks for the pointer! I’ll update the docker image. (I intentionally had not touched it, since some versions of mmark introduced new bugs, but it seems it’s time to finally update.)
-
The first attempt with the new version didn’t go well. I’m investigating.
-
Ok, so this is the branch where I ran my experiments: https://bitbucket.org/openid/fapi/branch/danielfett/fapi2/fix-unicode
It seems that Unicode characters without the number expansion are not allowed at all in the main body of the text, with some exceptions listed here: https://authors.ietf.org/en/non-ascii-characters-in-rfcxml
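To see which characters are affected, a rough sketch like the following can list every non-ASCII character in a source file. It assumes that "number expansion" means an XML numeric character reference such as `&#xF6;`; the function name `find_non_ascii` and the sample text are made up for illustration.

```python
def find_non_ascii(text):
    """Yield (line, column, character, numeric reference) for each non-ASCII character."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Assumption: "number expansion" is the hexadecimal numeric character reference.
                yield line_no, col, ch, f"&#x{ord(ch):X};"


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the FAPI sources.
    sample = "Thanks to J\u00f6rg\nand Zo\u00eb."
    for hit in find_non_ascii(sample):
        print(hit)
```

This only reports positions; whether a given character is acceptable in that place still depends on the rules linked above.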
It seems that the contact element is what is normally used to render acknowledgements: https://github.com/mmarkdown/mmark/blob/master/Syntax.md
However, this renders quite differently from what we have today and seems to ignore any text before or between author names:
Does anybody have any other ideas how to solve this?
-
Actually, RFC 7997 does allow Unicode characters without the number expansion in the main body of the text, in particular in names; see Section 3.2.
The issue, however, seems to be that the mmark + xml2rfc toolchain does not support these cases (neither names as per Section 3.2 of RFC 7997, nor other cases in which the number expansion is not needed). According to the RFCXML Vocabulary, references to contacts can be used inline within a <t>. I wasn’t able to reproduce Daniel’s build, but it may be worth checking the generated XML: Is there a <t> around the paragraph? If not, I’d say this is a bug in mmark. However, if there is a <t>, I’d say it is a bug in xml2rfc (because the RFCXML Vocabulary says that inline use should “work” in this case).
-
Indeed, the <t> was missing, probably due to a bug in mmark. Adding a paragraph before the paragraph containing the references helps.
Fixed in PR #408 https://bitbucket.org/openid/fapi/pull-requests/408
Filed a bug in mmark: https://github.com/mmarkdown/mmark/issues/183
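For anyone who wants to repeat the check on the generated XML, here is a minimal sketch that lists <contact> elements that are not wrapped in a <t>. The element names `contact` and `t` and the `fullname` attribute come from the RFCXML vocabulary; the function name and the two sample snippets (including the name in them) are made up and are not the actual build output.

```python
import xml.etree.ElementTree as ET


def contacts_outside_paragraph(xml_text):
    """Return the fullname of every <contact> that has no <t> ancestor."""
    root = ET.fromstring(xml_text)
    outside = []

    def walk(element, inside_t):
        for child in element:
            if child.tag == "contact" and not inside_t:
                outside.append(child.get("fullname"))
            # Everything below a <t> counts as being inside a paragraph.
            walk(child, inside_t or child.tag == "t")

    walk(root, root.tag == "t")
    return outside


# Hypothetical snippets for illustration only.
SAMPLE_WITHOUT_T = (
    "<section><name>Acknowledgements</name>"
    '<contact fullname="J\u00f6rg M\u00fcller"/></section>'
)
SAMPLE_WITH_T = (
    "<section><name>Acknowledgements</name>"
    '<t>Thanks to <contact fullname="J\u00f6rg M\u00fcller"/>.</t></section>'
)

if __name__ == "__main__":
    print(contacts_outside_paragraph(SAMPLE_WITHOUT_T))
    print(contacts_outside_paragraph(SAMPLE_WITH_T))
```

A non-empty result for the acknowledgements section would point to the missing <t> described above.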
-
After spending way too much time on this, it was pointed out to me that the latest xml2rfc version, released a few days ago, permits Unicode characters without the use of the <contact> element (see changelog at https://github.com/ietf-tools/xml2rfc/blob/main/CHANGELOG.md). mmark version 2.2.31 now supports this as well.
I updated the docker container danielfett/markdown2rfc to use updated versions and it seems to compile our main branch just fine now.
We don’t need the pull request anymore and can close this issue.
-
Nice, that’s a much better solution, thank you Daniel.
I got Bitbucket to rerun the pipeline for master, and can confirm the generated working draft ( https://openid.bitbucket.io/fapi/fapi-2_0-security-profile.html ) now shows the problem characters correctly.
I did a quick scan and didn’t notice any other significant differences in the output, though compared to the previous documents (e.g. https://openid.net/specs/fapi-2_0-security-profile-ID2.html ) the ‘Internet-Draft:’ and ‘Intended Status:’ fields are no longer there. That’s probably okay though.
-
reporter - changed status to resolved
Issue resolved
This can probably be fixed by using a more recent version of mmarkdown/mmark in danielfett/markdown2rfc. mmarkdown/mmark got Unicode support in this commit, i.e., since version 2.2.16, whereas danielfett/markdown2rfc currently uses version 2.1.1. @Daniel Fett I wasn’t able to locate a repository for the docker image to file a PR; if needed, I’d be willing to help.
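As a small aid for checking whether a given mmark version is new enough, here is a sketch that compares version strings numerically. The 2.2.16 threshold is the one mentioned above; the helper names are hypothetical, and how the version string is obtained from the image is left open.

```python
def parse_version(version):
    """Turn a dotted version such as '2.2.16' or 'v2.2.16' into a tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def has_unicode_support(version, minimum="2.2.16"):
    """Compare numerically, so that 2.2.9 sorts before 2.2.16."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("2.1.1", "2.2.16", "2.2.31"):
        print(candidate, has_unicode_support(candidate))
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where "2.2.9" would sort after "2.2.16".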