fix encoding issue with umlauts, etc. that get lost in the encoding process

Tim Würtele

This can probably be fixed by using a more recent version of mmarkdown/mmark in danielfett/markdown2rfc. mmarkdown/mmark got Unicode support in this commit, i.e., since version 2.2.16, whereas danielfett/markdown2rfc currently uses version 2.1.1. @Daniel Fett I wasn’t able to locate a repository for the Docker image to file a PR; if needed, I’d be willing to help.

2023-01-25T15:25:36+00:00

Daniel Fett

Thanks for the pointer! I’ll update the Docker image. (I intentionally did not touch it, since there were versions of mmark that introduced new bugs, but it seems it’s finally time to update.)

2023-01-25T15:32:07+00:00

Daniel Fett

The first attempt with the new version didn’t go well. I’m investigating.

2023-01-25T15:44:06+00:00

Daniel Fett

assigned issue to

Daniel Fett

2023-01-25T15:54:59+00:00

Daniel Fett

Ok, so this is the branch where I ran my experiments: https://bitbucket.org/openid/fapi/branch/danielfett/fapi2/fix-unicode

It seems that Unicode characters without the number expansion are not allowed at all in the main body of the text, with some exceptions listed here: https://authors.ietf.org/en/non-ascii-characters-in-rfcxml
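
To see which characters would be affected, one could scan the markdown source for non-ASCII characters before running the toolchain. A minimal Python sketch; the function name `find_non_ascii` and the sample text are made up for illustration and are not part of mmark or xml2rfc:

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, Unicode name) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical acknowledgements text; \u00fc is the umlaut "ü".
sample = "Thanks to Tim W\u00fcrtele\nand Daniel Fett."
for hit in find_non_ascii(sample):
    print(hit)
```

In practice the text would be read from the draft's markdown file with `encoding="utf-8"` instead of the inline sample.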

It seems that the contact element is what is normally used to render acknowledgements: https://github.com/mmarkdown/mmark/blob/master/Syntax.md

However, this renders quite differently from what we have today and seems to ignore any text before or between author names.

Does anybody have any other ideas how to solve this?

2023-01-25T16:12:37+00:00

Tim Würtele

Actually, RFC 7997 does allow Unicode characters without the number expansion in the main body of the text, in particular in names; see Section 3.2.
The issue, however, seems to be that the toolchain mmark + xml2rfc does not support these cases (neither names as per Section 3.2 of RFC 7997, nor other cases in which the number expansion is not needed).

According to the RFCXML Vocabulary, references to contacts can be used inline within a <t>. I wasn’t able to reproduce Daniel’s build, but it may be worth checking the generated XML: Is there a <t> around the paragraph? If not, I’d say this is a bug in mmark. However, if there is a <t>, I’d say it is a bug in xml2rfc (because the RFCXML vocabulary says that inline use should “work” in this case).
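
The check described above could be automated roughly as follows. This is only a sketch using Python's standard `xml.etree.ElementTree`; the helper name `contacts_outside_t` and both XML snippets are hypothetical and were not taken from the actual build output:

```python
import xml.etree.ElementTree as ET


def contacts_outside_t(xml_text):
    """Return the fullname of every <contact> that has no <t> ancestor."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    outside = []
    for contact in root.iter("contact"):
        node = contact
        inside_t = False
        while node in parent_of:
            node = parent_of[node]
            if node.tag == "t":
                inside_t = True
                break
        if not inside_t:
            outside.append(contact.get("fullname"))
    return outside


# Hypothetical snippets: one without and one with a surrounding <t>.
without_t = '<section><name>Acknowledgements</name><contact fullname="Tim W\u00fcrtele"/></section>'
with_t = '<section><name>Acknowledgements</name><t>Thanks to <contact fullname="Tim W\u00fcrtele"/>.</t></section>'

print(contacts_outside_t(without_t))
print(contacts_outside_t(with_t))
```

A non-empty result for the generated XML would point towards mmark (missing <t>); an empty result would point towards xml2rfc.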

2023-01-26T12:39:03+00:00

Daniel Fett

Indeed, the <t> was missing, probably due to a bug in mmark. Adding a paragraph before the paragraph containing the references helps.

Fixed in PR #408 https://bitbucket.org/openid/fapi/pull-requests/408

Filed a bug in mmark: https://github.com/mmarkdown/mmark/issues/183

2023-02-08T09:19:59+00:00

Daniel Fett

After spending way too much time on this, it was pointed out to me that the latest xml2rfc version, released a few days ago, permits Unicode characters without the use of the <contact> element (see changelog at https://github.com/ietf-tools/xml2rfc/blob/main/CHANGELOG.md). mmark version 2.2.31 now supports this as well.

I updated the docker container danielfett/markdown2rfc to use updated versions and it seems to compile our main branch just fine now.

We don’t need the pull request anymore and can close this issue.

2023-02-08T15:21:50+00:00

Joseph Heenan

Nice, that’s a much better solution, thank you Daniel.

I got Bitbucket to rerun the pipeline for master, and can confirm the generated working draft ( https://openid.bitbucket.io/fapi/fapi-2_0-security-profile.html ) now shows the problem characters correctly.

I did a quick scan and didn’t notice any other significant differences in the output, though compared to the previous documents (e.g. https://openid.net/specs/fapi-2_0-security-profile-ID2.html ) the ‘Internet-Draft:’ and ‘Intended Status:’ are no longer there. That’s probably okay though.

2023-02-08T15:44:30+00:00

Dave Tonge reporter

changed status to resolved

Issue resolved

2023-03-08T14:58:35+00:00
