Markdown: extraction preserve spaces
Taking the text here and extracting it with tikal -x:
The line break here
becomes a space, but the one here
and here \
should be preserved.
(two spaces after the “one here”)
The result is:
<trans-unit id="tu10" xml:space="preserve">
<source xml:lang="en">The line break here
becomes a space, but the one here<x id="1"/>
and here <x id="2"/>
should be preserved.</source>
</trans-unit>
So the newlines are saved as newlines, and the whole trans-unit has xml:space="preserve".
I expected something similar to HTML, where a newline becomes a space (inside the same trans-unit).
That is how Markdown is rendered.
The HTML filter only adds xml:space="preserve" to trans-units extracted from <pre>.
HTML in:
<p>The line break here
becomes a space, but the one here<br>
should be preserved.</p>
HTML out as XLIFF:
<trans-unit id="tu12" restype="x-paragraph">
<source xml:lang="en">The line break here becomes a space, but the one here<x id="1"/> should be preserved.</source>
</trans-unit>
I think that behavior makes more sense.
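To make the expectation concrete, here is a minimal Python sketch of the unwrapping I have in mind. It is not how the Okapi Markdown filter works; the function name `unwrap_paragraph` and the placeholder numbering are assumptions for illustration only. It treats a line ending in two or more spaces, or in a backslash, as a hard break (kept as an inline code) and every other line ending inside the paragraph as a soft break (turned into a single space).

```python
import re

# Hard-break markers at the end of a line: two or more spaces, or a backslash.
HARD_BREAK = re.compile(r"(?: {2,}|\\)$")


def unwrap_paragraph(paragraph: str) -> str:
    """Join the lines of one Markdown paragraph (hypothetical helper).

    Soft breaks become a single space; hard breaks become <x id="N"/>.
    """
    lines = paragraph.split("\n")
    parts = []
    placeholder_id = 0
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        marker = HARD_BREAK.search(line)
        if is_last:
            # Nothing follows the last line, so there is no break to encode.
            parts.append(line)
        elif marker:
            # Drop the marker itself and keep the break as a placeholder.
            placeholder_id += 1
            parts.append(line[:marker.start()] + f'<x id="{placeholder_id}"/>')
        else:
            # Soft break: rendered like a space, so store it as one.
            parts.append(line.rstrip(" ") + " ")
    return "".join(parts)


sample = (
    "The line break here\n"
    "becomes a space, but the one here  \n"
    "and here \\\n"
    "should be preserved."
)
print(unwrap_paragraph(sample))
```

With a result like this, the trans-unit would no longer need xml:space="preserve", because the only line breaks that matter are carried by the inline codes.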
Thanks,
Mihai
Comments (2)
reporter: Yes, and I agree that space preserve is a good default.
But the way Markdown is extracted now is inconsistent with how HTML is extracted, and I think it exposes the translators to the Markdown conventions.
html:
<p>This is some longer line with nl in random places that is rendered at runtime into a single line with collapsed spaces.</p>
extracts as:
<trans-unit id="tu3" restype="x-paragraph">
<source xml:lang="en">This is some longer line with nl in random places that is rendered at runtime into a single line with collapsed spaces.</source>
</trans-unit>
the equivalent markdown:
This is some longer line with nl in random places that is rendered at runtime into a single line with collapsed spaces.
extracts as:
<trans-unit id="tu3" xml:space="preserve">
<source xml:lang="en">This is some longer line with nl in random places that is rendered at runtime into a single line with collapsed spaces.</source>
</trans-unit>
As an experienced translator (or a translator with a tool that has a “live preview”), between xml:space="preserve" and seeing all the newlines, I 100% expect that the newlines matter.
So I will try to match them, or remove them, or move them around where it makes more sense.
And if at some point someone files a bug saying that lines break in the wrong places, I will be puzzled, because I can see the line breaks, and they are where they should be.
If (for some reason) I use several spaces at the end of a line, or the beginning of one, the resulting .md has forced line breaks, or even code paragraphs.
That is what I mean by “exposes the translators to the markdown conventions”.
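A small Python sketch of the whitespace that becomes significant once a translated line is written back to a .md file. The helper name `whitespace_risks` is hypothetical and the thresholds follow the usual Markdown conventions: two or more trailing spaces force a line break, and four or more leading spaces can start an indented code block when the line begins a new block.

```python
def whitespace_risks(line: str) -> list:
    """Return the Markdown side effects a translated line may trigger."""
    risks = []
    if not line.strip(" "):
        # Blank or whitespace-only lines are not covered by this sketch.
        return risks
    trailing = len(line) - len(line.rstrip(" "))
    leading = len(line) - len(line.lstrip(" "))
    if trailing >= 2:
        risks.append("hard line break")
    if leading >= 4:
        risks.append("possible indented code block")
    return risks


for candidate in ["plain text", "moved here  ", "    indented"]:
    print(repr(candidate), whitespace_risks(candidate))
```

A translator who only sees newlines and xml:space="preserve" has no reason to know that any of these rules apply.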
TLDR: I would expect both examples to extract the same. They don’t render wrapped, and spaces don’t matter.
So unwrap the markdown lines that are unwrapped in rendering.
And even remove the xml:space="preserve", because that attribute has its own place. It does not have to be applied blindly, only where it makes sense.
Note that we decided a while back to force xml:space="preserve" on the text units of any extracted XLIFF file when we merge (OriginalDocumentXliffMergerStep). We do this because a container format should always preserve the original format as generated by the filter.
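For readers unfamiliar with what forcing the attribute means in practice, the following Python sketch shows the general idea using the standard library's ElementTree. It is only an illustration under assumed names (`force_space_preserve` is hypothetical) and is not the code of OriginalDocumentXliffMergerStep.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the reserved xml: prefix as a namespace URI in braces.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def force_space_preserve(xliff_text: str) -> str:
    """Set xml:space="preserve" on every trans-unit (illustrative only)."""
    root = ET.fromstring(xliff_text)
    for element in root.iter():
        # Compare local names so this works with or without the XLIFF namespace.
        if element.tag.rsplit("}", 1)[-1] == "trans-unit":
            element.set(XML_SPACE, "preserve")
    return ET.tostring(root, encoding="unicode")


sample_xliff = (
    '<body><trans-unit id="tu3" restype="x-paragraph">'
    '<source xml:lang="en">Some text</source>'
    "</trans-unit></body>"
)
print(force_space_preserve(sample_xliff))
```

Applied at merge time, a step like this affects every text unit regardless of which filter produced it, which is separate from the question of what the Markdown filter emits at extraction.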