Markdown: extraction preserve spaces

Mihai Nita reporter

Yes, and I agree that space preserve is a good default.

But the way markdown is extracted now is inconsistent with how html is extracted, and I think it exposes the translators to the markdown conventions.

html:

```
<p>This is some longer line
with nl in random places 
that is rendered at runtime 
into a single line with 
collapsed spaces.</p>
```

extracts as:

```
<trans-unit id="tu3" restype="x-paragraph">
<source xml:lang="en">This is some longer line with nl in random places that is rendered at runtime into a single line with collapsed spaces.</source>
</trans-unit>
```
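For illustration, the HTML result is consistent with ordinary whitespace collapsing. A minimal Python sketch of that behavior follows; `collapse_whitespace` is a hypothetical helper written for this example, not the actual filter code.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse every run of spaces, tabs and newlines into one space."""
    # Hypothetical helper: mimics how a rendered HTML paragraph treats whitespace.
    return re.sub(r"\s+", " ", text).strip()


html_paragraph = (
    "This is some longer line\n"
    "with nl in random places \n"
    "that is rendered at runtime \n"
    "into a single line with \n"
    "collapsed spaces."
)

print(collapse_whitespace(html_paragraph))
```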

the equivalent markdown:

```
This is some longer line
with nl in random places 
that is rendered at runtime 
into a single line with 
collapsed spaces.
```

extracts as:

```
<trans-unit id="tu3" xml:space="preserve">
<source xml:lang="en">This is some longer line
with nl in random places
that is rendered at runtime
into a single line with
collapsed spaces.</source>
</trans-unit>
```

As an experienced translator (or a translator with a tool that has a “live preview”), between xml:space="preserve" and seeing all the newlines, I 100% expect that the newlines matter.

So I will try to match them, or remove them, or move them around where it makes more sense.
And if at some point someone files a bug saying that lines break in the wrong places, I will be puzzled, because I can see the line breaks, and they are where they should be.

If (for some reason) I use several spaces at the end of a line, or at the beginning of one, the resulting .md has forced line breaks, or even code blocks.
That is what I mean by “exposes the translators to the markdown conventions”.
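To make the risk concrete, here is a small Python sketch that flags whitespace a Markdown renderer may treat as markup. It assumes CommonMark-style rules (two or more trailing spaces force a line break; four or more leading spaces can start an indented code block when the line does not continue a paragraph). The function name `whitespace_side_effects` is made up for this example.

```python
import re


def whitespace_side_effects(line: str) -> list:
    """Return the Markdown side effects that the spaces in `line` may trigger."""
    effects = []
    # Two or more trailing spaces are read as a forced (hard) line break.
    if re.search(r" {2,}$", line):
        effects.append("hard line break")
    # Four or more leading spaces can open an indented code block,
    # but only when the line is not a continuation of a paragraph.
    if re.match(r" {4,}\S", line):
        effects.append("possible indented code block")
    return effects


for sample in ["with nl in random places ", "into a single line with  ", "    collapsed spaces."]:
    print(repr(sample), whitespace_side_effects(sample))
```

A single trailing space, as in the original sample, triggers nothing. Only the extra spaces a translator might add would change the rendering.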

TLDR: I would expect both examples to extract the same. They don’t render wrapped, and spaces don’t matter.

So unwrap the markdown lines that are unwrapped in rendering.
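A rough Python sketch of what such unwrapping could look like for a single paragraph. This is only an illustration under CommonMark-style assumptions, not how the Markdown filter is implemented: `unwrap_soft_breaks` and `HARD_BREAK` are hypothetical names, and intentional hard breaks (two or more trailing spaces, or a trailing backslash) are left alone.

```python
import re

# Assumed hard-break markers: two or more trailing spaces, or a trailing backslash.
HARD_BREAK = re.compile(r"( {2,}|\\)$")


def unwrap_soft_breaks(paragraph: str) -> str:
    """Join the soft-wrapped lines of one Markdown paragraph with single spaces."""
    lines = paragraph.split("\n")
    pieces = []
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        if is_last:
            pieces.append(line.strip())
        elif HARD_BREAK.search(line):
            # An intentional break: keep the marker and the newline untouched.
            pieces.append(line.lstrip() + "\n")
        else:
            # A soft break renders as one space, so extract it as one space.
            pieces.append(line.strip() + " ")
    return "".join(pieces)


markdown_paragraph = (
    "This is some longer line\n"
    "with nl in random places \n"
    "that is rendered at runtime \n"
    "into a single line with \n"
    "collapsed spaces."
)

print(unwrap_soft_breaks(markdown_paragraph))
```

With this approach the Markdown sample would extract to the same single-line text as the HTML sample, while a line that really ends in a hard break would still keep its newline.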

And even remove the xml:space="preserve", because that attribute has its own place. It does not have to be blindly applied, only where it makes sense.

2023-02-18T22:37:18+00:00

Comments (2)

jhargrave-straker
Note that we decided a while back to force the text units of any extracted XLIFF file to xml:space="preserve" when we merge (OriginalDocumentXliffMergerStep). We do this because a container format should always preserve the original format as generated by the filter.
- 2023-02-16T20:03:55+00:00