Markdown: extracted XLIFF has `&#13;` on every line of source/target in XLIFF for code blocks when source file has CR/LF (DOS) ending
When extracting a Markdown file with code blocks (indented or fenced) with a DOS/Windows type CR/LF ending, the numeric entity for CR (`&#13;`) is found at the end of every line except the last for the code blocks in the generated XLIFF file. The same symptom is observed for hard line breaks after fixing issue #13#695.
This suggests the filter is not following the Developer Guide’s recommendation on line breaks, under which the end-of-line should be normalized to LF regardless of the platform or the input file.
This happens even when tikal is run on Windows.
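The normalization that the Developer Guide recommends can be sketched as follows. This is only a minimal illustration, not the filter's actual code: `LineBreakNormalizer` and `normalizeToLf` are hypothetical names that do not exist in Okapi.

```java
// Hypothetical helper for illustration only; not part of the Okapi API.
final class LineBreakNormalizer {
    private LineBreakNormalizer() {}

    /** Converts CR/LF pairs and lone CRs to a single LF. */
    static String normalizeToLf(String text) {
        // Replace CR/LF pairs first so a pair does not turn into two LFs.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```

If text is normalized this way before it is stored as extracted content, no CR is left for the XLIFF writer to escape as `&#13;`.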
Comments (6)
-
reporter -
reporter When extracting a Markdown file with code blocks (indented or fenced) with a DOS/Windows type CR/LF ending, the numeric entity for CR (`&#13;`) is found at the end of every line except the last for the code blocks in the generated XLIFF file. The same symptom is observed for hard line breaks after fixing issue #13#695. This is reproducible with M37. This happens even when tikal is run on Windows.
-
reporter Pull request #313 has been made.
This is fixed by using the DefaultEncoder instead of the MarkdownEncoder, which had been put in place to prevent test failures on Windows. MarkdownEncoder is basically a no-op encoder that keeps the SkeletonWriter (GenericSkeletonWriter) from adjusting the line endings to the type of the original document. It was probably needed because the line ending was treated as a code. This fix treats the line break as a normal line ending, i.e. LF-only in the TextUnit. A good side effect is that, after this fix, a code block like this:
```
public void foo() {
    do_something();
}
```
will result in this source element in XLIFF:
<source xml:lang="en"><x id="1"/>public void foo() { <x id="2"/> do_something(); <x id="3"/>} </source>
rather than the previous:
<source xml:lang="en"><x id="1"/>public void foo() {<x id="2"/><x id="3"/> do_something();<x id="4"/><x id="5"/>}<x id="6"/></source>
which is very hard to read and understand.
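The general idea behind the fix (LF-only text during extraction, with the original line-break type reapplied on output) can be sketched like this. It is a simplified illustration under assumptions: `LineBreakRoundTrip` and its methods are hypothetical and do not reproduce how DefaultEncoder or GenericSkeletonWriter are implemented.

```java
// Hypothetical round-trip sketch; names and logic are illustrative assumptions.
final class LineBreakRoundTrip {
    private LineBreakRoundTrip() {}

    /** Returns the first line-break style found in the raw text, defaulting to LF. */
    static String detectLineBreak(String raw) {
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\n') {
                return "\n";
            }
            if (c == '\r') {
                boolean crlf = i + 1 < raw.length() && raw.charAt(i + 1) == '\n';
                return crlf ? "\r\n" : "\r";
            }
        }
        return "\n";
    }

    /** Extraction side: keep only LF in the text handed to the translator. */
    static String toLf(String raw) {
        return raw.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Merge side: reapply the line-break style of the original document. */
    static String restore(String lfText, String lineBreak) {
        return lfText.replace("\n", lineBreak);
    }
}
```

With this split, the extracted text never carries a CR, while a file that used CR/LF consistently is written back with CR/LF.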
Even after this fix, the reconstructed (extracted+merged) document of code-blocks-crlf.md is different from the original file; it does not retain the extra spaces before the code block when the fence has extra spaces. This should not be a real problem because they are semantically equivalent. But if that is a problem, it should be addressed as a new issue.
-
reporter - changed status to resolved
Fixing issue #820. Almost OK except when the indented code blocks exist. One unit test case failing. → <<cset 3abda3a8e9f9>>
-
reporter Fixing issue #820. Changed handling of indented code blocks. Newlines are no longer converted to codes. → <<cset dc42b4fe45df>>
-
reporter Merged in ssikuro/okapi/fix_issue_820 (pull request #313)
Fix issue 820
Approved-by: Mihai Nita mihnita@gmail.com
→ <<cset cb0ebb9e817c>>