HTML Filter: Extremely messy structure of extracted file when preserve_option set to true

Issue #1078 resolved

Handika Dwi created an issue 2021-07-16

This follows up on this issue: https://bitbucket.org/okapiframework/okapi/issues/1076/html-filter-some-structures-inside-text

You can take a look at the Text Units structure.

Comments (8)

Handika Dwi reporter
- changed title to HTML Filter: Extremely messy structure of extracted file
- 2021-07-16T14:01:48+00:00
Handika Dwi reporter
- changed title to HTML Filter: Extremely messy structure of extracted file when preserve_option set to true
- 2021-07-16T14:21:58+00:00
Chase Tingley
Can you explain what behavior you would expect to see? You are enabling the “preserve whitespace” option, and the filter is preserving it. The source doc has a lot of whitespace, so this produces segments with lots of newlines, etc.
- 2021-07-27T21:58:59+00:00
Handika Dwi reporter
My understanding is that when the preserve_whitespace option is set to true, the whitespace is reflected in the merged file, so the merged file preserves the formatting of the original file as much as possible.
- 2021-07-28T09:08:34+00:00
Handika Dwi reporter
Anyway, what I expect is that the newlines are not produced in the extracted file, so that it’s human-readable.

Can’t you see it’s really ugly? To me, it is.
- 2021-07-28T15:22:33+00:00
YvesS
To be able to preserve the whitespace characters (which includes the new lines) in the output, they have to be in the extracted text.

Ugliness, like beauty, is a matter of debate.
Also, XLIFF files are meant to be consumed by some kind of editor that can do whatever it wants with the formatting.
The only thing the filter can do is extract the content that exists, with or without preserving whitespace characters.
- 2021-07-28T16:11:48+00:00
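The trade-off described in the comment above can be illustrated with a minimal sketch. This is not Okapi's API: `extract_text` and its `preserve_whitespace` parameter are hypothetical stand-ins, and the sample string is made up. It only shows that whitespace has to travel with the extracted text if it is to reappear in the output, and that collapsing it for readability discards the original layout.

```python
import re


def extract_text(raw: str, preserve_whitespace: bool) -> str:
    """Hypothetical extraction step (an assumption, not Okapi's API)."""
    if preserve_whitespace:
        # Keep every space and newline so the original layout can be rebuilt.
        return raw
    # Collapse each run of whitespace into one space and trim the ends.
    return re.sub(r"\s+", " ", raw).strip()


# Made-up source content with indentation and blank lines.
source = "  Hello,\n      world!\n\n  "

preserved = extract_text(source, preserve_whitespace=True)
collapsed = extract_text(source, preserve_whitespace=False)

print(repr(preserved))
print(repr(collapsed))
```

With preservation on, the extracted text is identical to the source, newlines included. With it off, the text is easier to read, but the original whitespace cannot be recovered from the extracted text alone.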
Handika Dwi reporter
Okay. That makes sense.
- 2021-07-30T03:22:50+00:00
YvesS
- changed status to resolved
- 2021-07-30T04:26:03+00:00

Assignee: –

Type: bug

Priority: major

Status: resolved

Milestone: 1.41.0

Version: 1.41.0

Votes: 0

Watchers: 1