HTML Filter: Extremely messy structure of extracted file when preserve_option set to true
This is a follow-up to this issue: https://bitbucket.org/okapiframework/okapi/issues/1076/html-filter-some-structures-inside-text
You can take a look at the Text Units structure.
Comments (8)
reporter - changed title to HTML Filter: Extremely messy structure of extracted file when preserve_option set to true
-
Can you explain what behavior you would expect to see? You are enabling the “preserve whitespace” option, and the filter is preserving it. The source doc has a lot of whitespace, so this produces segments with lots of newlines, etc.
-
reporter What I know about setting the preserve_whitespace option to true is that it will be reflected in the merged file, which will ultimately keep the formatting as close to the original file as possible -
reporter Anyway, what I expect is that the filter doesn’t produce the newlines, so that the extracted file is human-readable.
Can’t you see it’s really ugly? To me, it is.
-
To be able to preserve the whitespace characters (which includes the new lines) in the output, they have to be in the extracted text.
Ugliness, like beauty, is a matter of debate.
Also, XLIFF files are meant to be consumed by some kind of editor that can do whatever it wants with the formatting.
The only thing the filter can do is extract the content that exists, either preserving whitespace characters or not. -
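To illustrate the point above, here is a minimal sketch in Python. It does not use the Okapi API; the functions extract and display are hypothetical stand-ins for a filter's extraction step and an editor's rendering step. It only shows that whitespace collapsed at extraction time is gone for good, while an editor can still collapse preserved whitespace for display without touching the stored text.

```python
import re


def extract(text, preserve_whitespace):
    """Hypothetical extraction step (not Okapi code): return the text unit content."""
    if preserve_whitespace:
        # Keep every space, tab and newline so a later merge can reproduce them.
        return text
    # Collapse each run of whitespace to one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def display(text):
    """Hypothetical editor-side rendering: collapse whitespace for readability only."""
    return re.sub(r"\s+", " ", text).strip()


source = "First line\n    second line\n\n  third line"

preserved = extract(source, preserve_whitespace=True)
collapsed = extract(source, preserve_whitespace=False)

# The preserved text still carries the original layout; the collapsed text does not.
print(repr(preserved))
print(repr(collapsed))
# An editor can present the preserved text in readable form without altering it.
print(repr(display(preserved)))
```

With preservation on, the extracted content is identical to the source, newlines included, which is why the segments look noisy when the raw file is read directly. With it off, the original layout cannot be recovered from the extracted text alone.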
reporter okay. that makes sense
-
- changed status to resolved