HTML Filter: Extremely messy structure of extracted file when preserve_option set to true

Issue #1078 resolved
Handika Dwi created an issue

This is a follow-up to this issue: https://bitbucket.org/okapiframework/okapi/issues/1076/html-filter-some-structures-inside-text

You can take a look at the Text Units structure.

Comments (8)

  1. Chase Tingley

    Can you explain what behavior you would expect to see? You are enabling the “preserve whitespace” option, and the filter is preserving it. The source doc has a lot of whitespace, so this produces segments with lots of newlines, etc.

  2. Handika Dwi reporter

    My understanding is that when the preserve_whitespace option is set to true, the whitespace is carried through to the merged file, which then keeps the formatting as close to the original file as possible.

  3. Handika Dwi reporter

    Anyway, what I expect is that the newlines are not produced in the extracted file, so that it’s human-readable.

    Can’t you see that it’s really ugly? To me, it is.

  4. YvesS

    To preserve the whitespace characters (which include newlines) in the output, they have to be in the extracted text.

    Ugliness, like beauty, is a matter of debate 🙂
    Also, XLIFF files are meant to be consumed by some kind of editor that can do whatever it wants with the formatting.
    The only thing the filter can do is extract the content that exists, with or without preserving whitespace characters.
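
The trade-off described in this comment can be illustrated with a small sketch. This is not Okapi code: `extract` and `display` are hypothetical helpers, and the whitespace-collapsing rule is an assumption chosen for illustration. It only shows that whitespace must be present in the extracted text to survive a round trip, while an editor remains free to normalize it for display.

```python
import re


def extract(text: str, preserve_whitespace: bool) -> str:
    """Hypothetical stand-in for a filter's extraction step (not the Okapi API)."""
    if preserve_whitespace:
        # Keep every space and newline so the merged output can reproduce them.
        return text
    # Otherwise collapse each run of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


def display(text: str) -> str:
    """Hypothetical editor-side view: normalize for reading, leave the stored text alone."""
    return re.sub(r"\s+", " ", text).strip()


source = "Hello,\n      world!\n\n   Bye."

preserved = extract(source, preserve_whitespace=True)
collapsed = extract(source, preserve_whitespace=False)

print(repr(preserved))           # original newlines and indentation are still there
print(repr(collapsed))           # readable, but the original layout is lost
print(repr(display(preserved)))  # an editor can show a clean view of preserved text
```

With preservation on, the extracted text is harder to read but nothing is lost; with it off, the text is readable but the original layout cannot be rebuilt from it.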
