White Space Correction Step

Issue #707 resolved
Dale Eggett created an issue

The White Space Correction Step currently trims any trailing white space that matches Character.isWhitespace, plus the non-breaking space. In many use cases, some white space characters (e.g. new lines, tabs) should be treated as formatting, so trimming them is not desirable.
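
To make the behavior described above concrete, here is a minimal sketch of that kind of trailing trim. It is not the step's actual code: the class and method names (TrailingTrimSketch, isTrimmable, trimTrailing) are hypothetical, and the only things taken from the description are the Character.isWhitespace check and the extra non-breaking space (U+00A0) check.

```java
public class TrailingTrimSketch {
    // Hypothetical helper: a character is trimmed if Java considers it
    // white space, or if it is the non-breaking space (U+00A0), which
    // Character.isWhitespace deliberately excludes.
    static boolean isTrimmable(char c) {
        return Character.isWhitespace(c) || c == 0x00A0;
    }

    // Removes every trimmable character from the end of the text.
    static String trimTrailing(String text) {
        int end = text.length();
        while (end > 0 && isTrimmable(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        // The trailing tab and new line are removed along with the spaces.
        System.out.println("[" + trimTrailing("Hello \t\n\u00A0") + "]"); // prints [Hello]
    }
}
```

With this logic a trailing tab or new line that was meant as formatting is lost, which is the problem this issue describes.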

Proposal: Add a parameter (perhaps a regex) that defines what should be trimmed in the White Space Correction Step.

Comments (8)

  1. YvesS
    • changed version to M35
    • removed milestone

    +1. Regex sounds flexible.

    (Just a note: we usually use Version to indicate the release in which the issue was found (so M35 here), and Milestone for the release in which we resolve it.)

  2. Chase Tingley

    Regex is flexible but pretty hard to use when you're stringing together a bunch of specific codepoints. (It would also mean that people could match non-whitespace, which may not be desirable.) I think there are 25 whitespace characters (I'm in a hurry so I'm trusting Wikipedia on this), which would be an excessive number of checkboxes... is there some way we can split the difference?
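
    As a rough illustration of both concerns, the sketch below shows what a regex-based parameter might look like. Everything in it is an assumption for illustration only: the class RegexTrimSketch, the TRAILING pattern, and the choice of codepoints are hypothetical and not part of the step.

```java
import java.util.regex.Pattern;

public class RegexTrimSketch {
    // Hypothetical pattern: a trailing run of SPACE, NO-BREAK SPACE,
    // or the fixed-width spaces U+2000 through U+200A.
    static final Pattern TRAILING =
            Pattern.compile("[\\u0020\\u00A0\\u2000-\\u200A]+\\z");

    // Removes whatever the supplied pattern matches at the end of the text.
    static String trimTrailing(String text, Pattern trailing) {
        return trailing.matcher(text).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println("[" + trimTrailing("Hello  ", TRAILING) + "]"); // prints [Hello]

        // Nothing stops a user-supplied pattern from matching non-whitespace.
        Pattern tooBroad = Pattern.compile("[a-z ]+\\z");
        System.out.println("[" + trimTrailing("Hello  ", tooBroad) + "]"); // prints [H]
    }
}
```

    The character class is already hard to read with only a few codepoints, and the second pattern strips letters as well as spaces.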

  3. Dale Eggett reporter

    In most cases, I think just the space character would be sufficient, but better to have more options available, right? Maybe categories? Maybe something like this:

    • Space character

    • Other horizontal, breaking spaces

    • Nonbreaking spaces

    • Horizontal tabs

    • Vertical white space (e.g. new line, carriage return, vertical tab)

    This takes care of the usual case (at least my usual case), where you only need the space character, but it gives flexibility for other types of white space if needed.

  4. Jim Hargrave (OLD)

    I am working on this now and will implement the classes of white space characters listed above. The default will be to have all classes checked, which emulates the current behavior. Users will be able to turn off any classes they don't want.
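
    One way the class-based approach could look is sketched below. This is not the committed implementation: the enum WsClass, its constant names, and the exact membership of each class (for example, counting form feed and the Unicode line and paragraph separators as vertical white space) are assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

public class WhitespaceClassSketch {
    // Hypothetical names modelled on the categories proposed in this thread.
    enum WsClass { SPACE, OTHER_HORIZONTAL, NON_BREAKING, HORIZONTAL_TAB, VERTICAL }

    // Returns the class of c, or null if c is not treated as white space here.
    static WsClass classify(char c) {
        if (c == ' ') return WsClass.SPACE;
        if (c == '\t') return WsClass.HORIZONTAL_TAB;
        // No-break space, figure space, narrow no-break space.
        if (c == 0x00A0 || c == 0x2007 || c == 0x202F) return WsClass.NON_BREAKING;
        // Assumption: vertical tab, form feed, line and paragraph separators
        // are grouped with new line and carriage return.
        if (c == '\n' || c == '\r' || c == 0x000B || c == '\f'
                || c == 0x2028 || c == 0x2029) return WsClass.VERTICAL;
        // Any remaining Unicode space separator (e.g. U+2003, U+3000).
        if (Character.getType(c) == Character.SPACE_SEPARATOR) return WsClass.OTHER_HORIZONTAL;
        return null;
    }

    // Trims trailing characters only while they belong to an enabled class.
    static String trimTrailing(String text, Set<WsClass> enabled) {
        int end = text.length();
        while (end > 0) {
            WsClass cls = classify(text.charAt(end - 1));
            if (cls == null || !enabled.contains(cls)) break;
            end--;
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        Set<WsClass> all = EnumSet.allOf(WsClass.class);     // default: every class checked
        Set<WsClass> spaceOnly = EnumSet.of(WsClass.SPACE);  // the "usual case" from comment 3
        System.out.println("[" + trimTrailing("Hello \t ", all) + "]");
        System.out.println("[" + trimTrailing("Hello \t ", spaceOnly) + "]");
    }
}
```

    With every class enabled the whole trailing run is removed. With only SPACE enabled, trimming stops at the tab, so the tab and the space before it are kept.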
