Rainbow: CSV filter always replaces line breaks with spaces, no matter what the filter setting is

Issue #673 new
Sebastian Ebert created an issue

I have a CSV file with text qualifiers. Some texts include line breaks. The source file looks like this:

"First sentence. Second sentence without full stop
Third sentence."
"<ul>Begin
<li>First line</li>
<li>Sencond line</li>
</ul>"

The file is also attached.

After applying Ratel segmentation rules and opening the file in OmegaT, it looks as shown in the screenshot.

Changing the "Multi-line text units" option in the filter menu doesn't change the behavior.

It seems that all line breaks within one "cell" are replaced by a space. That's probably why the segmentation process cannot create segments at line breaks. Overall, this results in large, unsplit segments.
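To illustrate the effect described above, here is a minimal Python sketch. It is not Okapi or Ratel code: `unwrap` and `segment` are hypothetical helpers, and the toy segmenter (break after a full stop followed by white space, and at every line break) only stands in for the real segmentation rules.

```python
import re

# First cell of the sample file, with its embedded line break.
CELL = "First sentence. Second sentence without full stop\nThird sentence."


def unwrap(text):
    """Replace each line break with a single space (the behavior observed)."""
    return re.sub(r"\r?\n", " ", text)


def segment(text):
    """Toy segmenter (assumption): break after a full stop followed by
    white space, and at every line break."""
    parts = re.split(r"(?<=\.)\s+|\n", text)
    return [p for p in parts if p]


print(segment(CELL))          # line break kept: three segments
print(segment(unwrap(CELL)))  # line break unwrapped: two segments
```

Once the line break has become a space, the rule that breaks at line breaks has nothing to match, so the second and third sentences stay in one segment.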

Is the filter not working correctly, or am I doing something wrong?

I am using version 6.0.27 with Java 1.8.0_131 on Windows 7.

Side note: I am using a regexp codefinder rule to extract HTML elements, but this doesn't matter here, since text without HTML elements is also affected.

(\x20*<\/?.*?[^(->)]>\x20*)|(\s*\{.*?\}\s*)
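For reference, a short Python sketch of what this rule matches on one line of the sample. The filter itself applies the rule with Java regular expressions; the constructs used here behave the same way in Python's `re` module, and `find_codes` is a hypothetical helper, not part of any tool.

```python
import re

# The codefinder rule from the issue, unchanged.
CODE_RULE = re.compile(r"(\x20*<\/?.*?[^(->)]>\x20*)|(\s*\{.*?\}\s*)")


def find_codes(text):
    """Return every span the rule would treat as an inline code."""
    return [m.group(0) for m in CODE_RULE.finditer(text)]


print(find_codes("<li>First line</li>"))
```

The rule only marks tags and `{...}` placeholders as codes; it does not touch line breaks, which is consistent with plain text being affected as well.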

Comments (4)

  1. YvesS

    I can confirm the behavior you described. Looking at the options for this filter, the way line breaks are handled is likely set by the Multi-line text units option. It seems that changing the option to either use \n or replace line breaks with inline codes is not taken into account: the "unwrap lines" option appears to be applied no matter what.

    I'll try to have a look at it, at least to see whether it's a bug or whether another option somewhere else affects this one.

  2. YvesS

    The cause seems to be a parameter in the PlainText Filter that controls whether white space is preserved; it is set to false by default. Because the Table Filter is derived from the PlainText Filter, that code is called (see https://bitbucket.org/okapiframework/okapi/src/master/okapi/filters/plaintext/src/main/java/net/sf/okapi/filters/plaintext/base/BasePlainTextFilter.java?fileviewer=file-view-default#BasePlainTextFilter.java-261) and the line breaks get changed to spaces.

    The Table Filter does not seem to expose the PlainText parameter for preserving white space. We would need to either expose it or force that PlainText Filter parameter when setting the parameters for the Table Filter.
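The second option (forcing the inherited parameter) could look roughly like the following language-neutral model, written in Python for brevity. None of these names come from the Okapi code base: `PlainTextLikeFilter`, `TableLikeFilter`, `preserve_ws`, `set_parameters` and the mode strings are all hypothetical and only model the inheritance problem described above.

```python
import re


class PlainTextLikeFilter:
    """Hypothetical stand-in for the base filter."""

    def __init__(self):
        # Assumption: white space is not preserved by default.
        self.preserve_ws = False

    def extract(self, cell):
        if self.preserve_ws:
            return cell
        # Collapse every run of white space, line breaks included.
        return re.sub(r"\s+", " ", cell)


class TableLikeFilter(PlainTextLikeFilter):
    """Hypothetical stand-in for the derived table filter."""

    def set_parameters(self, multiline_mode):
        self.multiline_mode = multiline_mode
        # Force the inherited flag so the multi-line option takes effect:
        # only the "unwrap" mode lets the base class collapse line breaks.
        self.preserve_ws = multiline_mode != "unwrap"


table_filter = TableLikeFilter()
table_filter.set_parameters("keep")
print(repr(table_filter.extract("Begin\nFirst line")))
```

Exposing the parameter in the filter's options instead would leave the choice to the user, at the cost of two settings that can contradict each other.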
