Rainbow: CSV filter always replaces line breaks with spaces, no matter what the filter setting is
I have a CSV file with text qualifiers. Some texts include line breaks. The source file looks like this:
"First sentence. Second sentence without full stop
Third sentence."
"<ul>Begin
<li>First line</li>
<li>Sencond line</li>
</ul>"
The file is also attached.
After applying Ratel segmentation rules and opening it in OmegaT, it looks as shown in the screenshot.
Changing the "Multi-line text units" option in the filter menu doesn't change the behavior.
It seems that all line breaks within one "cell" are replaced by a space. That's probably why the segmentation process is not able to create segments after line breaks. Overall, this results in large, unsplit segments.
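To make the suspected effect concrete, here is a minimal Python sketch. It is not Okapi or Ratel code: `unwrap` and `segment` are hypothetical stand-ins, and the toy segmenter only assumes rules that break after a full stop and at line breaks.

```python
import re

# First "cell" from the sample file above.
CELL = "First sentence. Second sentence without full stop\nThird sentence."


def unwrap(cell):
    """Replace every line break with one space (what the filter appears to do)."""
    return re.sub(r"\r\n|\r|\n", " ", cell)


def segment(text):
    """Toy segmenter (assumption): break after a full stop plus whitespace, or at a line break."""
    parts = re.split(r"(?<=\.)\s+|\n", text)
    return [p for p in parts if p]


if __name__ == "__main__":
    print(segment(CELL))          # line break still present
    print(segment(unwrap(CELL)))  # line break already replaced by a space
```

With the line break intact, the second sentence can be split off even though it has no full stop. Once the break has become a space, the segmenter has nothing left to break on at that position.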
Isn't the filter working right? Am I doing something wrong?
I am using version 6.0.27 with Java 1.8.0_131 on Windows 7.
Side note: I am using a regexp code finder rule to extract HTML elements. But this doesn't matter here, since text without HTML elements is affected as well.
(\x20*<\/?.*?[^(->)]>\x20*)|(\s*\{.*?\}\s*)
Comments (4)
-
reporter -
I can confirm the behavior you described. Looking at the options for this filter, the way line breaks are handled is likely set by the "Multi-line text units" option. And it seems that changing the option to either use \n or replace line breaks with inline codes is not taken into account: it looks like the option "unwrap lines" is applied no matter what.
I'll try to have a look at it, at least to see whether it's a bug or whether another option somewhere else affects this one.
-
The cause seems to be that the PlainText Filter has a parameter that controls whether white space is preserved, and that parameter is set to false by default. Because the Table Filter is derived from the PlainText Filter, that code is called (see https://bitbucket.org/okapiframework/okapi/src/master/okapi/filters/plaintext/src/main/java/net/sf/okapi/filters/plaintext/base/BasePlainTextFilter.java?fileviewer=file-view-default#BasePlainTextFilter.java-261) and the line breaks get changed to spaces.
The Table Filter does not seem to expose the PlainText parameter that controls white-space preservation. We would need to either expose it, or force that PlainText Filter parameter when setting the parameters for the Table Filter.
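A rough sketch of that situation, and of the second fix, in Python rather than Okapi's Java. All class, attribute and argument names here are hypothetical and do not reflect the real Okapi API; the sketch only assumes a base filter whose white-space parameter defaults to false and a derived filter that never sets it.

```python
class PlainTextParams:
    """Hypothetical stand-in for the PlainText Filter parameters."""

    def __init__(self):
        self.preserve_ws = False  # assumed default, as described above


class PlainTextFilter:
    """Hypothetical base filter: collapses line breaks unless told otherwise."""

    def __init__(self):
        self.params = PlainTextParams()

    def process(self, text):
        if self.params.preserve_ws:
            return text
        # Normalize line endings, then turn each line break into a space.
        return text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", " ")


class TableFilter(PlainTextFilter):
    """Hypothetical derived filter with its own multi-line option."""

    def __init__(self, multiline_mode="keep", force_preserve_ws=False):
        super().__init__()
        # This option never reaches the inherited preserve_ws parameter.
        self.multiline_mode = multiline_mode
        if force_preserve_ws:
            # Sketch of the proposed fix: force the inherited parameter.
            self.params.preserve_ws = True


if __name__ == "__main__":
    cell = "First line\nSecond line"
    print(repr(TableFilter("keep").process(cell)))
    print(repr(TableFilter("keep", force_preserve_ws=True).process(cell)))
```

Exposing the parameter in the Table Filter's options instead would look much the same, except that the user rather than the filter would decide the value.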
-
reporter Are there any plans to fix this behaviour?
Can anybody confirm this bug or am I doing something wrong?