Merged in mihai_UnicodeBOMWriter (pull request #524)
bbf772c·Author: Mihai Nita·Closed by: Mihai Nita·2021-06-18
Description
After doing all the work without looking at the old code I’ve discovered some incorrect behavior in the old implementation.
For example if you use something like “Utf-8” the old code fails, because it probably does some case sensitive compares.
Also, the old API was badly designed, because there is a constructor that takes an OutputStream. This means that there is no guarantee that someone didn’t write in it already, so it writes a BOM in the middle of the file.
We only use this class once, in net.sf.okapi.filters.html.StreamedSourceCopy.java and in somewhat “weird” way:
if (hasBOM) {
writer = new UnicodeBOMWriter(new FileOutputStream(new File(
tempUri.getPath())), encoding);
} else { ... }
So we create an OutputStream although there is a constructor that takes a String filename directly. So we don’t even use that badly designed API.
TLDR: I propose to stop trying to be 100% backward compatible and remove the bad constructors. Drop support for PROPERTY_WRITE_UTF8_BOM System property, which only complicates the code.
Should we worry about other projects using this class? It is public…
Option 1: submit this “as is”, backward compatible, mistakes and all. 1.1 Keep it as is “forever” 1.2 Mark some items as “deprecated” and remove them later 1.3 Remove them in a CL asap (same release)
Option 2: refactor and cleanup even before this PR is merged, to hell with back-compat :-)
After doing all the work without looking at the old code I’ve discovered some incorrect behavior in the old implementation.
For example if you use something like “Utf-8” the old code fails, because it probably does some case sensitive compares.
Also, the old API was badly designed, because there is a constructor that takes an OutputStream.
This means that there is no guarantee that someone didn’t write in it already, so it writes a BOM in the middle of the file.
We only use this class once, in
net.sf.okapi.filters.html.StreamedSourceCopy.java
and in somewhat “weird” way:if (hasBOM) { writer = new UnicodeBOMWriter(new FileOutputStream(new File( tempUri.getPath())), encoding); } else { ... }
So we create an OutputStream although there is a constructor that takes a String filename directly. So we don’t even use that badly designed API.
TLDR: I propose to stop trying to be 100% backward compatible and remove the bad constructors.
Drop support for
PROPERTY_WRITE_UTF8_BOM
System property, which only complicates the code.Should we worry about other projects using this class?
It is public…
Option 1: submit this “as is”, backward compatible, mistakes and all.
1.1 Keep it as is “forever”
1.2 Mark some items as “deprecated” and remove them later
1.3 Remove them in a CL asap (same release)
Option 2: refactor and cleanup even before this PR is merged, to hell with back-compat :-)