Clean room implementation of UnicodeBOMWriter

Merged
#524 · Created  · Last updated

Merged pull request

Merged in mihai_UnicodeBOMWriter (pull request #524)

bbf772c·Author: ·Closed by: ·2021-06-18

Description

After doing all the work without looking at the old code I’ve discovered some incorrect behavior in the old implementation.

For example if you use something like “Utf-8” the old code fails, because it probably does some case sensitive compares.

Also, the old API was badly designed, because there is a constructor that takes an OutputStream.
This means that there is no guarantee that someone didn’t write in it already, so it writes a BOM in the middle of the file.

We only use this class once, in net.sf.okapi.filters.html.StreamedSourceCopy.java and in somewhat “weird” way:

if (hasBOM) { writer = new UnicodeBOMWriter(new FileOutputStream(new File( tempUri.getPath())), encoding); } else { ... }

So we create an OutputStream although there is a constructor that takes a String filename directly. So we don’t even use that badly designed API.


TLDR: I propose to stop trying to be 100% backward compatible and remove the bad constructors.
Drop support for PROPERTY_WRITE_UTF8_BOM System property, which only complicates the code.

Should we worry about other projects using this class?
It is public…

Option 1: submit this “as is”, backward compatible, mistakes and all.
1.1 Keep it as is “forever”
1.2 Mark some items as “deprecated” and remove them later
1.3 Remove them in a CL asap (same release)

Option 2: refactor and cleanup even before this PR is merged, to hell with back-compat :-)

 

 

 

 

0 attachments

0 comments

Loading commits...