Clean room implementation of UnicodeBOMWriter

Branch: mihai_UnicodeBOMWriter

Branch: dev

Merged

#524 · Created 2021-06-10 · Last updated 2021-06-18

Merged pull request

Merged in mihai_UnicodeBOMWriter (pull request #524)

bbf772c·Author: Mihai Nita·Closed by: Mihai Nita·2021-06-18

Description

After doing all the work without looking at the old code I’ve discovered some incorrect behavior in the old implementation.

For example if you use something like “Utf-8” the old code fails, because it probably does some case sensitive compares.

Also, the old API was badly designed, because there is a constructor that takes an OutputStream.
This means that there is no guarantee that someone didn’t write in it already, so it writes a BOM in the middle of the file.

We only use this class once, in net.sf.okapi.filters.html.StreamedSourceCopy.java and in somewhat “weird” way:

if (hasBOM) {
    writer = new UnicodeBOMWriter(new FileOutputStream(new File(
        tempUri.getPath())), encoding);
} else { ... }

So we create an OutputStream although there is a constructor that takes a String filename directly. So we don’t even use that badly designed API.

TLDR: I propose to stop trying to be 100% backward compatible and remove the bad constructors.
Drop support for PROPERTY_WRITE_UTF8_BOM System property, which only complicates the code.

Should we worry about other projects using this class?
It is public…

Option 1: submit this “as is”, backward compatible, mistakes and all.
1.1 Keep it as is “forever”
1.2 Mark some items as “deprecated” and remove them later
1.3 Remove them in a CL asap (same release)

Option 2: refactor and cleanup even before this PR is merged, to hell with back-compat :-)

Clean room implementation of UnicodeBOMWriter

Merged pull request

Description

0 attachments

0 comments

Loading commits...