Non-auto-detected encodings not taken into account in XLIFFFilter

Issue #99 open
Former user created an issue

Original issue 99 created by @ysavourel on 2009-07-26T22:18:06.000Z:

When an XLIFF document has an encoding different from the one auto-detected
by the BomandEncoding detector, that encoding is not set properly. The XML
stream reader uses the encoding defined for the Reader used in open(), but
that is the one from the auto-detector and may be wrong.

It seems we need to pass the rawdocument.stream to the open method rather
than use getReader(), but it seems we cannot because the stream has already
been used by the auto-detector.

Simple test: use a Windows-1252 XLIFF file with some extended characters.
They will come out as Windows-1252 bytes read as UTF-8.
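
The simple test can be reproduced outside the filter with plain StAX. The sketch below is only an illustration, not the XLIFFFilter code: the class and method names are hypothetical, and the one-element document stands in for a real XLIFF file. It parses the same Windows-1252 bytes twice, once through a Reader that was forced to UTF-8 (the situation described above) and once from the raw stream, where the parser can use the encoding from the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; none of these names come from Okapi.
class XliffEncodingDemo {

    // A tiny stand-in for an XLIFF file: declares windows-1252 and
    // contains one extended character (U+00E9, byte 0xE9 in windows-1252).
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"windows-1252\"?>"
            + "<source>caf\u00e9</source>";
        return xml.getBytes(Charset.forName("windows-1252"));
    }

    // Returns the text of the first element found by the given reader.
    private static String firstElementText(XMLStreamReader xr) throws XMLStreamException {
        while (xr.hasNext()) {
            if (xr.next() == XMLStreamConstants.START_ELEMENT) {
                return xr.getElementText();
            }
        }
        return null;
    }

    // Problem case: the bytes are decoded as UTF-8 before the parser sees
    // them, so the declared encoding no longer has any effect.
    static String readViaReader(byte[] data) {
        try {
            Reader r = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
            return firstElementText(XMLInputFactory.newInstance().createXMLStreamReader(r));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Suggested direction: hand the parser the raw stream so it can pick
    // the encoding up from the XML declaration itself.
    static String readViaStream(byte[] data) {
        try {
            return firstElementText(
                XMLInputFactory.newInstance().createXMLStreamReader(new ByteArrayInputStream(data)));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = sampleBytes();
        System.out.println("via Reader (UTF-8 forced): " + readViaReader(data));
        System.out.println("via raw stream:            " + readViaStream(data));
    }
}
```

The Reader path is expected to lose the extended character, while the stream path should keep it, assuming the XML parser in use recognizes the windows-1252 name.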

Comments (3)

  1. Jim Hargrave (OLD)
    • edited description
    • removed responsible

    // Determine encoding based on BOM, if any
    input.setEncoding(ENCODING.name()); // Default for XML, others should be auto-detected
    BOMNewlineEncodingDetector detector = new BOMNewlineEncodingDetector(input.getStream(), input.getEncoding());
    detector.detectBom();
    
    String inStreamCharset = ENCODING.name();
    if ( detector.isAutodetected() ) {
        inStreamCharset = detector.getEncoding();
    }
    

    We shouldn’t override the passed-in encoding (input.setEncoding(ENCODING.name());). The code should look like this:

    BOMNewlineEncodingDetector detector = new BOMNewlineEncodingDetector(input.getStream(), input.getEncoding());
    detector.detectBom();
    String encoding = detector.getEncoding(); // Returns the passed-in encoding, or the one detected from the BOM.
    // Should log a warning if the two differ.
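
The rule described in this comment (keep the passed-in encoding unless a BOM says otherwise, and warn when the two disagree) can be sketched without any Okapi classes. Everything below is an assumption made for illustration: the class, the method names and the use of java.util.logging are hypothetical, and only the UTF-8 and UTF-16 BOMs are checked. It is not how BOMNewlineEncodingDetector is implemented.

```java
import java.util.logging.Logger;

// Hypothetical helper; not part of Okapi.
class BomEncodingSketch {

    private static final Logger LOG = Logger.getLogger(BomEncodingSketch.class.getName());

    // Returns the charset name implied by a BOM at the start of 'head',
    // or null when no BOM is recognized.
    // Simplification: UTF-32 BOMs are not handled here.
    static String bomCharset(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the first bytes of 'data' (as unsigned values) to 'prefix'.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Chooses the encoding to read with: the BOM wins when one is present,
    // otherwise the caller's encoding is kept as-is (never overwritten with
    // a default). A warning is logged when the BOM contradicts the caller.
    static String resolveEncoding(byte[] head, String passedIn) {
        String detected = bomCharset(head);
        if (detected == null) {
            return passedIn;
        }
        if (!detected.equalsIgnoreCase(passedIn)) {
            LOG.warning("Encoding from BOM (" + detected
                + ") differs from the specified encoding (" + passedIn + ")");
        }
        return detected;
    }
}
```

Here `head` is the first few bytes of the document, read by the caller before the stream is handed on. The sketch does not skip the BOM or rewind the stream, so the caller still has to handle that.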
    
