BOM issue with XLIFF Filter in OmegaT plugin

Issue #364 resolved
Former user created an issue

Original issue 364 created by @ysavourel on 2013-09-02T05:01:55.000Z:

For some reason, the BOM does not seem to be processed properly when using the XLIFFFilter from the OmegaT Filter plugin (see the log below).

Note that the XMLStreamReader does not seem to be the expected one (it should be Woodstox, from the dependencies):

85600: Info: Project loading start (LOG_DATAENGINE_LOAD_START)
85600: Error: Failed to parse XLIFF for ITS annotations
85600: Error: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
85600: Error: Message: Content is not allowed in prolog.
85600: Error: at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
85600: Error: at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)
85600: Error: at net.sf.okapi.filters.xliff.its.ITSStandoffManager.parseXLIFF(ITSStandoffManager.java:98)
85600: Error: at net.sf.okapi.filters.xliff.XLIFFITSFilterExtension.parseInDocumentITSStandoff(XLIFFITSFilterExtension.java:76)
85600: Error: at net.sf.okapi.filters.xliff.XLIFFFilter.open(XLIFFFilter.java:288)
85600: Error: at net.sf.okapi.filters.xliff.XLIFFFilter.open(XLIFFFilter.java:242)
85600: Error: at net.sf.okapi.filters.xliff.XLIFFFilter.open(XLIFFFilter.java:237)
85600: Error: at net.sf.okapi.lib.omegat.AbstractOkapiFilter.processFile(AbstractOkapiFilter.java:253)
85600: Error: at net.sf.okapi.lib.omegat.AbstractOkapiFilter.parseFile(AbstractOkapiFilter.java:162)
85600: Error: at net.sf.okapi.lib.omegat.XLIFFFilter.parseFile(XLIFFFilter.java:25)
...
85600: Error: Failed to load specified project! (TF_LOAD_ERROR)

Comments (3)

  1. Former user Account Deleted
    • changed status to open

    Comment 1. originally posted by @ysavourel on 2013-09-02T05:59:58.000Z:

    The plugin doesn't use com.ctc.wstx.stax.WstxInputFactory but picks up the default XML reader of OmegaT.
    Changing to com.ctc.wstx.stax.WstxInputFactory explicitly seems to fix the problem.

    Two questions:
    a) Why can't the default deal with a UTF-8 BOM?

    b) Can we safely force the use of com.ctc.wstx.stax.WstxInputFactory? This can be set through the parameters, but we would be changing (or fixing) the default. The initial intent seems to have been to use Woodstox by default.
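
To illustrate question (b), here is a minimal sketch of preferring Woodstox explicitly instead of relying on whatever StAX implementation the host application exposes. It is not the plugin's actual code: the class `StaxFactoryChooser` and its methods are hypothetical names, and the fallback to the platform default is an assumption added so that the sketch also runs when Woodstox is not on the classpath. Only the class name `com.ctc.wstx.stax.WstxInputFactory` comes from the comment above.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
public class StaxFactoryChooser {
    // Factory class named in the comment above.
    static final String WOODSTOX = "com.ctc.wstx.stax.WstxInputFactory";

    /** Returns a Woodstox factory if it can be loaded, else the platform default. */
    public static XMLInputFactory preferWoodstox() {
        try {
            Class<?> cls = Class.forName(WOODSTOX);
            return (XMLInputFactory) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // Assumption: falling back is acceptable when Woodstox is absent.
            return XMLInputFactory.newInstance();
        }
    }

    /** Returns the local name of the root element, or null if there is none. */
    public static String rootElementName(String xml) {
        try {
            XMLStreamReader reader =
                preferWoodstox().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shows which implementation was actually selected at run time.
        System.out.println(preferWoodstox().getClass().getName());
        System.out.println(rootElementName("<xliff version=\"1.2\"/>"));
    }
}
```

Note that `Class.forName` resolves the class through the caller's class loader, so in a plugin environment the result can depend on how the host application loads the plugin and its dependencies.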

  2. Former user Account Deleted

    Comment 3. originally posted by @ysavourel on 2013-09-04T05:03:27.000Z:

    I don't know if there's a good reason why the default implementation doesn't handle the BOM, except maybe consistency with all the other things in Java that don't handle BOMs either. It's actually pretty lousy that implementations are not consistent about whether they handle it.
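
A common workaround for parsers that do not handle the BOM themselves is to strip it before the parser sees the input. This was not proposed in this thread, and whether it would help in the plugin depends on how the input reaches the XMLStreamReader, which the thread does not say. The sketch below assumes the input is available as a byte stream; the class `Utf8BomSkipper` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
public class Utf8BomSkipper {

    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() call may return fewer.
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the caller sees them again.
            pb.unread(head, 0, n);
        }
        return pb;
    }

    /** Returns the first byte after any BOM, or -1 if the input is empty. */
    public static int firstByteAfterBom(byte[] data) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        System.out.println((char) firstByteAfterBom(withBom));
    }
}
```

The returned stream can then be passed to `XMLInputFactory.createXMLStreamReader(InputStream)` in place of the original one.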
