java.util.EmptyStackException if target has literal escaped g tags

Issue #1158 resolved
László Laki created an issue

Hi,

I have an issue with the Leverage Files from Moses part of the script. I am trying to insert this target into my XLIFF.

target.txt

<g>a</g>

command:

okapi/tikal.sh -lm sourceTikal.xlf -fc okf_xliff -sl en -tl de -ie UTF-8 -oe UTF-8 -trace -totrg -overtrg -noalttrans -from target.txt -to targetTikal.xlf

result:

-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.1.43.0
-------------------------------------------------------------------------------
1 class net.sf.okapi.applications.tikal.Main
2 ProtectionDomain  (file:/okapi/lib/okapi-application-tikal-1.43.0.jar <no signer certificates>)
 jdk.internal.loader.ClassLoaders$AppClassLoader@5bc2b487
 <no principals>
 java.security.Permissions@a33b4e3 (
 ("java.lang.RuntimePermission" "exitVM")
 ("java.io.FilePermission" "/okapi/lib/okapi-application-tikal-1.43.0.jar" "read")
)


3 (file:/okapi/lib/okapi-application-tikal-1.43.0.jar <no signer certificates>)
4 file:/okapi/lib/okapi-application-tikal-1.43.0.jar
5 /okapi/lib/okapi-application-tikal-1.43.0.jar
Merging Moses InlineText
Input: /dataDirOkapi/sourceTikal.xlf
XMLInputFactory: com.ctc.wstx.stax.WstxInputFactory
java.util.EmptyStackException
        at java.base/java.util.Stack.peek(Stack.java:102)
        at java.base/java.util.Stack.pop(Stack.java:84)
        at net.sf.okapi.filters.mosestext.MosesTextFilter.fromPseudoXLIFF(MosesTextFilter.java:337)
        at net.sf.okapi.filters.mosestext.MosesTextFilter.processBuffer(MosesTextFilter.java:269)
        at net.sf.okapi.filters.mosestext.MosesTextFilter.next(MosesTextFilter.java:188)
        at net.sf.okapi.steps.moses.MergingStep.processStartDocument(MergingStep.java:150)
        at net.sf.okapi.steps.moses.MergingStep.handleEvent(MergingStep.java:108)
        at net.sf.okapi.common.pipeline.Pipeline.execute(Pipeline.java:117)
        at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:227)
        at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:199)
        at net.sf.okapi.common.pipelinedriver.PipelineDriver.processBatch(PipelineDriver.java:182)
        at net.sf.okapi.applications.tikal.Main.leverageFileWithMoses(Main.java:1665)
        at net.sf.okapi.applications.tikal.Main.process(Main.java:1043)
        at net.sf.okapi.applications.tikal.Main.main(Main.java:570)

If I rename the closing </g> tag, it works correctly. Can you help me handle this problem?

Comments (5)

  1. jhargrave-straker

    We have updated tikal to output non-simplified inline codes. Instead of g tags, you will get bpt/ept. Can you test with the latest SNAPSHOT and let us know if this resolves the problem? Thank you!!

  2. László Laki reporter

    Thanks for your help. I can confirm this behaviour. It works with tags other than g. So, as I understand it, there is no way to parse the literal string

    &lt;g&gt;a&lt;/g&gt;
    

    ?

    If I escape it with bpt/ept, I receive the following result:

    From xliff <source><bpt id="1">&lt;g&gt;</bpt>a<ept id="2">&lt;/g&gt;</ept></source> → inline:<g id="1">a</g> → target xliff

    <target><bpt id="1">&lt;g></bpt>a<ept id="2">&lt;/g></ept></target>
    

    In this case you can close this issue.

  3. jhargrave-straker

    The Moses filter fails because the g tag must have an id attribute. Since the id is missing, the regex doesn’t match the opening g tag.
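
A minimal sketch of how a missing id could lead to the EmptyStackException in the trace. The pattern and the class name GTagSketch are assumptions made up for illustration, not code copied from MosesTextFilter. The sketch only assumes that the opening tag is recognized when it carries an id, that </g> is always recognized, and that each closing tag pops a java.util.Stack.

```java
import java.util.EmptyStackException;
import java.util.Stack;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GTagSketch {
    // Hypothetical pattern, NOT taken from Okapi: the opening form
    // requires an id attribute, the closing form is a plain </g>.
    private static final Pattern TAG =
        Pattern.compile("<g\\s+id=\"(\\d+)\"\\s*>|</g>");

    // Pairs each </g> with the most recently opened g tag and
    // returns the number of matched pairs.
    public static int countPairs(String line) {
        Stack<String> open = new Stack<>();
        int pairs = 0;
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                open.push(m.group(1)); // opening tag with an id
            } else {
                open.pop();            // throws EmptyStackException if nothing is open
                pairs++;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // With an id, the opening tag is pushed and the closing tag pops it.
        System.out.println(countPairs("<g id=\"1\">a</g>"));
        try {
            // Without an id, <g> is skipped as plain text, so </g> pops an empty stack.
            countPairs("<g>a</g>");
        } catch (EmptyStackException e) {
            System.out.println("unmatched </g>: " + e);
        }
    }
}
```

Under these assumptions, the line from target.txt fails in the same way as the reported trace, while the same line with an id on the opening tag is paired normally.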
