XLIFF 2.0: Print segment's source and their own <ph> tags in terminal

Issue #1023 closed
Handika D created an issue
<unit id="45">
  <segment id="45-1">
    <source xml:lang="en" xml:space="preserve">I need more information <ph dataRef="ph1" id="ph1"/>www.myapp.com<ph dataRef="ph2" id="ph2"/> .</source>
  </segment>
  <segment id="45-2">
    <source xml:lang="en" xml:space="preserve">Need more <ph dataRef="ph1" id="ph1"/>information<ph dataRef="ph2" id="ph2"/>?</source>
  </segment>
</unit>

My Java code:

import java.io.File;

import net.sf.okapi.lib.xliff2.core.*;
import net.sf.okapi.lib.xliff2.document.XLIFFDocument;

File xliffFile = new File(xliffFileName);

XLIFFDocument doc = new XLIFFDocument();
// Load the document
doc.load(xliffFile);
Iterable<Unit> unitIterable = doc.getUnits();

for (Unit unit: unitIterable) {
    Iterable<Segment> segmentIterable = unit.getSegments();

    for (Segment segment: segmentIterable) {
        String source = segment.getSource().getCodedText();
        System.out.println(source);
    }
}

It prints strange characters in the terminal; I’m pretty sure they are Unicode characters:

I need more information www.myapp.com .
Need more information?

What is the workaround to print the plain text together with its <ph> tags in the terminal?
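
To see what the terminal is actually receiving, it can help to make the odd characters visible. The sketch below is plain Java with no Okapi dependency. It assumes, as the output suggests, that the coded text stands in for each inline code with characters from the Unicode private-use area (U+E000 to U+F8FF). The class name `ShowMarkers`, the helper `escapePrivateUse`, and the specific marker values in the sample string are made up for illustration and are not taken from the library.

```java
public class ShowMarkers {

    // Replace every private-use character (U+E000 to U+F8FF) with a visible
    // backslash-u escape so it can be inspected in a terminal.
    static String escapePrivateUse(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0xE000 && c <= 0xF8FF) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical coded text: each pair of private-use characters
        // stands in for one <ph> element. The values are assumptions.
        String coded = "Need more \uE103\uE110information\uE103\uE111?";
        System.out.println(escapePrivateUse(coded));
    }
}
```

Calling `escapePrivateUse(segment.getSource().getCodedText())` in the loop above would show where the placeholders sit, but it does not recover the `<ph>` markup itself.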

Comments (14)

  1. Handika D reporter

    Thank you for answering.
    Would you elaborate on what “a Renderer” means?
    Is there a code snippet for it?

  2. ysavourel

    It’s a class that implements the IFragmentRenderer interface. That interface allows you to “render” (output) the content of the fragment. There is a default one for XLIFF provided: XLIFFFragmentRenderer.

    See my edited post above for a link to an example in the unit tests.
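
The general idea of a renderer described here can be sketched without the library: walk the content piece by piece, copy plain text as-is, and let a pluggable function decide how each inline code is written out. The snippet below is only an illustration of that idea. `RendererSketch`, `Piece`, and `render` are hypothetical names, not the IFragmentRenderer API, and text is not XML-escaped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class RendererSketch {

    // Hypothetical stand-in for one piece of a fragment:
    // either plain text or an inline code identified by its id.
    static final class Piece {
        final String text;   // non-null for plain text
        final String codeId; // non-null for an inline code

        private Piece(String text, String codeId) {
            this.text = text;
            this.codeId = codeId;
        }

        static Piece text(String t) { return new Piece(t, null); }
        static Piece code(String id) { return new Piece(null, id); }
    }

    // Copy text pieces unchanged; write each code with the supplied function.
    static String render(List<Piece> pieces, Function<String, String> codeWriter) {
        StringBuilder sb = new StringBuilder();
        for (Piece p : pieces) {
            sb.append(p.text != null ? p.text : codeWriter.apply(p.codeId));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Piece> source = Arrays.asList(
            Piece.text("Need more "), Piece.code("ph1"),
            Piece.text("information"), Piece.code("ph2"), Piece.text("?"));

        // XLIFF-style placeholders:
        // Need more <ph id="ph1"/>information<ph id="ph2"/>?
        System.out.println(render(source, id -> "<ph id=\"" + id + "\"/>"));

        // Brace-style placeholders: Need more {ph1}information{ph2}?
        System.out.println(render(source, id -> "{" + id + "}"));
    }
}
```

Swapping the function passed to `render` changes the output format without touching the content model, which is the point of keeping rendering separate from the fragment.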

  3. ysavourel

    Yes, 1.41.0 is the latest release. That one is the core. Then there are artifacts for each filter, steps, etc. You can just use what you need.

  4. Handika D reporter

    Tried this:

    for (Unit unit: unitIterable) {
        Iterable<Segment> segmentIterable = unit.getSegments();
    
        for (Segment segment: segmentIterable) {
            // System.out.println(segment.getSource());
            String source = segment.getSource().getCodedText();
            TextFragment tf = new TextFragment(source);
            System.out.println(tf.toText());
        }
    }
    

    Those <ph> tags are still not rendered as strings.

  5. ysavourel

    You are mixing two separate libraries. Your Unit object is an XLIFF2 library object and TextFragment is a main Okapi library object. My fault for confusing you with my initial mention of TextFragment.

    Since your unit is XLIFF2 you can use the XLIFF2 library. One can also convert XLIFF2 objects to the main Okapi library objects, but it’s not easy because the two object models are very different.
    The easiest way to get XLIFF2 content into the main Okapi library object model is to use the XLIFF2Filter that reads XLIFF2 files using the main Okapi library model.

    If you work with the XLIFF2 library, you can use renderers to output whatever you want for the inline codes (e.g. to export to TMX). But to make things simpler, we also have defaults for XLIFF2 itself, and you can just do this:

        Iterable<Segment> segmentIterable = unit.getSegments();
        for (Segment segment: segmentIterable) {
            System.out.println(segment.getSource().toXLIFF());
        }
    

    BTW, you may get faster/better answers from the developers group: https://groups.google.com/g/okapi-devel
