XLIFF 2.0: Print each segment's source with its <ph> tags in the terminal
<unit id="45">
<segment id="45-1">
<source xml:lang="en" xml:space="preserve">I need more information <ph dataRef="ph1" id="ph1"/>www.myapp.com<ph dataRef="ph2" id="ph2"/> .</source>
</segment>
<segment id="45-2">
<source xml:lang="en" xml:space="preserve">Need more <ph dataRef="ph1" id="ph1"/>information<ph dataRef="ph2" id="ph2"/>?</source>
</segment>
</unit>
My Java code:
import java.io.File;
import net.sf.okapi.lib.xliff2.core.*;
import net.sf.okapi.lib.xliff2.document.XLIFFDocument;

File xliffFile = new File(xliffFileName);
XLIFFDocument doc = new XLIFFDocument();
// Load the document
doc.load(xliffFile);
Iterable<Unit> unitIterable = doc.getUnits();
for (Unit unit : unitIterable) {
    Iterable<Segment> segmentIterable = unit.getSegments();
    for (Segment segment : segmentIterable) {
        String source = segment.getSource().getCodedText();
        System.out.println(source);
    }
}
It prints weird characters in the terminal; I’m pretty sure they are special Unicode characters:
I need more information www.myapp.com .
Need more information?
How can I print the plain text together with its <ph> tags in the terminal?
Comments (14)
-
You are outputting “coded text” where each inline code is represented by 2 special characters. Use .toText() to get the original codes.
You probably want to read the developer’s guide at https://okapiframework.org/devguide/gettingstarted.html#textUnits for more information on the TextFragment class and other Okapi low-level classes.
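To illustrate what "coded text" looks like, here is a minimal, library-free sketch. It assumes the marker characters come from the Unicode Private Use Area (U+E000 to U+F8FF); the specific values in the sample string are made up for illustration and are not taken from the Okapi sources. The class and method names are hypothetical as well.

```java
// Library-free illustration of coded text. Nothing here is Okapi API.
class CodedTextDemo {

    // Assumption: inline-code markers are Private Use Area characters (U+E000..U+F8FF).
    static boolean isPrivateUse(char c) {
        return c >= '\uE000' && c <= '\uF8FF';
    }

    // Makes marker characters visible by replacing each one with a {U+XXXX} label.
    static String showMarkers(String codedText) {
        StringBuilder sb = new StringBuilder();
        for (char c : codedText.toCharArray()) {
            if (isPrivateUse(c)) {
                sb.append(String.format("{U+%04X}", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Drops the marker characters, leaving only the plain text.
    static String stripMarkers(String codedText) {
        StringBuilder sb = new StringBuilder();
        for (char c : codedText.toCharArray()) {
            if (!isPrivateUse(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical coded text: each inline code is a pair of marker characters.
        String coded = "Need more \uE103\uE110information\uE103\uE111?";
        System.out.println(showMarkers(coded));  // markers shown as labels
        System.out.println(stripMarkers(coded)); // plain text only
    }
}
```

This explains why the terminal shows odd or missing glyphs: the two characters per inline code are not meant to be displayed, only to point back at the code stored in the fragment.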
-
Actually, my answer above was for Okapi’s main library, not the XLIFF2 library. While the part about coded text is still true, the way to output it is a bit different.
For the XLIFF2 library, you can use a Renderer to create different types of output of the inline codes (XLIFF, TMX, original, etc.). See the guide in https://bitbucket.org/okapiframework/xliff-toolkit/wiki/Home .
See for example the XLIFFFragmentRenderer class. You can see it used in the unit tests.
-
- changed status to closed
-
reporter Thank you for answering.
Would you elaborate on what “a Renderer” means?
Is there a code snippet for it?
-
reporter And is Okapi’s main library available in the Maven repo?
-
It’s a class that implements the IFragmentRenderer interface. That interface allows you to “render” (output) the content of the fragment. There is a default one for XLIFF provided: XLIFFFragmentRenderer.
See my edited post above for a link to an example in the unit tests.
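As a rough illustration of the renderer idea only, the sketch below uses hypothetical stand-in types (Part, CodeRenderer), not the Okapi IFragmentRenderer or XLIFFFragmentRenderer classes, whose actual signatures should be checked in the library's unit tests. It shows how the same fragment can be written out in different ways depending on which renderer handles the inline codes.

```java
import java.util.Arrays;
import java.util.List;

// Conceptual sketch with hypothetical types; this is NOT the Okapi API.
class RendererSketch {

    // A fragment part is either plain text or an inline code identified by an id.
    static final class Part {
        final String text;   // non-null for plain text
        final String codeId; // non-null for an inline code

        private Part(String text, String codeId) {
            this.text = text;
            this.codeId = codeId;
        }

        static Part text(String t) { return new Part(t, null); }
        static Part code(String id) { return new Part(null, id); }
        boolean isCode() { return codeId != null; }
    }

    // Hypothetical renderer: decides how each inline code is written out.
    interface CodeRenderer {
        String renderCode(String codeId);
    }

    // Walks the parts, copying text as-is and delegating codes to the renderer.
    // Note: plain text is not XML-escaped in this sketch.
    static String render(List<Part> parts, CodeRenderer renderer) {
        StringBuilder sb = new StringBuilder();
        for (Part p : parts) {
            sb.append(p.isCode() ? renderer.renderCode(p.codeId) : p.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Part> fragment = Arrays.asList(
            Part.text("Need more "), Part.code("ph1"),
            Part.text("information"), Part.code("ph2"), Part.text("?"));

        // One renderer writes XLIFF-like <ph/> elements.
        System.out.println(render(fragment, id -> "<ph id=\"" + id + "\"/>"));
        // Another writes simple brace placeholders.
        System.out.println(render(fragment, id -> "{" + id + "}"));
    }
}
```

The point of the design is that the fragment stays the same while the output format is swapped by passing a different renderer.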
-
Yes, the main library is in the main Maven repo: https://mvnrepository.com/artifact/net.sf.okapi
-
reporter -
Yes, 1.41.0 is the latest release. That one is the core. Then there are artifacts for each filter, step, etc. You can just use what you need.
-
reporter tried this:
for (Unit unit : unitIterable) {
    Iterable<Segment> segmentIterable = unit.getSegments();
    for (Segment segment : segmentIterable) {
        // System.out.println(segment.getSource());
        String source = segment.getSource().getCodedText();
        TextFragment tf = new TextFragment(source);
        System.out.println(tf.toText());
    }
}
those ph tags are still not rendered as string
-
You are mixing two separate libraries. Your Unit object is an XLIFF2 library object and TextFragment is a main Okapi library object. It is my fault for confusing you with my initial mention of TextFragment.
Since your unit is XLIFF2 you can use the XLIFF2 library. One can also convert XLIFF2 objects to the main Okapi library objects, but it’s not easy because the two object models are very different.
The easiest way to get XLIFF2 content into the main Okapi library object model is to use the XLIFF2Filter, which reads XLIFF2 files using the main Okapi library model.
If you work with the XLIFF2 library, you can use renderers to output whatever you want for the inline codes (e.g. to export to TMX). But, to make things simpler, we also have defaults for XLIFF2 itself, and you can just do this:
Iterable<Segment> segmentIterable = unit.getSegments();
for (Segment segment : segmentIterable) {
    System.out.println(segment.getSource().toXLIFF());
}
BTW, you may get faster/better answers from the developers group: https://groups.google.com/g/okapi-devel
-
reporter That solves my problem. Thank you so much.
I didn’t know there is an Okapi group.