First character of relative target path dropped

Issue #299 resolved
Former user created an issue

Original issue 299 created by @ysavourel on 2012-12-19T04:06:15.000Z:

On some occasions, the first character of the relative target path in a Translation Toolkit created by Rainbow is dropped.

For example:

<doc xml:space="preserve" docId="1" extractionType="xliff" relativeInputPath="catalog/_result_template_en.json" filterId="okf_json" inputEncoding="UTF-8" relativeTargetPath="atalog/_result_template_en.json" targetEncoding="UTF-8" selected="1">...</doc>

This is probably related to the root path ending with '/'.
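To make the suspected cause concrete, here is a hypothetical reconstruction of a substring chop that assumes exactly one separator follows the root. It is not the actual Okapi code; `chopRelative` and the sample paths are illustrative. When the root already ends with '/', the extra +1 swallows the first character of the relative path, which is the "atalog/..." symptom shown above.

```java
// Hypothetical reconstruction of the substring-chop bug; not the actual Okapi code.
public class TruncationDemo {

    // Assumption: the old logic removed the root plus one character,
    // expecting that character to be the separator after the root.
    static String chopRelative(String fullPath, String rootDir) {
        return fullPath.substring(rootDir.length() + 1);
    }

    public static void main(String[] args) {
        String full = "/data/project/catalog/_result_template_en.json";
        // Root without trailing slash: the +1 skips the separator.
        System.out.println(chopRelative(full, "/data/project"));  // catalog/_result_template_en.json
        // Root with trailing slash: the +1 eats the first character instead.
        System.out.println(chopRelative(full, "/data/project/")); // atalog/_result_template_en.json
    }
}
```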

Comments (12)

  1. Former user Account Deleted

    Comment 3. originally posted by @fliden on 2013-02-26T23:41:03.000Z:

    I've confirmed on Windows that if the relativeInputPath ends with a slash, then the relativeTargetPath is one character short. I assume it's the same on Unix.

    By default, Rainbow does not add the trailing \ when you drag files in. I guess we can just strip the trailing slash from the relativeInputPath when it is updated.
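    A minimal sketch of that suggestion, assuming the root is normalized before the chop. `stripTrailingSeparators` is a hypothetical helper, not an Okapi method; it removes trailing '/' or '\' characters so that a fixed +1 skips exactly one separator.

```java
public class StripTrailingSlash {

    // Hypothetical helper: drop trailing '/' or '\' but leave a lone "/" intact.
    static String stripTrailingSeparators(String rootDir) {
        String result = rootDir;
        while (result.length() > 1 && (result.endsWith("/") || result.endsWith("\\"))) {
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        String full = "/data/project/catalog/_result_template_en.json";
        String root = stripTrailingSeparators("/data/project/");
        // After normalization, the +1 skips the separator and nothing else.
        System.out.println(full.substring(root.length() + 1)); // catalog/_result_template_en.json
    }
}
```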

  2. Former user Account Deleted
    • changed status to open

    Comment 5. originally posted by @amake on 2013-02-28T03:38:28.000Z:

    I stumbled upon this recently myself, and there's a bit more to the problem.

    The truncated paths are generated in rainbowkit.creation.ExtractionStep.handleRawDocument() where this.inputRootDir is chopped out of inputUri.getPath() and outputURI.getPath().

    There are at least two issues:
    1. The inputRootDir may or may not have a trailing slash (this is now fixed).
    2. The inputRootDir may or may not have a *leading* slash. The .getPath()-generated strings we're comparing against *will* have a leading slash even on Windows, but a real-world inputRootDir generally will not. To muddle things further, the only(?) relevant unit test (ExtractionStepTest) uses .getPath() to generate the inputRootDir, so this problem was invisible.
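    Point 2 can be reproduced without touching the file system. In this sketch (the sample paths and the `chop` helper are illustrative), `URI.getPath()` returns a Windows path with a leading slash before the drive letter, so a root written without one is one character too short and the chop needs +2 instead of +1.

```java
import java.net.URI;

public class LeadingSlashDemo {

    // Illustrative chop: drop rootDir plus `extra` characters from the front of path.
    static String chop(String path, String rootDir, int extra) {
        return path.substring(rootDir.length() + extra);
    }

    public static void main(String[] args) {
        // getPath() on a Windows file URI keeps the leading slash before the drive letter.
        String path = URI.create("file:/C:/data/project/catalog/x.json").getPath();
        String rootDir = "C:/data/project"; // "real-world" root: no leading slash

        System.out.println(path);                   // /C:/data/project/catalog/x.json
        System.out.println(chop(path, rootDir, 1)); // /catalog/x.json
        System.out.println(chop(path, rootDir, 2)); // catalog/x.json
    }
}
```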

    The issue was partially solved in a commit by Stefan Pries:
    https://code.google.com/p/okapi/source/diff?spec=svnb78448c12f12008a1eb2b29666b38b3dd0332199&name=dev&r=b78448c12f12008a1eb2b29666b38b3dd0332199&format=side&path=/okapi/steps/rainbowkit/src/main/java/net/sf/okapi/steps/rainbowkit/creation/ExtractionStep.java

    However, that only works for "real-world" Windows paths, where points 1 and 2 above together require adding 2 to inputRootDir.length(). Other platforms needed +1 or +0 depending on point 1, but now only +1 is required.

    So now for Windows we still need to normalize the leading slash.
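    One way to normalize the leading slash is to bring the root into the same shape as `URI.getPath()` before comparing. This is a hypothetical sketch, not the patch referenced in this comment: `normalizeRoot` converts backslashes and ensures a leading and trailing slash, and `relativePath` rejects paths that are not under the root instead of chopping blindly. It still does nothing special for network shares or case-insensitive drive letters.

```java
import java.net.URI;

public class NormalizeRootDemo {

    // Hypothetical helper: reshape a user-supplied root to match URI.getPath() output.
    static String normalizeRoot(String rootDir) {
        String root = rootDir.replace('\\', '/');
        if (!root.startsWith("/")) {
            root = "/" + root; // e.g. "C:/data" -> "/C:/data"
        }
        if (!root.endsWith("/")) {
            root = root + "/"; // so the chop below never needs a +1 or +2
        }
        return root;
    }

    static String relativePath(URI file, String rootDir) {
        String path = file.getPath();
        String root = normalizeRoot(rootDir);
        if (!path.startsWith(root)) {
            throw new IllegalArgumentException(path + " is not under " + root);
        }
        return path.substring(root.length());
    }

    public static void main(String[] args) {
        URI win = URI.create("file:/C:/data/project/catalog/x.json");
        URI unix = URI.create("file:/data/project/catalog/x.json");
        System.out.println(relativePath(win, "C:\\data\\project\\")); // catalog/x.json
        System.out.println(relativePath(unix, "/data/project"));      // catalog/x.json
    }
}
```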

    Does that jibe with everyone's understanding of the issue? Please see here for my proposed fix:
    https://code.google.com/r/aaron-okapi/source/detail?r=35f1036cd03ff3ee67b2f2b7af576620b850e925&name=aaron/manifest-trunc-fix-squashed

  3. Former user Account Deleted

    Comment 6. originally posted by KFLi... on 2013-02-28T20:33:56.000Z:

    Aha, thanks for looking at that, Aaron. Your change would fix point 2, but I'll raise you one :) Point 3: \\crux\common\folder, if you drop files from a network share. There might also be other cases that we haven't thought about. What do you think of this solution instead? Do you see any issues? Otherwise, maybe we should create a more robust helper method for getting the relative path.

    @Override
    protected Event handleStartDocument (Event event) {
        StartDocument sd = event.getStartDocument();
        // Let java.net.URI compute the relative paths instead of chopping substrings
        String relativeInput = new File(inputRootDir).toURI().relativize(inputURI).getPath();
        String relativeOutput = new File(outputRootDir).toURI().relativize(outputURI).getPath();
        // ...
    }
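    For reference, `java.net.URI.relativize()` removes the root's path from the child's path after normalizing both, so no length arithmetic is involved. A minimal sketch with illustrative file URIs; it does not touch the file system, so it behaves the same on every platform. Note that `File.toURI()` only appends a trailing slash when the path can be determined to be a directory, which is why the roots here are written with one.

```java
import java.net.URI;

public class RelativizeDemo {

    // Relativize a file URI against a root URI and return the resulting path.
    static String relative(String rootUri, String fileUri) {
        return URI.create(rootUri).relativize(URI.create(fileUri)).getPath();
    }

    public static void main(String[] args) {
        String root = "file:/C:/data/project/";
        System.out.println(relative(root, "file:/C:/data/project/catalog/x.json")); // catalog/x.json
        System.out.println(relative(root, "file:/C:/data/project/a/b/c.txt"));      // a/b/c.txt
    }
}
```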

  4. Former user Account Deleted

    Comment 7. originally posted by @amake on 2013-03-04T08:24:14.000Z:

    Hi Fredrik. Sorry I missed your comment; I forgot to sign up for CC.

    Thanks for the far-superior solution. I think it's much better to use URI.relativize() than to come up with our own fiddly version.

    I tested your fix on Windows and OS X and it looks good to me. Since I've also got a tweak to the relevant unit test, shall I go ahead and commit it?

  5. Former user Account Deleted

    Comment 8. originally posted by KFLi... on 2013-03-04T17:10:32.000Z:

    Hi Aaron, thanks for checking on OS X. Yes, it would be great if you could commit it along with the unit test, thanks!

  6. Former user Account Deleted

    Comment 9. originally posted by @amake on 2013-03-05T02:27:40.000Z:

    Thanks Fredrik. I've pushed the patch.

  7. Former user Account Deleted

    Comment 10. originally posted by Alexander.Buchholtz.Ont... on 2013-07-01T18:15:18.000Z:

    Hi Aaron,

    the fix introduces a regression in Rainbow and Longhorn when using a custom output folder.
    Consider setting the output folder on Rainbow's "Other Settings" tab to a location different from the input root folder. Longhorn uses different folders for input and output by default.

    With Okapi M21, the relativeTargetPath in the resulting manifest file will be the absolute output path, preceded by a slash. E.g.:
    * input root path "C:\directory"
    * output root path "C:\another_directory"
    leads to "/C:\another_directory\file_name.html" as a relativeTargetPath which is obviously invalid.

    The issue surfaces when executing a pipeline containing the Rainbow Kit Merging Step: the output file can't be opened and a FileNotFoundException is thrown:
    java.io.FileNotFoundException: C:\Users\...\Okapi-Longhorn-Files\9\output\C:\Users\...\Okapi-Longhorn-Files\3\output\small_file.html (The filename, directory name, or volume label syntax is incorrect)
    at net.sf.okapi.common.filterwriter.GenericFilterWriter.createWriter(GenericFilterWriter.java:372)

    String relativeOutput = new File(outputRootDir).toURI().relativize(outputURI).getPath();

    The above line only works if the first portion of outputURI matches the outputRootDir, which is not the case if you have a custom output root dir.
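    The failure mode follows from the `URI.relativize()` contract: when the root's path is not a prefix of the given URI's path, the given URI is returned unchanged, and `getPath()` then yields the whole absolute path with its leading slash. A sketch using the example folders above; `relativeOrNull` is a hypothetical helper that detects this case by checking whether the result is still absolute.

```java
import java.net.URI;

public class RelativizeMismatchDemo {

    // Hypothetical helper: return the relative path, or null if root is not a prefix.
    static String relativeOrNull(URI root, URI file) {
        URI rel = root.relativize(file);
        // relativize() hands back its argument (still absolute) when it cannot relativize.
        return rel.isAbsolute() ? null : rel.getPath();
    }

    public static void main(String[] args) {
        URI inputRoot = URI.create("file:/C:/directory/");
        URI output = URI.create("file:/C:/another_directory/file_name.html");
        System.out.println(inputRoot.relativize(output).getPath()); // /C:/another_directory/file_name.html
        System.out.println(relativeOrNull(inputRoot, output));       // null
    }
}
```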

    Do you have any idea how to work around this issue until there is a fix?

    Thanks,
    Alexander

  8. Former user Account Deleted

    Comment 11. originally posted by @amake on 2013-07-02T04:54:15.000Z:

    Hi Alexander. Thanks for the report.

    It looks like this has *never* been handled correctly. Since the Java port, the relative URI was calculated with a completely broken, simple substring chop. Now we have Java's URI#relativize(), which at least is guaranteed to produce a valid URI, but it does not handle a custom output root dir correctly.

    According to my understanding:

    1. You are correct that it is wrong to have an absolute directory as "relativeTargetPath" in the manifest.

    2. The use of URI#relativize() is actually fine; the real problem is that the custom output dir is not propagated to the ExtractionStep, so relativization is performed against the wrong dir.

    I have prepared a patch to add the necessary scaffolding to propagate the output dir to ExtractionStep; that will fix this side of the issue.

    3. It appears that there is no mechanism to store and propagate a custom output dir set at kit creation time. The output root dir is not stored in the manifest, so correct behavior here relies on the output root dir being set properly at post-processing/merge time. For Rainbow, that means you can't simply create a new config to process your manifest (if you want your custom output dir respected). I'm less familiar with Longhorn, so maybe it's not a problem there.

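    The propagation described in point 2 could look roughly like this. The class and method names are hypothetical and do not reflect Okapi's actual ExtractionStep API; the sketch only shows relativizing against the output root when one is set, falling back to the input root otherwise, and failing loudly instead of writing an absolute path into the manifest.

```java
import java.net.URI;

// Hypothetical sketch of propagating a custom output root; not Okapi's API.
public class OutputRootSketch {

    private final URI inputRoot;
    private URI outputRoot; // null means "same as the input root"

    OutputRootSketch(URI inputRoot) {
        this.inputRoot = inputRoot;
    }

    void setOutputRoot(URI outputRoot) {
        this.outputRoot = outputRoot;
    }

    String relativeTargetPath(URI outputUri) {
        URI root = (outputRoot != null) ? outputRoot : inputRoot;
        URI rel = root.relativize(outputUri);
        if (rel.isAbsolute()) {
            // Would otherwise end up as "/C:/..." in relativeTargetPath.
            throw new IllegalStateException(outputUri + " is not under " + root);
        }
        return rel.getPath();
    }

    public static void main(String[] args) {
        OutputRootSketch step = new OutputRootSketch(URI.create("file:/C:/directory/"));
        step.setOutputRoot(URI.create("file:/C:/another_directory/"));
        URI out = URI.create("file:/C:/another_directory/file_name.html");
        System.out.println(step.relativeTargetPath(out)); // file_name.html
    }
}
```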
  9. Former user Account Deleted

    Comment 12. originally posted by @amake on 2013-07-02T06:16:12.000Z:

    The patch has been committed, and should be available in a snapshot shortly.

    Please let me know if this fixes the issue for Longhorn.

  10. Former user Account Deleted

    Comment 13. originally posted by Alexander.Buchholtz.Ont... on 2013-07-03T07:45:29.000Z:

    Hi Aaron,

    Your fix seems to do the trick; I was able to do the round trip with Longhorn successfully. I really appreciate your fast help!

    Thanks,
    Alexander
