YAML Filter fails to parse keys using Explicit Key syntax (?key:value)

Issue #1021 open
Vítor Antero created an issue


As the title says, the YAML filter isn’t able to parse files with complex mapping keys:

net.sf.okapi.common.exceptions.OkapiBadFilterInputException: Error parsing YAML file: Lexical error at line 8948, column 8.  Encountered: " " (32), after : "?"

I’ve attached two files:

  • complex_mapping_keys.yml contains real-world examples that I’ve encountered in files that need to be translated
  • complex_mapping_list_keys.yml contains example 2.11 from the official YAML spec (https://yaml.org/spec/current.html), just for the sake of completeness

Thank you,

NOTE: There are two issues: (1) parsing the ?key:value syntax, and (2) handling a key that is a map or list. This issue will track the first so that complex_mapping_keys.yml can be handled. I (@bhlkuro) will create another issue to track (2) for completeness, because (2) seems less important in practice for localization. Handling complex_mapping_list_keys.yml is deferred to the implementation of (2).

Comments (10)

  1. Jim Hargrave (OLD)
    • changed status to open

    This is a long-standing weakness of our YAML parser. We need to add support for complex keys.

  2. Jim Hargrave (OLD)

    I have changed the filters that use JavaCC so that the JavaCC code is generated by the Maven build. This will make it easier to address this issue. Unfortunately, this is going to require some significant changes to the YAML parser, and this is probably an infrequent use case.

  3. Jim Hargrave

    That commit was a general cleanup in preparation for more changes. I actually attempted to make the fix, but it ended up more complicated than I expected and I had to revert those changes. There is still a chance I can address this before the fall 1.42.0 release, as I am now focusing on all current filter failures. YAML is just a bit lower on the priority list :-(

  4. Kuro Kurosaka (BH Lab)

    I came up with this code:


    This can parse complex_mapping_keys.yml and produce the .xlf file, but the generated .xlf file is missing a layer. Instead of:

    <group id="sg6">
    <trans-unit id="tu10" resname="en/Id::Header/id_header" xml:space="preserve">

    it generates:

     <trans-unit id="tu10" resname="en/id_header" xml:space="preserve">

    Notice that “/Id::Header” is missing. Tracing how the parser handled the YAML file, it is clear that the line

      ? "Id::Header"

    was handled as a key-value pair with a null value and immediately returned. As a result, the following key-value pairs were handled as though they were siblings of the “Id::Header” key-value pair rather than its children.
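
    The pairing that is expected here can be sketched in a few lines of Java. This is only an illustration, not the Okapi JavaCC grammar: the class ExplicitKeySketch, its resnames method, and the sample input are all hypothetical, and the input shape is assumed from the resname shown above. It handles only block mappings where “? key” sits on one line and “:” on a following line at the same indentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Okapi or its YAML filter.
final class ExplicitKeySketch {

    // Returns one slash-separated key path per scalar value found.
    static List<String> resnames(String yaml) {
        List<Integer> indents = new ArrayList<>(); // indentation of each open mapping key
        List<String> keys = new ArrayList<>();     // the open mapping keys, outermost first
        List<String> out = new ArrayList<>();
        String pendingKey = null;                  // explicit key still waiting for its ":"

        for (String line : yaml.split("\n")) {
            String text = line.trim();
            if (text.isEmpty()) continue;
            int indent = 0;
            while (line.charAt(indent) == ' ') indent++;

            String key;
            String value;
            if (text.startsWith("? ")) {
                // Explicit key: remember it and wait for the ":" line
                // instead of emitting a key with a null value.
                pendingKey = unquote(text.substring(2).trim());
                continue;
            } else if (text.startsWith(":") && pendingKey != null) {
                // Value indicator that completes the pending explicit key.
                key = pendingKey;
                value = text.substring(1).trim();
                pendingKey = null;
            } else {
                // Implicit "key: value" or "key:" on a single line.
                int colon = text.indexOf(": ");
                if (colon < 0 && text.endsWith(":")) colon = text.length() - 1;
                if (colon < 0) continue; // not a mapping entry; ignored in this sketch
                key = unquote(text.substring(0, colon).trim());
                value = text.substring(colon + 1).trim();
            }

            // Close every mapping that is indented at least as deep as this line.
            while (!indents.isEmpty() && indents.get(indents.size() - 1) >= indent) {
                indents.remove(indents.size() - 1);
                keys.remove(keys.size() - 1);
            }

            if (value.isEmpty()) {
                // The key opens a nested mapping, so it becomes a path layer.
                indents.add(indent);
                keys.add(key);
            } else {
                List<String> path = new ArrayList<>(keys);
                path.add(key);
                out.add(String.join("/", path));
            }
        }
        return out;
    }

    private static String unquote(String s) {
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            if (first == last && (first == '"' || first == '\'')) {
                return s.substring(1, s.length() - 1);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        String yaml = "en:\n  ? \"Id::Header\"\n  :\n    id_header: \"Header\"\n";
        System.out.println(resnames(yaml));
    }
}
```

    The point of the sketch is the pendingKey variable: the “?” line produces nothing by itself, and the key only becomes a path layer once the matching “:” line is seen.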

    This is probably because the parsing rules don’t cover the case where the “:” appears on a later line than its key. An attempt was made to change the rules to include this case. But because of the hundred lines of code in the token manager section of the JavaCC code that create artificial tokens under very specific conditions, I could not arrive at working code in a reasonable time.

    @Jim Hargrave (OLD) suggested that we may need to refactor the parser code before trying to handle the explicit key syntax.

    I also explored the idea of using an existing YAML parser such as SnakeYAML Engine. It would almost work, except that SnakeYAML does not generate tokens for the whitespace between other tokens, and therefore it does not work for our purpose, where regenerating the original YAML text is essential.
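
    The round-trip requirement can be illustrated with a small sketch. It does not use SnakeYAML Engine or any Okapi API; the class RoundTripSketch and its regular expression are hypothetical and far simpler than a real YAML scanner. It only shows that the original text can be rebuilt by concatenation when whitespace is kept as tokens, and cannot when it is dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not SnakeYAML Engine and not Okapi code.
final class RoundTripSketch {

    // Alternatives: whitespace run | double-quoted string | "?" or ":" |
    // run of other characters | stray double quote.
    // Together they cover every character, so no input is skipped.
    private static final Pattern TOKEN =
        Pattern.compile("\\s+|\"[^\"]*\"|[?:]|[^\\s?:\"]+|\"");

    // Splits the text into tokens, keeping whitespace runs as tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Mimics a scanner that discards the whitespace between tokens.
    static List<String> withoutWhitespace(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!t.trim().isEmpty()) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        String yaml = "  ? \"Id::Header\"\n  :\n    id_header: \"Header\"\n";
        List<String> tokens = tokenize(yaml);
        // With whitespace tokens, concatenation reproduces the input exactly.
        System.out.println(String.join("", tokens).equals(yaml));
        // Without them, indentation and line breaks are lost.
        System.out.println(String.join("", withoutWhitespace(tokens)).equals(yaml));
    }
}
```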
