The mapping from file extension to configId is "fragile"

Issue #913 new
Mihai Nita created an issue

When running Tikal without specifying `-fc`, many extensions are not recognized (.dita, .csv, .yaml, etc.).
The same happens in Rainbow, although the unrecognized extensions differ from the Tikal ones.

Comments (4)

  1. Mihai Nita reporter

    Digging a bit, I found several causes:

    • Hard-coded mappings in both Tikal (net.sf.okapi.applications.tikal.Main) and Rainbow (net.sf.okapi.applications.rainbow.lib.FormatManager)
    • Missing filter dependencies in the pom.xml files (for both Tikal and Rainbow)
    • Filters that don’t declare any extensions (for example okf_odf, okf_vignette, okf_xml-AndroidStrings, okf_transtable, okf_table_fwc, okf_table_tsv; about 30 of them)
    • Filters that declare extensions, but incorrectly (missing ; at the end of the extension list; examples: yaml, xliff, tex, icml)
    • Filters that don’t declare all the extensions they support (for example, Tikal hard-codes .ent to okf_dtd, but the DTD filter does not declare .ent; the same applies to archive (.zip), some Open Office extensions (swc, swx, sxd, sxi), etc.)
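    To illustrate why a missing trailing ; can matter, here is a minimal sketch. It assumes (this is not taken from the Okapi source) that a filter declares its extensions as a single string such as ".yml;.yaml;" and that a naive lookup searches for ".ext;". Under that assumption, the last extension of a list without the terminating ; is never matched, while a lookup that splits on ; tolerates it. The class and method names are hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical illustration, NOT the actual Okapi implementation.
 * Assumes a filter declares its extensions as one string like ".yml;.yaml;".
 */
final class ExtensionMatchDemo {
    private ExtensionMatchDemo() {}

    /** Naive matcher: only works if every extension is terminated by ';'. */
    static boolean naiveMatches(String declared, String ext) {
        String needle = ext.toLowerCase(Locale.ROOT) + ";";
        return declared.toLowerCase(Locale.ROOT).contains(needle);
    }

    /** Tolerant matcher: splits on ';', so a missing final ';' does not matter. */
    static boolean tolerantMatches(String declared, String ext) {
        for (String candidate : declared.split(";")) {
            if (candidate.trim().equalsIgnoreCase(ext)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String wellFormed = ".yml;.yaml;";
        String missingSemicolon = ".yml;.yaml"; // last entry lacks the ';'
        System.out.println(naiveMatches(wellFormed, ".yaml"));
        System.out.println(naiveMatches(missingSemicolon, ".yaml"));
        System.out.println(tolerantMatches(missingSemicolon, ".yaml"));
    }
}
```

    The three lines printed are true, false, true: the same declaration fails or succeeds depending only on how the lookup parses it, which is one reason fixing the declarations and centralizing the lookup go together.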

    We also have differences not only between Tikal and Rainbow, but also between these applications and what FilterConfigurationMapper.getDefaultConfigurationFromExtension returns. For example, the tools map .html => okf_html, .txt => okf_plaintext, .xlf => okf_xliff, and .xliff => okf_xliff, while the API maps .html => okf_itshtml5, .txt => okf_mosestext, .xlf => okf_xliff2, and .xliff => okf_autoxliff.
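    A quick way to keep track of such discrepancies is to diff the two extension => configId tables. The sketch below is hypothetical (MappingDiff is not an Okapi class) and hard-codes the example mappings listed above rather than reading them from Tikal, Rainbow, or the API. Extensions present in only one of the two tables are skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: reports extensions that two extension -> configId
 * tables map to different configurations. Extensions present in only one
 * table are skipped.
 */
final class MappingDiff {
    private MappingDiff() {}

    /** Returns ext -> "leftConfig vs rightConfig" for each disagreement, sorted by extension. */
    static Map<String, String> diff(Map<String, String> left, Map<String, String> right) {
        Map<String, String> result = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, String> e : left.entrySet()) {
            String other = right.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                result.put(e.getKey(), e.getValue() + " vs " + other);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example mappings quoted in this issue (tools vs API).
        Map<String, String> tools = new LinkedHashMap<>();
        tools.put(".html", "okf_html");
        tools.put(".txt", "okf_plaintext");
        tools.put(".xlf", "okf_xliff");
        tools.put(".xliff", "okf_xliff");

        Map<String, String> api = new LinkedHashMap<>();
        api.put(".html", "okf_itshtml5");
        api.put(".txt", "okf_mosestext");
        api.put(".xlf", "okf_xliff2");
        api.put(".xliff", "okf_autoxliff");

        diff(tools, api).forEach((ext, why) -> System.out.println(ext + ": " + why));
    }
}
```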

    I think we should fix the API, then update Tikal and Rainbow to use the API.

  2. Chase Tingley

    FWIW, Longhorn also uses its own mechanism, where a bconf includes a text file that maps extensions to filter config names.

  3. Mihai Nita reporter

    I’m kind of reluctant to fix this before the 1.39 release (soon?), because changing the behavior of getDefaultConfigurationFromExtension so late in the game might have surprising side-effects.

    @Chase: do you think it would be good to make that behavior configurable with a file? Or would “a decent default” be good enough (and applications that want better control, like Longhorn, can do their own thing)?

  4. Chase Tingley

    I think a sensible default should be good enough for now. I don’t think the file adds much as long as we can configure the mappings programmatically.
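    One possible shape for “a sensible default, configurable programmatically” is a single shared table with built-in defaults that applications can override in code, with no mapping file involved. This is a hypothetical sketch, not an existing Okapi API: ExtensionMappings, put and lookup are illustrative names, and the default entries are placeholders copied from the tools’ current mappings quoted in this issue, not a decision about which defaults are right.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch (not an Okapi class): one extension -> configId table
 * with built-in defaults that applications can override programmatically.
 */
final class ExtensionMappings {
    private final Map<String, String> map = new HashMap<>();

    /** Placeholder defaults, copied from the tools' current mappings in this issue. */
    static ExtensionMappings withDefaults() {
        ExtensionMappings m = new ExtensionMappings();
        m.put(".html", "okf_html");
        m.put(".txt", "okf_plaintext");
        m.put(".xlf", "okf_xliff");
        m.put(".xliff", "okf_xliff");
        return m;
    }

    /** Adds or replaces a mapping; the extension is lower-cased and given a leading dot. */
    ExtensionMappings put(String ext, String configId) {
        map.put(normalize(ext), configId);
        return this;
    }

    /** Finds the configId for a file name (uses the text after its last dot) or a bare extension. */
    Optional<String> lookup(String fileNameOrExt) {
        int dot = fileNameOrExt.lastIndexOf('.');
        String ext = dot >= 0 ? fileNameOrExt.substring(dot) : fileNameOrExt;
        return Optional.ofNullable(map.get(normalize(ext)));
    }

    private static String normalize(String ext) {
        String e = ext.trim().toLowerCase(Locale.ROOT);
        return e.startsWith(".") ? e : "." + e;
    }
}
```

    For example, an application that wants Tikal’s current .ent behavior could call `ExtensionMappings.withDefaults().put(".ent", "okf_dtd")`, and a tool like Longhorn could still populate the table from its own bconf data instead. Normalizing case and the leading dot in one place avoids the kind of declaration mistakes listed earlier in this thread.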
