Ratel/Rainbow: Language maps not taken into account

Issue #431 resolved
Former user created an issue

Original issue 431 created by m...@sebastianebert.com on 2015-01-13T07:53:47.000Z:

What steps will reproduce the problem?
1. Create a new project in Rainbow. Source language e.g. de-de (or DE-DE), target language e.g. EN-GB (or en-gb)
2. Go to "translation kit creation" and select the segmentation step
3. Define a language map in Ratel using the regexp "DE.*"

What is the expected output? What do you see instead?
I would expect Ratel to segment my source file, but it doesn't. However, it works when I use the regexp "de.*" (lowercase letters). It makes no difference whether I select "DE-DE" or "de-de" in Rainbow. This seems to be a case-sensitivity problem.

What version of the product are you using? On what operating system?
Ratel: 0.23
Rainbow: 0.23
Windows 7

Please provide any additional information below.

Comments (5)

  1. Former user Account Deleted
    • changed status to open

    Comment 1. originally posted by @ysavourel on 2015-01-13T11:53:00.000Z:

    The language code you specify in Rainbow's UI can be entered in upper or lower case, but it is normalized to lowercase when used. So when you specify "DE-DE", the process actually uses "de-de".

    We need to update the documentation for this.

  2. Former user Account Deleted

    Comment 2. originally posted by m...@sebastianebert.com on 2015-01-13T12:57:09.000Z:

    I had a similar issue with "UTF-8" and "utf-8". Could you, for example, make the two input fields convert everything to lowercase so that the user can see this behavior in the UI? If it is only mentioned in the documentation, it might be overlooked.

  3. Former user Account Deleted

    Comment 3. originally posted by @ysavourel on 2015-01-20T12:07:47.000Z:

    Documentation has been updated.

  4. Former user Account Deleted

    Comment 4. originally posted by m...@sebastianebert.com on 2015-01-21T07:38:18.000Z:

    OK, does this mean that in the future one can enter either uppercase or lowercase letters in Rainbow to avoid this problem?

  5. Former user Account Deleted

    Comment 5. originally posted by @ysavourel on 2015-01-21T12:09:15.000Z:

    No, it means the documentation now includes a note pointing out the potential issue.

    Ratel is not used only to edit SRX rules for Okapi tools; the rules it produces may also be used with other tools. Those tools may or may not normalize their language codes when using SRX, so we cannot assume one case or the other.

    The solution is to write the regular expression so that it is not case-sensitive and therefore always works. For example, use '[Ee][Nn].*' instead of 'en.*' or 'EN.*'.

    See, for example, the map at the end of the sample SRX that comes with the specification: http://www.gala-global.org/oscarStandards/srx/srx20.html#AppSample
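
The case-sensitivity problem and the character-class workaround described in this thread can be sketched in Python. This is only an illustration under stated assumptions: Python's `re` module stands in for whatever regex engine the SRX-consuming tool uses, whole-string matching of the language code is assumed, and `language_pattern` and `matches` are hypothetical helpers that are not part of Ratel, Rainbow, or Okapi.

```python
import re


def language_pattern(prefix: str) -> str:
    """Build a case-insensitive language pattern from a plain prefix.

    Hypothetical helper: 'en' becomes '[Ee][Nn].*'. Letters are expanded
    into two-character classes; any other character is escaped literally.
    """
    parts = []
    for ch in prefix:
        if ch.isalpha():
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts) + ".*"


def matches(pattern: str, language_code: str) -> bool:
    """Return True if the whole language code matches the pattern.

    Whole-string matching is an assumption made for this sketch.
    """
    return re.fullmatch(pattern, language_code) is not None


# Rainbow normalizes "DE-DE" to "de-de" before the rules are applied,
# so an uppercase-only pattern never matches.
normalized = "DE-DE".lower()
print(matches("DE.*", normalized))                  # uppercase-only pattern
print(matches("de.*", normalized))                  # lowercase-only pattern
print(matches(language_pattern("de"), normalized))  # case-insensitive pattern
print(matches(language_pattern("de"), "DE-DE"))     # tool that does not normalize
```

The character-class form matches whether or not the consuming tool normalizes the language code, which is why it is the safer choice for SRX files shared between tools.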
