Problem with un-quoted translatable attributes

Issue #126 resolved
Former user created an issue

Original [issue 126](https://code.google.com/p/okapi/issues/detail?id=126) created by @ysavourel on 2010-03-08T18:29:39.000Z:

Currently if we have this:

<img alt=R&amp;D src=image.png>

the merged translation remains un-quoted, which can cause loss of text if the translation has more than one word:

<img alt=Recherche et Development src=image.png>

This problem actually occurs relatively frequently with some languages, and is difficult to detect and fix automatically. It seems the filter needs to produce a skeleton that quotes un-quoted extractable attribute values.
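
To make the loss concrete, here is a minimal sketch that uses Python's standard `html.parser` as a stand-in for any tolerant HTML parser. It is not the Okapi filter or Jericho, and the names `AttributeCollector` and `parse_attributes` are made up for this illustration. It parses the merged tag from the report, once as written and once with the value quoted:

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collects the attributes of every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.attributes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent.
        self.attributes.extend(attrs)


def parse_attributes(markup):
    """Return the attributes found in markup as a name -> value dict."""
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return dict(collector.attributes)


unquoted = parse_attributes("<img alt=Recherche et Development src=image.png>")
quoted = parse_attributes('<img alt="Recherche et Development" src=image.png>')

print(unquoted["alt"])  # only the first word survives
print(quoted["alt"])    # the full translation
```

In the un-quoted case the remaining words are read as separate, valueless attributes, so the alternative text silently shrinks to its first word.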

Comments (11)

  1. Former user Account Deleted
    • changed status to open

    Comment [1.](https://code.google.com/p/okapi/issues/detail?id=126#c1) originally posted by @ysavourel on 2010-03-08T18:37:01.000Z:

    Very ugly HTML :-) Would it be better to update the source in this case? Would the customers care that we are fixing this problem? It seems they can't really complain, as it will help them in the future if they use tools that are not as smart (i.e., that cannot automatically add a quote).

  2. Former user Account Deleted

    Comment [2.](https://code.google.com/p/okapi/issues/detail?id=126#c2) originally posted by @ysavourel on 2010-03-08T19:01:27.000Z:

    That's what I meant - fix it when filtering - but fix it in the source so that all targets come out clean rather than trying to detect the problem in each target

    But we are not saving the source file to preserve these fixes - unfortunate. There are cases where the customers would appreciate making repairs to their files - this is a good example.

    I guess with a custom pipeline we could repair the source then re-save so it is still possible - just not easy.

  3. Former user Account Deleted

    Comment [3.](https://code.google.com/p/okapi/issues/detail?id=126#c3) originally posted by @ysavourel on 2010-03-08T19:20:49.000Z:

    But there are many cases where the user cannot do anything between the source and the filter: for example, a filter can be used without any steps (the framework does not force you to use steps).

    Since, technically, there is nothing wrong with <img alt=word src=imag.png/> from the viewpoint of HTML (it follows the specification), our filter should be able to handle it and output a proper form suitable for translation.

    While I agree using unquoted attributes is bad from the i18n viewpoint (actually using translatable attributes is bad in the first place) and it would be nice to fix the source, I don't think the filter should force users to pre-process all files when it can solve the problem relatively easily (hopefully :)

  4. Former user Account Deleted

    Comment [4.](https://code.google.com/p/okapi/issues/detail?id=126#c4) originally posted by @ysavourel on 2010-03-08T19:46:19.000Z:

    I think you misunderstood - probably because I was mixing two topics :-) I agree the filter should auto correct for these cases - no preprocessing needed.

    But it started me thinking about the cases where we might want to modify the source files - for whatever reason (we actually do this many times in our current workflow, for example to transform a source XML file before filtering). We should be able to have a pipeline that doesn't start with a filter to do exactly this type of operation.

    But since this has nothing to do with the specific bug - never mind :-)

  5. Former user Account Deleted

    Comment [6.](https://code.google.com/p/okapi/issues/detail?id=126#c6) originally posted by @ysavourel on 2010-05-26T17:00:14.000Z:

    Now I know why I delayed working on this bug :-)

    I have a possible solution of rewriting the skeleton - but most of the code is private methods in the EventBuilder. I will need to move significant code to the HtmlEventBuilder. That is a complicated refactor that is not advised for M7. The biggest problem is the offsets that Jericho produces and that we depend on - insert something too early and all the offsets are off.

    There is another (easier) option however - if we detect unquoted attributes we trigger a rewrite of the file (not the original). The rewritten file would then have all the attributes quoted in the way we want them and parsed correctly.

    Since this happens infrequently and the HTML parser is fairly fast, I think a reparse/rewrite is the solution. But I'm still not sure I can have this done and fully tested by tonight.

    What do you think?

  6. Former user Account Deleted

    Comment [9.](https://code.google.com/p/okapi/issues/detail?id=126#c9) originally posted by @ysavourel on 2010-05-26T19:38:59.000Z:

    Mmm... rewriting the whole file to a temporary place, and re-parsing it... all that for a measly pair of missing quotes? There's got to be a simpler way. Maybe have the writer write the quotes in addition to the values for attributes, and carry the quote type in a property?

  7. Former user Account Deleted

    Comment [10.](https://code.google.com/p/okapi/issues/detail?id=126#c10) originally posted by @ysavourel on 2010-05-26T21:13:49.000Z:

    That's an idea I didn't think about - if we had a property to tell the writer to add the quotes - I can easily detect the attributes that have the problem. Jericho has methods for this.

    Intercepting these and updating the skeleton is still a possibility but as I said it would require lots of refactoring. But I will be doing a lot of filter work over the next weeks for the DiffLeverage project anyway - I will keep it in mind.
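
The writer-side idea from the last two comments (record the quote type with each attribute, then have the writer emit the quotes) can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the Okapi or Jericho implementation: the `Attribute` class, its `quote` field (standing in for the proposed property), the simplified `ATTRIBUTE` regex, and the `merge` helper are all hypothetical. It also assumes the translated value is already HTML-escaped apart from the quote character itself.

```python
import re
from dataclasses import dataclass

# Simplified name=value matcher for a single start tag (an assumption for
# this sketch; a real filter would rely on its HTML parser instead).
ATTRIBUTE = re.compile(
    r"""
    (?P<name>[A-Za-z_:][-\w:.]*) \s*=\s*
    (?: "(?P<double>[^"]*)"
      | '(?P<single>[^']*)'
      | (?P<bare>[^\s"'=<>`]+) )
    """,
    re.VERBOSE,
)


@dataclass
class Attribute:
    name: str
    value: str
    quote: str  # '"', "'", or "" when the source value was un-quoted


def to_attribute(match):
    """Build an Attribute, remembering which quote style the source used."""
    for group, quote in (("double", '"'), ("single", "'"), ("bare", "")):
        value = match.group(group)
        if value is not None:
            return Attribute(match.group("name"), value, quote)
    raise ValueError("no value group matched")


def write_attribute(attribute, new_value):
    """Write name=value, always quoted; bare source values get double quotes."""
    quote = attribute.quote or '"'
    entity = "&quot;" if quote == '"' else "&#39;"
    return f"{attribute.name}={quote}{new_value.replace(quote, entity)}{quote}"


def merge(start_tag, translations):
    """Substitute translated attribute values into start_tag."""

    def rewrite(match):
        attribute = to_attribute(match)
        if attribute.name not in translations:
            return match.group(0)  # non-translated attributes stay as written
        return write_attribute(attribute, translations[attribute.name])

    return ATTRIBUTE.sub(rewrite, start_tag)


merged = merge("<img alt=R&amp;D src=image.png>",
               {"alt": "Recherche et Development"})
print(merged)
```

Only the translated attribute is rewritten, so the rest of the tag keeps its original form and no second parse of the file is needed.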
