- changed status to open
Problem with un-quoted translatable attributes
Original [issue 126](https://code.google.com/p/okapi/issues/detail?id=126) created by @ysavourel on 2010-03-08T18:29:39.000Z:
Currently if we have this:
<img alt=R&D src=image.png>
the merged translation remains unquoted, which can cause loss of text if the translation has more than one word:
<img alt=Recherche et Development src=image.png>
This problem actually occurs relatively frequently with some languages, and is difficult to detect and fix automatically. It seems the filter needs to produce a skeleton that quotes unquoted extractable attribute values.
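To make the text loss concrete, here is a minimal sketch using Python's standard `html.parser` module. It is not the Okapi filter or Jericho, and the `AttrCollector` and `parse_attrs` names are made up for this illustration. It only shows how a typical HTML parser reads the merged tag from the example above.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: collects the attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent
        self.attrs.extend(attrs)


def parse_attrs(markup):
    """Parse a fragment and return all (name, value) attribute pairs."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.attrs


merged = "<img alt=Recherche et Development src=image.png>"
attrs = parse_attrs(merged)

# Only the first word survives as the value of alt; the remaining words
# are read as separate attributes with no value.
for name, value in attrs:
    print(name, "=", value)
```

With this parser the `alt` value is cut at the first space, which is the loss described above.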
Comments (11)
-
Account Deleted -
Account Deleted Comment [2.](https://code.google.com/p/okapi/issues/detail?id=126#c2) originally posted by @ysavourel on 2010-03-08T19:01:27.000Z:
That's what I meant - fix it when filtering - but fix it in the source so that all targets come out clean rather than trying to detect the problem in each target
But we are not saving the source file to preserve these fixes - unfortunate. There are cases where the customers would appreciate making repairs to their files - this is a good example.
I guess with a custom pipeline we could repair the source then re-save so it is still possible - just not easy.
-
Account Deleted Comment [3.](https://code.google.com/p/okapi/issues/detail?id=126#c3) originally posted by @ysavourel on 2010-03-08T19:20:49.000Z:
But there are many cases where the user cannot do anything between the source and the filter. For example, a filter can be used without any steps (the framework does not force you to use steps).
Since, technically, there is nothing wrong with <img alt=word src=imag.png/> from the viewpoint of HTML (it follows the specification), our filter should be able to handle it and output a proper form suitable for translation.
While I agree that using unquoted attributes is bad from the i18n viewpoint (actually, using translatable attributes is bad in the first place) and it would be nice to fix the source, I don't think the filter should force users to pre-process all files when it can solve the problem relatively easily (hopefully :)
-
Account Deleted Comment [4.](https://code.google.com/p/okapi/issues/detail?id=126#c4) originally posted by @ysavourel on 2010-03-08T19:46:19.000Z:
I think you misunderstood - probably because I was mixing two topics :-) I agree the filter should auto correct for these cases - no preprocessing needed.
But it started me thinking about the cases where we might want to modify the source files - for whatever reason (we actually do this many times in our current workflow, for example to transform a source XML file before filtering). We should be able to have a pipeline that doesn't start with a filter to do exactly this type of operation.
But since this has nothing to do with the specific bug - never mind :-)
-
Account Deleted Comment [5.](https://code.google.com/p/okapi/issues/detail?id=126#c5) originally posted by @ysavourel on 2010-03-08T19:56:06.000Z:
Oh I see. Thanks for clarifying. BTW: You can start with a Search&Replace step on RawDocument, that'll give you a RawDocument for the next step which could be a RawDocumentToFilterEvents.
-
Account Deleted Comment [6.](https://code.google.com/p/okapi/issues/detail?id=126#c6) originally posted by @ysavourel on 2010-05-26T17:00:14.000Z:
Now I know why I delayed working on this bug :-)
I have a possible solution of rewriting the skeleton - but most of the code is in private methods in the EventBuilder. I will need to move significant code to the HtmlEventBuilder. That is a complicated refactoring that is not advised for M7. The biggest problem is the offsets that Jericho produces and that we depend on - insert something too early and all the offsets are off.
There is another (easier) option however - if we detect unquoted attributes we trigger a rewrite of the file (not the original). The rewritten file would then have all the attributes quoted in the way we want them and parsed correctly.
Since this happens infrequently and the html parser is fairly fast I think a reparse/rewrite is the solution. But I'm still not sure I can have this done and fully tested by tonight.
What do you think?
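As a rough illustration of the rewrite option described above, here is a regex-based sketch in Python. It is not the Okapi implementation and does not use Jericho. The function name `quote_unquoted_attributes` is hypothetical, and the sketch assumes that no quoted attribute value contains `<` or `>`.

```python
import re

# A start tag: '<', a letter, then anything up to the next '>'.
# Assumption: quoted attribute values do not contain '<' or '>'.
TAG = re.compile(r"<[A-Za-z][^<>]*>")

# Group 1: whitespace, attribute name and '='.
# Group 2: a double-quoted, single-quoted, or unquoted value.
ATTR = re.compile(r"""(\s[^\s=<>"'/]+\s*=\s*)("[^"]*"|'[^']*'|[^\s"'<>]+)""")


def _quote_attr(match):
    value = match.group(2)
    if value[0] in "\"'":
        return match.group(0)  # already quoted: leave untouched
    # An unquoted value cannot contain '"', so wrapping it is safe.
    return match.group(1) + '"' + value + '"'


def quote_unquoted_attributes(markup):
    """Return markup with every unquoted attribute value double-quoted."""
    return TAG.sub(lambda m: ATTR.sub(_quote_attr, m.group(0)), markup)


print(quote_unquoted_attributes("<img alt=R&D src=image.png>"))
```

Text outside tags and values that are already quoted are left as they are. A value written directly before `/>` keeps the slash inside the quotes in this sketch, so a real implementation would need to handle that case explicitly.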
-
Account Deleted Comment [7.](https://code.google.com/p/okapi/issues/detail?id=126#c7) originally posted by @ysavourel on 2010-05-26T17:49:55.000Z:
I guess we should postpone the fix to M8: better to live with a known problem than to introduce new ones without enough time to catch them.
-
Account Deleted Comment [8.](https://code.google.com/p/okapi/issues/detail?id=126#c8) originally posted by @ysavourel on 2010-05-26T17:58:29.000Z:
How do you feel about the rewrite option (producing a cleaned-up temp file that is then filtered)?
-
Account Deleted Comment [9.](https://code.google.com/p/okapi/issues/detail?id=126#c9) originally posted by @ysavourel on 2010-05-26T19:38:59.000Z:
Mmm... rewriting the whole file to a temporary place, and re-parsing it... all that for a measly pair of missing quotes? There's got to be a simpler way. Maybe have the writer write the quotes in addition to the values for attributes, and carry the quote type in a property?
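The writer-side idea above could be sketched as follows. This is only an illustration in Python, not Okapi code: the `write_attribute` function and its `quote` parameter are hypothetical stand-ins for a writer that receives the quote type as a property, with an empty string meaning the source value was unquoted.

```python
def write_attribute(name, value, quote=""):
    """Serialize one attribute, always emitting quotes around the value.

    `quote` is the hypothetical property carried from the filter:
    '"' or "'" when the source was quoted, "" when it was unquoted.
    """
    if quote not in ('"', "'"):
        quote = '"'  # unquoted in the source: fall back to double quotes
    # Escape only the delimiter so the value cannot end the attribute early.
    entity = "&quot;" if quote == '"' else "&#39;"
    return name + "=" + quote + value.replace(quote, entity) + quote


print(write_attribute("alt", "Recherche et Development"))
```

Keeping the original quote type when there is one leaves already-quoted attributes as the author wrote them; only the unquoted ones change.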
-
Account Deleted Comment [10.](https://code.google.com/p/okapi/issues/detail?id=126#c10) originally posted by @ysavourel on 2010-05-26T21:13:49.000Z:
That's an idea I didn't think about - if we had a property to tell the writer to add the quotes - I can easily detect the attributes that have the problems. Jericho has methods for this.
Intercepting these and updating the skeleton is still a possibility but as I said it would require lots of refactoring. But I will be doing a lot of filter work over the next weeks for the DiffLeverage project anyway - I will keep it in mind.
-
Account Deleted - changed status to resolved
Comment [11.](https://code.google.com/p/okapi/issues/detail?id=126#c11) originally posted by @ysavourel on 2011-03-09T19:19:17.000Z:
We now rewrite any file that has missing quotes - unit tests pass
Comment [1.](https://code.google.com/p/okapi/issues/detail?id=126#c1) originally posted by @ysavourel on 2010-03-08T18:37:01.000Z:
Very ugly HTML :-) Would it be better to update the source in this case? Would the customers care that we are fixing this problem? It seems they can't really complain, as it will help them in the future if they use tools that are not as smart (i.e., tools that cannot automatically add a quote).