DOCX/OpenXML: equations (oMathPara, oMath) extracted as text

Issue #334 resolved
Former user created an issue

Original issue 334 created by karlis.ged... on 2013-05-08T07:27:40.000Z:

Tikal extracts equations as inline text. When translating DOCX files with Tikal, the equations are a real problem.

Scenario:
1) Extract the inline text from the DOCX file (tikal.sh -lm)
2) Translate the inline text (using Moses)
3) Create the translated document
Because the tags are treated as inline tags and can be reordered, the best-case scenario is that the equation is distorted, but often an invalid document is created. There is little to no way of telling from the inline codes that this entire segment is not translatable, and there is no valuable information to be extracted from it.
The 'oMath' tag should be treated as a non-translatable object.
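
To illustrate the requested behaviour, here is a minimal sketch in Python. It is not Okapi or Tikal code; the function name `translatable_text` and the `<x id="N"/>` placeholder format are assumptions made for this example. It collects the text of a WordprocessingML paragraph and collapses each whole `m:oMathPara` or `m:oMath` subtree into one opaque placeholder, so nothing inside the equation is exposed to the translation step.

```python
import xml.etree.ElementTree as ET

# Standard OOXML namespaces for WordprocessingML and Office Math.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def translatable_text(paragraph, protected=("oMathPara", "oMath")):
    """Return (text, equations) for one w:p element.

    Hypothetical helper: each protected math subtree becomes a single
    placeholder and is kept aside untouched for later re-insertion.
    """
    protected_tags = {f"{{{M_NS}}}{name}" for name in protected}
    parts = []
    equations = []

    def walk(elem):
        if elem.tag in protected_tags:
            # Do not descend: the equation stays one indivisible unit.
            equations.append(elem)
            parts.append(f'<x id="{len(equations)}"/>')
            return
        if elem.tag == f"{{{W_NS}}}t" and elem.text:
            parts.append(elem.text)
        for child in elem:
            walk(child)

    walk(paragraph)
    return "".join(parts), equations


SAMPLE = (
    f'<w:p xmlns:w="{W_NS}" xmlns:m="{M_NS}">'
    '<w:r><w:t xml:space="preserve">Area is </w:t></w:r>'
    '<m:oMath><m:r><m:t>A=b*h</m:t></m:r></m:oMath>'
    '<w:r><w:t xml:space="preserve"> for a rectangle.</w:t></w:r>'
    '</w:p>'
)

text, eqs = translatable_text(ET.fromstring(SAMPLE))
print(text)  # Area is <x id="1"/> for a rectangle.
```

With a single placeholder per equation, a reordering by the MT engine can move the equation within the sentence but cannot scramble its internal tags.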

I am using version 0.21 on Ubuntu.

Comments (5)

  1. Former user Account Deleted

    Comment 1. originally posted by @ysavourel on 2013-05-08T16:52:02.000Z:

    This probably needs an option. I have encountered people who definitely do want to translate the textual parts of equations, however perilous it is.

  2. Former user Account Deleted

    Comment 2. originally posted by karlis.ged... on 2013-05-09T06:33:41.000Z:

    To add to this: we have encountered problems with some other tags that arguably do not need to be translated at all. For me, the best solution would be to be able to pass, as parameters, the tags to be processed as non-translatable objects.

    One other example that I can give right off the bat is that it extracts the author of the document. That might well have some uses, but for both translation and term extraction it is unnecessary and a potential problem.
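
As a rough sketch of the parameterised approach suggested in this comment, the Python snippet below reads the core properties part of a DOCX package (docProps/core.xml) and skips any property named in a caller-supplied set. The function name `extract_properties` and the `skip` parameter are hypothetical and are not an existing Tikal option.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the OPC core properties part (docProps/core.xml).
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_properties(core_xml, skip=frozenset()):
    """Return {local_name: text} for every property not listed in `skip`.

    Hypothetical helper: `skip` holds local element names, e.g. "creator".
    """
    result = {}
    for child in ET.fromstring(core_xml):
        local = child.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        if local in skip:
            continue
        if child.text and child.text.strip():
            result[local] = child.text
    return result


CORE = (
    f'<cp:coreProperties xmlns:cp="{CP_NS}" xmlns:dc="{DC_NS}">'
    '<dc:title>Annual report</dc:title>'
    '<dc:creator>J. Doe</dc:creator>'
    '<cp:lastModifiedBy>J. Doe</cp:lastModifiedBy>'
    '</cp:coreProperties>'
)

props = extract_properties(CORE, skip={"creator", "lastModifiedBy"})
print(props)  # {'title': 'Annual report'}
```

Passing an empty `skip` set keeps the current behaviour of extracting everything, so the option would stay backward compatible.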
