DOCX/OpenXML: equations (oMathPara, oMath) extracted as text
Original issue 334 created by karlis.ged... on 2013-05-08T07:27:40.000Z:
Tikal extracts equations as inline text. When translating DOCX files with Tikal, these equations are a real problem.
Scenario:
1) Extract inline text from the DOCX (tikal.sh -lm)
2) Translate the inline text (using Moses)
3) Create the translated document
Because the tags are treated as inline tags and can be reordered, the best-case scenario is that the equation is distorted, but often an invalid document is created. There is little to no way of telling from the inline text that the entire segment is non-translatable, and there is no valuable information to be extracted from it.
The 'oMath' tag should be treated as a non-translatable object.
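To illustrate the requested behaviour, here is a minimal Python sketch that separates text inside m:oMath / m:oMathPara from ordinary run text in a document.xml string. It is not how Okapi or Tikal implement extraction; the function name split_text and the sample document are hypothetical, and only the two namespace URIs come from the Office Open XML specification.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the Office Open XML specification.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"

MATH_CONTAINERS = {f"{{{M}}}oMath", f"{{{M}}}oMathPara"}
TEXT_TAGS = {f"{{{W}}}t", f"{{{M}}}t"}


def split_text(document_xml):
    """Return (translatable, protected) lists of text runs.

    Hypothetical helper: any text found below an oMath or oMathPara
    element is put in the protected list instead of being translated.
    """
    translatable, protected = [], []

    def walk(elem, in_math):
        # Once inside an equation, everything below it stays protected.
        in_math = in_math or elem.tag in MATH_CONTAINERS
        if elem.tag in TEXT_TAGS and elem.text:
            (protected if in_math else translatable).append(elem.text)
        for child in elem:
            walk(child, in_math)

    walk(ET.fromstring(document_xml), False)
    return translatable, protected


# Made-up sample: one paragraph with a text run followed by an equation.
SAMPLE = (
    f'<w:document xmlns:w="{W}" xmlns:m="{M}"><w:body><w:p>'
    "<w:r><w:t>Area is </w:t></w:r>"
    "<m:oMath><m:r><m:t>A=b*h</m:t></m:r></m:oMath>"
    "</w:p></w:body></w:document>"
)

if __name__ == "__main__":
    print(split_text(SAMPLE))
```

With the equation kept as a single protected unit, a translation step only ever sees the first list, so the equation's internal tags cannot be reordered.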
I am using version 0.21 on Ubuntu.
Comments (5)
Comment 2. originally posted by karlis.ged... on 2013-05-09T06:33:41.000Z:
To add to this: we have encountered problems with some other tags that arguably do not need to be translated at all. For me, the best solution would be to be able to pass, as parameters, the tags to process as non-translatable objects.
One other example I can give right off the bat is that it extracts the author of the document. That might have some uses, but in both translation and term extraction it is unnecessary and a potential problem.
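The kind of parameter asked for here could look roughly like the following Python sketch, which takes a caller-supplied set of tag names to skip. This is only an illustration of the idea, not an existing Tikal option; extract_text, skip_tags, and the sample core-properties XML are hypothetical. The author of a DOCX is normally stored as dc:creator in docProps/core.xml.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the core properties part (docProps/core.xml).
DC = "http://purl.org/dc/elements/1.1/"
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"


def extract_text(xml, skip_tags=frozenset()):
    """Collect element text, ignoring any element listed in skip_tags.

    skip_tags holds qualified names in ElementTree's "{namespace}local"
    form; a skipped element is dropped together with all its children.
    """
    found = []

    def walk(elem):
        if elem.tag in skip_tags:
            return
        if elem.text and elem.text.strip():
            found.append(elem.text.strip())
        for child in elem:
            walk(child)

    walk(ET.fromstring(xml))
    return found


# Made-up sample of a core properties part with a title and an author.
CORE = (
    f'<cp:coreProperties xmlns:cp="{CP}" xmlns:dc="{DC}">'
    "<dc:title>Report</dc:title>"
    "<dc:creator>Jane Doe</dc:creator>"
    "</cp:coreProperties>"
)

if __name__ == "__main__":
    print(extract_text(CORE))
    print(extract_text(CORE, skip_tags={f"{{{DC}}}creator"}))
```

Passing the set at call time keeps the default behaviour unchanged for users who do want those elements extracted.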
Comment 3. originally posted by @ysavourel on 2015-03-07T06:23:28.000Z:
- changed status to resolved
This was fixed in the OpenXML rewrite. This file extracts fine now.
Comment 1. originally posted by @ysavourel on 2013-05-08T16:52:02.000Z:
This probably needs an option. I have encountered people who definitely do want to translate the textual parts of equations, however perilous that is.