OpenXML: DOCX style names changing during translation

Issue #1006 new
Chunyu Yan created an issue

I extracted XLIFF from the file “sample.docx”, then merged the XLIFF back to produce the output file “style_sample_out.docx”. I found that “style_sample_out.docx” does not have the same styles as the source file “sample.docx”.

I don't know whether I'm using it the wrong way. Is there any way to keep the same style names?


  1. Chase Tingley

    Interesting. @Denis Konovalyenko it looks like the style names might be getting corrupted during translation. What do you think?

  2. Denis Konovalyenko

    @Chunyu Yan, @Chase Tingley, the styles optimisation process forms new paragraph styles based on the available ones and applies them to the appropriate paragraphs in the document. The existing styles remain intact. So I assume it is the viewer/editor application's responsibility to decide which styles to show in the list.

    Below you may find more technical details on that.

    1. Differences in the document parts:

    2. Differences in the styles parts:

  3. Chunyu Yan (reporter)

    @Denis Konovalyenko, thank you for your reply. I've just taken over a tough project, but I don't know much about document processing, so I still don't know how to deal with this.

    Background: our client defines many document styles, and all of their documents must be delivered in those fixed styles. How can I control which styles are shown in the DOCX style list?

  4. Denis Konovalyenko

    @Chunyu Yan, the styles optimisation process, which takes place when an OpenXML document is extracted for translation, finds the best-matching existing style and creates a new style based on it (only the differences are added). In this case, the name of the new style is composed of a generated prefix (for instance, “P64B34001-”) followed by the parent style name (for instance, “my-nice-paragraph-style”). If no best-matching style is found among the available ones, a new style is created in full and named with the generated prefix and a sequential number (e.g. “P64B34001-1”).
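    To make the naming scheme above concrete, here is a minimal sketch. It assumes a simplified model in which a style is just a flat dict of formatting properties, and the functions `best_match` and `derive_style` are hypothetical illustrations, not the Okapi filter's actual code or API. A parent is chosen only if all of its properties agree with the required formatting; the new style then stores only the differences, otherwise it is created in full with a sequential number.

```python
from itertools import count

# Hypothetical model: a style is a name mapped to a flat dict of properties.
# This illustrates the naming scheme only; it is NOT the Okapi implementation.


def best_match(required, styles):
    """Return the existing style whose properties all agree with `required`
    and which covers the most of them, or None if no style fits."""
    best_name, best_size = None, 0
    for name, props in styles.items():
        agrees = all(required.get(k) == v for k, v in props.items())
        if agrees and len(props) > best_size:
            best_name, best_size = name, len(props)
    return best_name


def derive_style(required, styles, prefix, counter):
    """Create a new style for `required` and return (name, definition)."""
    parent = best_match(required, styles)
    if parent is not None:
        # Only the difference from the parent is stored in the new style,
        # and its name is the generated prefix plus the parent style name.
        diff = {k: v for k, v in required.items() if styles[parent].get(k) != v}
        return prefix + parent, {"basedOn": parent, "props": diff}
    # No usable parent: the style is created in full, named prefix + number.
    return f"{prefix}{next(counter)}", {"basedOn": None, "props": dict(required)}
```

    For example, with `styles = {"my-nice-paragraph-style": {"font": "Arial", "size": 11}}`, requiring Arial 11 pt bold yields a style named “P64B34001-my-nice-paragraph-style” that only adds `bold`, while requiring a Courier font finds no compatible parent and yields “P64B34001-1”. The sketch does not handle name collisions when the same parent is reused with different differences.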

    That is how it works; now I would like to say a bit about why it is done this way. In short, the styles optimisation improves segmentation quality: it gives a better chance of extracting, say, “A test paragraph” instead of “<g>A</g><g> test</g><g> paragraph</g>” (in the case of XLIFF).
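    The segmentation benefit can be illustrated with a small sketch. It assumes a paragraph is modelled as a list of `(text, run_properties)` pairs, and the functions `extract` and `optimise` are hypothetical, not the filter's real algorithm. Formatting shared by every run is moved into a paragraph-level style, so runs left without direct formatting are extracted as plain text rather than wrapped in `<g>` elements.

```python
# Hypothetical model: a paragraph is a list of (text, run_properties) pairs.
# Illustrates the idea only; not the Okapi filter's actual algorithm.


def extract(runs):
    """Render runs as XLIFF-like inline text: formatted runs become <g> spans."""
    return "".join(f"<g>{text}</g>" if props else text for text, props in runs)


def optimise(runs):
    """Move the formatting shared by every run into a paragraph-level style.

    Returns (paragraph_style_props, runs_with_remaining_direct_formatting)."""
    if not runs:
        return {}, runs
    common = dict(runs[0][1])
    for _, props in runs[1:]:
        # Keep only properties that every run sets to the same value.
        common = {k: v for k, v in common.items() if props.get(k) == v}
    stripped = [
        (text, {k: v for k, v in props.items() if k not in common})
        for text, props in runs
    ]
    return common, stripped


runs = [("A", {"font": "Arial"}), (" test", {"font": "Arial"}),
        (" paragraph", {"font": "Arial"})]
print(extract(runs))       # formatted runs: <g>A</g><g> test</g><g> paragraph</g>
style, plain = optimise(runs)
print(style, extract(plain))  # {'font': 'Arial'} A test paragraph
```

    Only formatting that differs from the shared paragraph style still produces inline codes; for instance, a single bold run in an otherwise uniform Arial paragraph would come out as `A<g> bold</g>`.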

    I am afraid that preserving the style names would hardly ever be possible. The only feasible case might be when a custom style is “abandoned” or not used at all: its name could then be reused for a newly generated style. However, this would “preserve” only the name, not the formatting information, so it is not even worth the effort.

    As a trade-off, the styles optimisation may not be performed under a certain condition (a filter parameter), so that the original styles are left as they were after processing. Could you please let me know if this is the route you would like to take?

    Thank you.

  5. FJ

    @Denis Konovalyenko

    the styles optimisation may not be performed under a certain condition (a filter parameter)

    I would appreciate it if you could tell me how to set the parameters to prevent optimization from being performed.

  6. Denis Konovalyenko

    @FJ, the optional styles optimisation was only proposed as a possible trade-off; it has not been implemented.
