OpenXML: Parse chart content in PPTX and XLSX

Issue #503 resolved
Chase Tingley created an issue

This may apply to XLSX as well; I just don't have a testcase for it yet.

This was reported on the okapitools list. The testcase is attached.

The text "Chart Explanation not parsed" should be extracted. The string values on the x axis should probably be extracted as well.

Comments (7)

  1. Chase Tingley reporter

    Adding simple_chart.xlsx testcase from issue #374, which was the older version of this.

    PowerPoint and Excel charts use the same format (http://schemas.openxmlformats.org/drawingml/2006/chart:chartSpace), so these cases can be solved together.
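    Because both formats share that root element, chart parts can be located the same way in either package. The following is only a rough sketch in Python (standard library only), not how Okapi does it: the function name find_chart_parts is hypothetical, and scanning every .xml part is an assumption made for brevity (a real filter would more likely follow the package relationships).

```python
import zipfile
import xml.etree.ElementTree as ET

CHART_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
# ElementTree spells qualified names as "{namespace}localname".
CHART_ROOT = "{%s}chartSpace" % CHART_NS


def find_chart_parts(package):
    """Return the names of parts whose root element is c:chartSpace.

    `package` is a path or file-like object for a PPTX, XLSX, or DOCX file.
    Hypothetical helper: it checks the root element rather than the part
    name, so it does not depend on where the package stores its charts.
    """
    found = []
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            if not name.endswith(".xml"):
                continue
            try:
                root = ET.fromstring(zf.read(name))
            except ET.ParseError:
                continue  # skip parts that are not well-formed XML
            if root.tag == CHART_ROOT:
                found.append(name)
    return found
```

    Checking the root element instead of the file path means the same function should work unchanged on a PPTX or an XLSX package.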

  2. Chase Tingley reporter

    I want to copy over Yves's comment from #374 about that sample:

    Assigning a sample.

    Translatable elements:

    • title ("This is the chart")

    • series names ("Time", "Value")

    The X-axis label is encoded in a very strange way.

    For reference, the axis labels can be identified with the XPath //c:ser/c:tx/c:strRef/c:strCache/c:pt/c:v/text(). The existing "support" for charts in Word documents would work for these PPTX/XLSX charts as well. However, that support excludes the c:v element because not all of its values are translatable. Also, it uses the old OpenXMLContentFilter code. I'd sort of like to write real chart support and switch all three over to it, as that would also allow better handling of rich text formatting in chart titles (although that is rare). As a stopgap, we could put the PPTX/XLSX charts on the legacy "word" chart support and it would partially work.
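    To make the XPath above concrete, here is a small Python sketch (standard library only) that pulls the matching c:v strings out of one chart part. It is an illustration, not the Okapi implementation: the function name cached_series_strings is hypothetical, and because ElementTree supports only a subset of XPath, the trailing text() step is replaced by reading each element's .text.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used to resolve "c:" in the path below.
NS = {"c": "http://schemas.openxmlformats.org/drawingml/2006/chart"}

# Same path as the XPath quoted in the comment, minus the text() step.
CACHED_STRINGS = ".//c:ser/c:tx/c:strRef/c:strCache/c:pt/c:v"


def cached_series_strings(chart_xml):
    """Return the text of every c:v element matched by CACHED_STRINGS.

    `chart_xml` is the content of a single chart part (str or bytes).
    Numeric c:v values under c:numCache are not matched by this path.
    """
    root = ET.fromstring(chart_xml)
    return [v.text for v in root.findall(CACHED_STRINGS, NS) if v.text]
```

    Restricting the path to c:strCache is one way to avoid the numeric c:v values, which is the reason the legacy support skipped c:v entirely.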

  3. Chase Tingley reporter

    Fix issue #503 - support chart translation for Excel and Powerpoint

    Also refactor charts to use the regular styled text processor and
    remove the legacy code.
    

    → <<cset e550c2caddb1>>

  4. Chase Tingley reporter

    Final note: it turns out the <c:v> content can be safely skipped after all; the axis labels are drawn from the live data.

  5. 金楓

    The text "Chart Explanation not parsed" should be extracted. The string values on the x axis should probably be extracted as well.

    Hi, Chase

    I just tried using "chart_content.pptx".

    "Chart Explanation" is being parsed correctly, but the x axis is not being extracted with okapi-lib_all-platforms_0.35.
