No text extraction for PowerPoint slides when 'squishable' is not set.

Issue #319 resolved
Former user created an issue

Original issue 319 created by aurelien.tomass... on 2013-03-22T10:53:16.000Z:

1. Create a new PPTX file.
2. Use the OpenXMLFilter to display all the text units.

With okapi-lib 0.19, the only text units displayed are those contained in the slide master, not those in the regular slides.

While investigating, I saw that parts of type "slide+xml" are read as Document_Part and not as subdocuments. In the source net/sf/okapi/filters/openxml/OpenXMLFilter.java, around line 649 I think, parts of type "notesSlide+xml" and "slideMaster+xml" are translated but not "slide+xml".
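
To check which content type each part of a PPTX declares, one option is to list the Override entries of [Content_Types].xml. The following is only a minimal sketch using the JDK, not Okapi code; the class name ListPartTypes and the method readOverrides are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Okapi.
class ListPartTypes {

    /** Maps each part name declared by an Override element to its content type. */
    static Map<String, String> readOverrides(InputStream contentTypesXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(contentTypesXml);
        // "*" matches any namespace, so the package namespace need not be spelled out.
        NodeList overrides = doc.getElementsByTagNameNS("*", "Override");
        Map<String, String> types = new LinkedHashMap<>();
        for (int i = 0; i < overrides.getLength(); i++) {
            Element e = (Element) overrides.item(i);
            types.put(e.getAttribute("PartName"), e.getAttribute("ContentType"));
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the PPTX file (a zip package).
        try (ZipFile zip = new ZipFile(args[0])) {
            ZipEntry entry = zip.getEntry("[Content_Types].xml");
            if (entry == null) {
                throw new IOException("No [Content_Types].xml in " + args[0]);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                readOverrides(in).forEach(
                        (part, type) -> System.out.println(part + " -> " + type));
            }
        }
    }
}
```

Parts under /ppt/slides would be expected to show a content type ending in "slide+xml", and the slide master one ending in "slideMaster+xml".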

Comments (7)

  1. Former user Account Deleted

    Comment 1. originally posted by @ysavourel on 2013-03-22T12:00:53.000Z:

    I can't reproduce the problem.
    I've tried with a PPTX file with 'normal slides' and they get extracted.
    The "slide+xml" type for those files seems to be handled in line 640.
    If you could post an example file where the problem occurs it could help.
    Thanks,
    -yves

  2. Former user Account Deleted

    Comment 2. originally posted by aurelien.tomass... on 2013-03-22T13:05:02.000Z:

    I tried with this PPTX found on the Internet.
    The OpenXMLFilter opens the zip file, then correctly reads [Content_Types].xml and the file /ppt/slideMasters/slideMaster1.xml, but all the files in /ppt/slides are treated as "Document part" and are not read.

    PS: I tried with okapi-lib v0.19.
    http://code.google.com/p/okapi/source/browse/okapi/filters/openxml/src/main/java/net/sf/okapi/filters/openxml/OpenXMLFilter.java?name=m19
    In this version of the file, I can't see the handler for the "slide+xml" type.

  3. Former user Account Deleted

    Comment 3. originally posted by @ysavourel on 2013-03-22T13:38:26.000Z:

    > PS: I tried with okapi-lib v0.19.
    > In this version of the file, I can't see the handler for the "slide+xml" type.

    Line 606 in that file.

    Thanks for the example file. I'll try it.

  4. Former user Account Deleted

    Comment 4. originally posted by @ysavourel on 2013-03-22T13:45:45.000Z:

    Maybe we fixed something since M19, but M21-snapshot seems to be extracting that file properly (see pseudo-translated output).
    I haven't tried with M20 (which is the current release)

  5. Former user Account Deleted

    Comment 5. originally posted by aurelien.tomass... on 2013-03-22T13:48:16.000Z:

    Thanks!
    In fact, if I set the boolean "bSquishable" to true, line 606 is reached, but if I set it to false, line 606 is never reached. So I don't know whether this is considered a bug for this version...
    Thanks for the help
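
    The behaviour described above can be modelled with a small sketch. This is only an illustration under assumptions, not the Okapi source: the class SlideGate, the enum PartKind and both classify methods are hypothetical names, and the second method shows just one possible way to decouple the slide check from bSquishable.

    ```java
    // Hypothetical model of the behaviour discussed in this thread; NOT the Okapi source.
    class SlideGate {
        enum PartKind { SUBDOCUMENT, DOCUMENT_PART }

        static boolean isSlide(String type) {
            return type.endsWith(".slide+xml");
        }

        static boolean isMasterOrNotes(String type) {
            return type.endsWith(".slideMaster+xml") || type.endsWith(".notesSlide+xml");
        }

        // As reported: the "slide+xml" branch is only reachable when bSquishable is true.
        static PartKind classifyAsReported(String type, boolean bSquishable) {
            if (isMasterOrNotes(type)) {
                return PartKind.SUBDOCUMENT;
            }
            if (bSquishable && isSlide(type)) {
                return PartKind.SUBDOCUMENT;
            }
            return PartKind.DOCUMENT_PART;
        }

        // One possible decoupling: bSquishable is deliberately ignored when classifying parts.
        static PartKind classifyDecoupled(String type, boolean bSquishable) {
            if (isMasterOrNotes(type) || isSlide(type)) {
                return PartKind.SUBDOCUMENT;
            }
            return PartKind.DOCUMENT_PART;
        }

        public static void main(String[] args) {
            String slide =
                    "application/vnd.openxmlformats-officedocument.presentationml.slide+xml";
            System.out.println(classifyAsReported(slide, false));
            System.out.println(classifyDecoupled(slide, false));
        }
    }
    ```

    With bSquishable set to false, the first method treats a regular slide as a document part (so its text is skipped), while the second still treats it as a subdocument.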

  6. Former user Account Deleted

    Comment 6. originally posted by @ysavourel on 2013-03-22T13:58:27.000Z:

    Mmm... I'm not sure why an option about optimizing the text runs is tested there.
    That variable also seems to be set to true everywhere.
    It looks like there is something fishy about this part of the code.
    I'll keep the issue open for now.
    Thanks for the input/feedback.
    -ys

  7. Former user Account Deleted

    Comment 7. originally posted by @ysavourel on 2013-07-24T19:24:23.000Z:

    I changed the bSquishable test.
