Excel : tsExcelExcludedColors causes an Exception in some cases

Create issue
Issue #1154 new
Philippe Henri created an issue

Hello

There seems to be a problem with excludedcolors in some cases. See the attached file.

If I define an excluded color :

conditionalParameters.tsExcelExcludedColors.add("FFFF0000");

It works fine with some files (cells with red background are properly excluded).

But with the attached file, I have a crash !!!

Cell A1 causes no problem but A3 causes an exception

I believe the problem is in Cell.java, private boolean excludedFor(final DifferentialFormat format)

for cell A3, I see that format.fill().pattern().foregroundColor().argb() is null ??

therefore I think the exception happens on ...argb().equals(c)

@Denis Konovalenko , you fixed some recent problems i discovered, can you maybe help here also please?

ps: btw, is there an tsExcelincludedcolor as another option? that would be a nice feature?

thanks !
Philippe

Comments (7)

  1. Chase Tingley
    java.lang.NullPointerException
        at net.sf.okapi.filters.openxml.Cell$Default.lambda$excludedFor$3(Cell.java:244)
        at java.base/java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90)
        at java.base/java.util.TreeMap$KeySpliterator.tryAdvance(TreeMap.java:2763)
        at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:127)
        at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:502)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:488)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
        at java.base/java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:528)
        at net.sf.okapi.filters.openxml.Cell$Default.excludedFor(Cell.java:244)
        at net.sf.okapi.filters.openxml.Cell$Default.readWith(Cell.java:165)
        at net.sf.okapi.filters.openxml.Worksheet$Default.readWith(Worksheet.java:103)
        at net.sf.okapi.filters.openxml.ExcelDocument.nextPart(ExcelDocument.java:435)
        at net.sf.okapi.filters.openxml.Document$General.nextPart(Document.java:249)
        at net.sf.okapi.filters.openxml.OpenXMLFilter.nextInDocument(OpenXMLFilter.java:444)
        at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:254)
        at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:263)
        at net.sf.okapi.steps.common.RawDocumentToFilterEventsStep.handleEvent(RawDocumentToFilterEventsStep.java:166)
    

  2. Chase Tingley

    Bitbucket is acting very flaky, so here’s the FPRM that reproduces this. The bug can be reproduced on the current dev branch (1.44.0-SNAPSHOT):

    #v1
    maxAttributeSize.i=4194304
    bPreferenceTranslateDocProperties.b=true
    bPreferenceTranslateComments.b=true
    bPreferenceTranslatePowerpointNotes.b=true
    bPreferenceTranslatePowerpointMasters.b=true
    bPreferenceIgnorePlaceholdersInPowerpointMasters.b=false
    bPreferenceTranslateWordHeadersFooters.b=true
    bPreferenceTranslateWordHidden.b=false
    bPreferenceTranslateWordExcludeGraphicMetaData.b=false
    bPreferenceTranslatePowerpointHidden.b=false
    bPreferenceTranslateExcelHidden.b=false
    bPreferenceTranslateExcelExcludeColors.b=true
    bPreferenceTranslateExcelSheetNames.b=false
    bPreferenceAddLineSeparatorAsCharacter.b=false
    sPreferenceLineSeparatorReplacement=$0a$
    bPreferenceReplaceNoBreakHyphenTag.b=false
    bPreferenceIgnoreSoftHyphenTag.b=false
    bPreferenceAddTabAsCharacter.b=false
    bPreferenceAggressiveCleanup.b=false
    bPreferenceAutomaticallyAcceptRevisions.b=true
    bPreferencePowerpointIncludedSlideNumbersOnly.b=false
    bPreferenceTranslateExcelDiagramData.b=false
    bPreferenceTranslateExcelDrawings.b=false
    subfilter=
    bInExcludeMode.b=true
    bInExcludeHighlightMode.b=true
    bPreferenceTranslateWordExcludeColors.b=false
    bReorderPowerpointNotesAndComments.b=false
    bPreferenceAllowEmptyTargets.b=false
    bExtractExternalHyperlinks.b=false
    tsComplexFieldDefinitionsToExtract.i=1
    cfd0=HYPERLINK
    tsExcelExcludedColors.i=1
    ccc0=FFFF0000
    tsExcludeWordStyles.i=0
    tsWordHighlightColors.i=0
    tsWordExcludedColors.i=0
    tsPowerpointIncludedSlideNumbers.i=0
    

  3. Philippe Henri reporter

    Hi

    the “fix” below solves the problem but since I am far from being an okapi expert, it may not be the right solution. I still cannot understand why it is ok for cell A1 (which has also a background color, and not for cell A3, for which i even tried to change the BG color, and the problem remain (without the below “fix”)…

        private boolean excludedFor(final DifferentialFormat format) {
    
            if (  format.fill().pattern().foregroundColor().argb() == null ) {
              //  fix 2022-07
               return false;
    
            }
    
            return this.conditionalParameters.tsExcelExcludedColors.stream()
                .anyMatch(c -> format.fill().pattern().foregroundColor().argb().equals(c));
    
        }
    

  4. Denis Konovalyenko

    @Philippe Henri thank you for the finding!

    @Chase Tingley it looks like this issue is not going to have a simple solution. Here is why:

    The referenced fill style and its pattern fill description with foreground colour contains unsupported theme attribute (<attribute name="theme" type="xsd:unsignedInt" use="optional"/>):

          <fill>
            <patternFill patternType="solid">
              <fgColor theme="1"/>
              <bgColor theme="1"/>
            </patternFill>
          </fill>
    

    At the moment, the only supported attribute is rgb (<attribute name="rgb" type="ST_UnsignedIntHex" use="optional"/>):

          <fill>
            <patternFill patternType="solid">
              <fgColor rgb="FF000000"/>
              <bgColor rgb="FF000000"/>
            </patternFill>
          </fill>      
    

    Other available options which can be supported are:

    <attribute name="auto" type="xsd:boolean" use="optional"/>
    <attribute name="indexed" type="xsd:unsignedInt" use="optional"/>
    <attribute name="tint" type="xsd:double" use="optional" default="0.0"/>
    

    What is more, there can exist other style elements that refer to a theme. For instance

        <font>
          <sz val="10"/>
          <color theme="1"/>
          <name val="Noto sans"/>
        </font>
    

    By the way, in this particular case font > color may coexist with fill > patternFill > fgColor style and there is a need to determine which one is used for rendering and thus has to be considered for exclusion.

    So, I assume the theme support has to be provided first, and then any related issues (including this one) would be possible to address when they appear.

  5. Philippe Henri reporter

    Hi @Denis Konovalenko thanks for looking at this problem. For your information, I did some more tests: I copied the buggy cell (A3) → A4:A8 and I changed only the BG colors for each of them, for the last one I have set it to the excluded color (RED). I tested this with my small “fix” and it works. cells A3->A7 have null and ignored, and surprisingly cell A8 is properly excluded !!

    So maybe the “fix” is enough for the purpose of excluding some cells with a BG color?

    file attached: xlstable22.xlsx

  6. Log in to comment