WorkbookProtection is too strict

Issue #740 resolved
Jeremy Braun
created an issue

openpyxl expects the workbookHashValue and workbookSaltValues to be HexBinary strings. However, excel can produce base64 encoded strings

This documentation from Microsoft: https://msdn.microsoft.com/en-us/library/documentformat.openxml.spreadsheet.workbookprotection(v=office.15).aspx

Calls out for both of these fields: The possible values for this attribute are defined by the W3C XML Schema base64Binary datatype.

Here's another implementation (Java) that is consistent with the MSFT docs above. http://www.javafind.net/gate.jsp?q=/library/162/docx4j-3.1.0-javadoc/org/xlsx4j/sml/CTWorkbookProtection.html

Note that revisionsHashValue and revisionsSaltValue are also supposed to be base64 data

Currently, worksheets protected in this way fail to parse.

Comments (9)

  1. CharlieC

    I've looked at this in greater detail but unfortunately I cannot create a test file myself as Excel for Mac does not implement the SHA-512 that Excel for Windows >= 2013 does. I just need the workbook.xml file from the archive.

  2. CharlieC

    Thanks, be interesting to know how to create that using Python seeing as how poorly documented this is. The default mechanism always returns hex which is why we never picked up on this before. Strong encryption for this, and sheet protection is a bit of a nonsense anyway as they only refer to stuff the GUI. Anyone with a text editor can just remove the elements in the XML and protection is removed.

  3. Jeremy Braun reporter

    Some digging around gave this: https://github.com/plutext/docx4j/blob/master/src/xlsx4j/java/org/docx4j/openpackaging/packages/ProtectWorkbook.java

    Which calls:

    import org.docx4j.org.apache.poi.poifs.crypt.CryptoFunctions;
    byte hash[] = CryptoFunctions.hashPassword(password, hashAlgo, salt, spinCount, false);
    

    I found an implementation here: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.poi/poi/3.12/org/apache/poi/poifs/crypt/CryptoFunctions.java

    And a quick implementation in Python with the example I gave you produces the correct hash (the example password was 'password')

    import hashlib
    from base64 import b64encode, b64decode
    import struct
    
    workbookAlgorithmName="SHA-512"
    workbookHashValue="wDZaZrfM8uKpKghbfws7rY7pmVoOwHjy5qg5d2ABHdSMtH1y0IIkgwJT5Hl2lacSw1sNusImGBUQs/sHcql3hw=="
    workbookSaltValue="ah1OevWahpb3tQiJO3qrnQ=="
    workbookSpinCount="100000"
    
    alg = workbookAlgorithmName.replace('-', '').lower()
    if alg not in hashlib.algorithms:
        raise Exception("Algorithm %s not in %r" % (alg, hashlib.algorithms))
    alg = getattr(hashlib, alg)
    count = int(workbookSpinCount)
    
    digests = []
    
    salt = workbookSaltValue
    salt = b64decode(salt)
    
    password = 'password'
    
    m = alg()
    m.update(salt) # data is binary, length = 16
    m.update(password.encode('utf-16-le')) # window's "unicode" encoding
    h = m.digest()
    
    for i in xrange(count):
        m = alg()
        m.update(h)
        m.update(struct.pack("<I", i)) # little endian 4-byte unsigned integer iterator
        h = m.digest()
    
    digest = b64encode(h)
    print "CALCULATED:", digest, "OKAY" if digest == workbookHashValue else "----"
    print "EXPECTED:  ", workbookHashValue
    
    python test.py
    CALCULATED: wDZaZrfM8uKpKghbfws7rY7pmVoOwHjy5qg5d2ABHdSMtH1y0IIkgwJT5Hl2lacSw1sNusImGBUQs/sHcql3hw== OKAY
    EXPECTED:   wDZaZrfM8uKpKghbfws7rY7pmVoOwHjy5qg5d2ABHdSMtH1y0IIkgwJT5Hl2lacSw1sNusImGBUQs/sHcql3hw==
    
  4. Jeremy Braun reporter

    What's not clear is whether the "utf-16-le" and little-endian iterator are ALWAYS the right choices, or if they depend on things like the encoding of the workbook file, or some other attribute somewhere. I'd hazard that ECMA-376 or some other Office XML spec actually specifies what those are supposed to be in all cases.

  5. CharlieC

    Thanks very much for the digging. I don't think we'll ever implement the optional encryption strategies and stick with just preserving what's there but we might add this to the docs. More importantly, I guess is a section on WorkProtection pointing out that this has nothing to do with security.

    Can you confirm that you can now open files with a checkout of the 2.4 branch?

  6. Log in to comment