WorkbookProtection is too strict
openpyxl expects the workbookHashValue and workbookSaltValues to be HexBinary strings. However, excel can produce base64 encoded strings
This documentation from Microsoft: https://msdn.microsoft.com/en-us/library/documentformat.openxml.spreadsheet.workbookprotection(v=office.15).aspx
Calls out for both of these fields: The possible values for this attribute are defined by the W3C XML Schema base64Binary datatype.
Here's another implementation (Java) that is consistent with the MSFT docs above. http://www.javafind.net/gate.jsp?q=/library/162/docx4j-3.1.0-javadoc/org/xlsx4j/sml/CTWorkbookProtection.html
Note that revisionsHashValue and revisionsSaltValue are also supposed to be base64 data
Currently, worksheets protected in this way fail to parse.
Comments (9)
-
-
I've looked at this in greater detail but unfortunately I cannot create a test file myself as Excel for Mac does not implement the SHA-512 that Excel for Windows >= 2013 does. I just need the workbook.xml file from the archive.
-
- attached workbook.xml
Here's an example from Excel 2016, I think. The hash algorithm is SHA-512.
-
Thanks, be interesting to know how to create that using Python seeing as how poorly documented this is. The default mechanism always returns hex which is why we never picked up on this before. Strong encryption for this, and sheet protection is a bit of a nonsense anyway as they only refer to stuff the GUI. Anyone with a text editor can just remove the elements in the XML and protection is removed.
-
- changed status to resolved
Hashes should be stored as Base64 Resolves
#740→ <<cset b5fb27d03180>>
-
Some digging around gave this: https://github.com/plutext/docx4j/blob/master/src/xlsx4j/java/org/docx4j/openpackaging/packages/ProtectWorkbook.java
Which calls:
import org.docx4j.org.apache.poi.poifs.crypt.CryptoFunctions; byte hash[] = CryptoFunctions.hashPassword(password, hashAlgo, salt, spinCount, false);
I found an implementation here: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.poi/poi/3.12/org/apache/poi/poifs/crypt/CryptoFunctions.java
And a quick implementation in Python with the example I gave you produces the correct hash (the example password was 'password')
import hashlib from base64 import b64encode, b64decode import struct workbookAlgorithmName="SHA-512" workbookHashValue="wDZaZrfM8uKpKghbfws7rY7pmVoOwHjy5qg5d2ABHdSMtH1y0IIkgwJT5Hl2lacSw1sNusImGBUQs/sHcql3hw==" workbookSaltValue="ah1OevWahpb3tQiJO3qrnQ==" workbookSpinCount="100000" alg = workbookAlgorithmName.replace('-', '').lower() if alg not in hashlib.algorithms: raise Exception("Algorithm %s not in %r" % (alg, hashlib.algorithms)) alg = getattr(hashlib, alg) count = int(workbookSpinCount) digests = [] salt = workbookSaltValue salt = b64decode(salt) password = 'password' m = alg() m.update(salt) # data is binary, length = 16 m.update(password.encode('utf-16-le')) # window's "unicode" encoding h = m.digest() for i in xrange(count): m = alg() m.update(h) m.update(struct.pack("<I", i)) # little endian 4-byte unsigned integer iterator h = m.digest() digest = b64encode(h) print "CALCULATED:", digest, "OKAY" if digest == workbookHashValue else "----" print "EXPECTED: ", workbookHashValue
python test.py CALCULATED: wDZaZrfM8uKpKghbfws7rY7pmVoOwHjy5qg5d2ABHdSMtH1y0IIkgwJT5Hl2lacSw1sNusImGBUQs/sHcql3hw== OKAY EXPECTED: wDZaZrfM8uKpKghbfws7rY7pmVoOwHjy5qg5d2ABHdSMtH1y0IIkgwJT5Hl2lacSw1sNusImGBUQs/sHcql3hw==
-
What's not clear is whether the "utf-16-le" and little-endian iterator are ALWAYS the right choices, or if they depend on things like the encoding of the workbook file, or some other attribute somewhere. I'd hazard that ECMA-376 or some other Office XML spec actually specifies what those are supposed to be in all cases.
-
Thanks very much for the digging. I don't think we'll ever implement the optional encryption strategies and stick with just preserving what's there but we might add this to the docs. More importantly, I guess is a section on WorkProtection pointing out that this has nothing to do with security.
Can you confirm that you can now open files with a checkout of the 2.4 branch?
-
Confirmed.
- Log in to comment
Thanks for the report. Do you have a test file?