During a recent audit of our application it was discovered that a feature in which users can upload Excel files was vulnerable to XML External Entity processing. This allows the user to DoS the application and in certain cases retrieve any file from the server.
According to the Python documentation the standard library
xml.etree does not resolve external entities. However, openpyxl uses lxml by default if it's installed, which does resolve external entities (by default) and thus causes this vulnerability.
I implemented a temporary fix by setting the
OPENPYXL_LXML environment variable to false. I feel however that this should be fixed on openpyxl's side instead of having to remind developers to always disable openpyxl's lxml feature when handling user input.
A simple fix on the lxml side would be to define a parser that disables external entities and use that parser wherever necessary.
from lxml import etree # Vulnerable etree.fromstring(user_controlled_data) # Not vulnerable parser = etree.XMLParser(resolve_entities=False) etree.fromstring(user_controlled_data, parser=parser)
The standard library's
xml.etree.ElementTree.fromstring does not accept a parser keyword argument so care must be taken to preserve interchangeability.
I have attached an Excel file as a proof-of-concept that reads
/dev/random, causing the parser to hang. The PoC file can be used as so:
from openpyxl import load_workbook load_workbook(filename='test.xlsx')
Credit for the discovery goes to F-Secure.