XXE vulnerability when loading workbook

Issue #749 resolved
AntiPaste created an issue

During a recent audit of our application it was discovered that a feature in which users can upload Excel files was vulnerable to XML External Entity processing. This allows the user to DoS the application and in certain cases retrieve any file from the server.

According to the Python documentation the standard library xml.etree does not resolve external entities. However, openpyxl uses lxml by default if it's installed, which does resolve external entities (by default) and thus causes this vulnerability.

I implemented a temporary fix by setting the OPENPYXL_LXML environment variable to false. I feel however that this should be fixed on openpyxl's side instead of having to remind developers to always disable openpyxl's lxml feature when handling user input.

A simple fix on the lxml side would be to define a parser that disables external entities and use that parser wherever necessary.

Example:

from lxml import etree

# Vulnerable
etree.fromstring(user_controlled_data)

# Not vulnerable
parser = etree.XMLParser(resolve_entities=False)
etree.fromstring(user_controlled_data, parser=parser)

The standard library's xml.etree.ElementTree.fromstring does not accept a parser keyword argument so care must be taken to preserve interchangeability.

I have attached an Excel file as a proof-of-concept that reads /dev/random, causing the parser to hang. The PoC file can be used as so:

from openpyxl import load_workbook
load_workbook(filename='test.xlsx')

Credit for the discovery goes to F-Secure.

Comments (5)

  1. CharlieC

    Thanks for the report, will look at changing the code accordingly. I think it shouldn't be too hard. It should be noted, however, that the standard library also says that the XML libraries are not hardened so in really sensitive situations you might want to avoid the library altogether.

  2. Log in to comment