Openpyxl 2.0+ truncation of floats causes loss of precision in datetime

Issue #405 wontfix
Michael Barrientos created an issue
import openpyxl
import datetime
from io import BytesIO

wb = openpyxl.Workbook()
ws = wb.get_active_sheet()
ws['A1'] = datetime.datetime(2012, 8, 31, 6, 35)
with BytesIO() as f:
    wb.save(f)
    f.seek(0)
    wb_read = openpyxl.load_workbook(f)
print wb_read.get_active_sheet()['A1'].value

Under openpyxl 1.8.6, the output is:
2012-08-31 06:35:00

Under openpyxl 2.1.4, the output is:
2012-08-31 06:34:59.999999

After tracing through the code, this stems directly from the loss of precision in floats when saving in openpyxl >= 2.0+. Specifically: https://bitbucket.org/openpyxl/openpyxl/src/7a26e9a469426191a8475d53203bed57bdcaff40/openpyxl/compat/strings.py?at=default#cl-50

def safe_string(value):
    """Safely and consistently format numeric values"""
    if isinstance(value, NUMERIC_TYPES):
        value = "%.16g" % value
    ...

The %.16g to truncate a float output to 16 characters changes this specific value from:

<c r="A1" t="n"><v>41152.274305555555</v></c>

in 1.8.6 to the following in 2.1.4:

<c r="A1" t="n"><v>41152.27430555555</v></c>

(Notice the loss of a single repeating 5.)

1.8.6 just used the straight repr of the number, while 2.1.4 forces it through safe_string, which truncates from 7 repeating 5s to 6 such digits. Note that this is definitely during the save; the following code in 2.1.4 has seven "5s" in the output:

import openpyxl.date_time
import datetime
print openpyxl.date_time.to_excel(datetime.datetime(2012, 8, 31, 6, 35))
# 41152.274305555555

I'd like to put in a pull request to make %.17g the default, but don't know the reasoning behind this specific truncation (I see it even used to be %.15g before the most recent commit to strings.py), so I'll leave it to a core developer to look at.

Comments (7)

  1. Michael Barrientos reporter

    Also note that saving in 2.1.4 and reading in 1.8.6 results in "2012-08-31 06:34:59.999999" being output, while saving in 1.8.6 and reading in 2.1.4 results in the expected "2012-08-31 06:35:00". Definitely a writer problem.

  2. CharlieC

    The reason for safe_string is for consistency across different Python versions with the repr for floats changing between 2 & 3. As the specification does not mandate any particular degree of precision, accuracy cannot be guaranteed (Excel itself ignores anything less than a second in dates) so your 1.8.6 example will work sometimes and not others and will definitely vary between Python versions.

    Happy to add more degrees of precision though I'd actually like to switch to using ISO formatting, which is actually allowed in the specification but not particularly interoperable. :-(

  3. CharlieC

    If you make a pull request please base it on the 2.2 branch.

    Neither OpenOffice nor LibreOffice support the "d" cell type, unfortunately as it would solve problems like this. I've submitted a bug but I'm not very hopeful.

  4. CharlieC

    No pull request forthcoming and the proposed change causes problems elsewhere. The real issue is associated with Excel's default handling of datetimes.

  5. Log in to comment