is_date_format still gives wrong false positives

Issue #904 resolved
Carli
created an issue

As described in #124 is_date_format gives false positives. My custom format is: "Y: "0.00"m";"Y: "-0.00"m";"Y: <num>m";@ where m stands for meters (SI-Unit).

As the given number 5334664.86 is to large, it will fail to load the sheet.

So better to remove constants first:

DATE_INDICATORS = 'dmyhs'
COLORS = "(\[BLACK|BLUE|CYAN|GREEN|MAGENTA|RED|WHITE|YELLOW\])"
BAD_DATE_RE = re.compile(r'(\[{0}\])|((?<=)#).*[dmhys]+.*#?'.format(COLORS), re.IGNORECASE + re.UNICODE)
IGNORE_CONSTANTS = re.compile(r'"[^"]*"',re.UNICODE)

def is_date_format(fmt):
    if fmt is None:
        return False
    fmt = fmt.lower()
    fmt = IGNORE_CONSTANTS.sub("",fmt)
    if any([x in fmt for x in DATE_INDICATORS]):
        return not BAD_DATE_RE.search(fmt)
    return False

(Sorry that I don't do a pull request - I am behind a restrictive firewall and can't user hg or git)

Comments (4)

  1. CharlieC

    re. out of bounds datetimes there was another bug related to this: they're out of spec so an exception is the way to handle these: whatever produced the files needs to format them correctly.

    The regex looks a bit greedy.

  2. Carli reporter

    Well the regex removes all constants, can't think of a less greedy version :-/. If it for some reason, removes too much, I am with the creator of #124, that not formatting a date as date is better as formatting a number wrongly as date. As this is hard to revert for the user of the lib and can prevent users (like me) from loading their files at all.

  3. CharlieC

    I don't see a problem with 2.5 here.

    wb = load_workbook("Issues/bug904.xlsx")
    ws = wb.active
    c = ws['A1']
    c.value
    5334664.86
    c.number_format
    '"Y: "0.00"m";"Y: "\\-0.00"m";"Y: <num>m";@'
    

    Ideally applications would use the dedicated type to avoid all this nonsense, as is possible with openpyxl 2.5 because relying on parsing the format to find out is asking for trouble. However, rather than playing with regexes, it would be best implement a parser for the ABNF specification.

  4. Log in to comment