1. Manfred Moitzi
  2. svgwrite
  3. Issues
Issue #7 open

Parsing of paths is very time-consuming

Gareth Rees
created an issue

In the interactive session quoted below, I create a drawing containing one element, a path with 5,001 commands. It takes nearly a minute to save.

Python 3.3.0 (default, Nov 23 2012, 10:26:01) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import svgwrite
>>> d = svgwrite.Drawing(filename='test.svg')
>>> p = d.path()
>>> d.add(p)
<svgwrite.path.Path object at 0x124deadd0>
>>> p.push(['M', 0, 0])
>>> for i in range(5000):
...     p.push(['L', i, 0])
... 
>>> from timeit import timeit
>>> timeit(d.save, number=1)
59.46839178504888

With debug=False, saving is essentially instant (0.04 seconds). So clearly something is going wrong in the validation of the path. I'm guessing that the path parser has the wrong complexity: perhaps all those ZeroOrMore productions end up getting backtracked?

Comments (6)

  1. Lawrence Tattrie

    Using python 2.7 and an Intel i7 3GHz, the debug=true save, shown below, takes only 2.1 seconds so performance of the debug is great for me. A different plot, not shown, with 10,000 points and debug=True took 14 seconds to save which is more time but good to me.

    import svgwrite
    import time
    progname = 'test_save_perf'
    def create_svg(name):
        #dwg = svgwrite.Drawing(filename=progname+'.svg', debug=False)
        dwg = svgwrite.Drawing(filename=progname+'.svg', debug=True)
        p = dwg.path()
        dwg.add(p)
        p.push(['M', 0, 0])
        for i in range(5000):
            p.push(['L', i, 0])
        starttime=time.time()
        dwg.save()
        endtime=time.time()
        elapsedtime=round(endtime - starttime,1)
        print ("Script "+progname+" ended at "+time.asctime()+". Elapsed time is "+str(elapsedtime)+" seconds.")
    if __name__ == '__main__':
        create_svg(progname + '.svg')
    

    ```

  2. Gareth Rees reporter
    • changed status to open

    It may be a "performance problem", but that doesn't mean it's not a real problem. I discovered this problem when I tried to save a drawing containing a very long path element (tens of thousands of lineto commands). With such a long path element, saving the drawing did not complete in a reasonable length of time (I gave it several minutes) and the memory usage of the Python process kept growing, apparently without bounds. It was also unclear to me what I could do about it: in particular, it wasn't at all obvious from the documentation that this was caused by the default debug=True setting. I had to create and study a PDB backtrace to figure out what was going on.

    There are three easy changes you could make that would improve things for users in my original position:

    1. Change the default from debug=True to debug=False. Then at least users would not find themselves in a position where they cannot save their drawings.

    2. Put a note in the documentation for Drawing.save(), explaining that if debug=True (the default), then the drawing gets validated before being saved, and that this might be time-consuming. Then a user in my situation could read the documentation and figure out how to get their drawing to save in a reasonable amount of time.

    3. Avoid backtracking in the parsing, as described below.

    In svgwrite/data/svgparser.py, you have the lines:

    drawto_commands = drawto_command + ZeroOrMore(drawto_command)
    moveto_drawto_command_group = moveto + ZeroOrMore(drawto_commands)
    

    This grammar is ambiguous: a list of drawto_command nonterminals can be parsed in exponentially many different ways, depending on how the drawto_command nonterminals are grouped into drawto_commands nonterminals.

    If you made the grammar non-ambiguous, by removing the drawto_commands nonterminal, and defining the moveto_drawto_command_group nonterminal like this:

    moveto_drawto_command_group = moveto + ZeroOrMore(drawto_command)
    

    then validation would still be slow, but at least the time taken to parse a path would be proportional to the length of the path. After making this change, validation of paths with tens of thousands of commands can be completed on my machine in a few seconds, which is acceptable.

    I would still recommend putting a note in the documentation, though: even a few seconds to save a drawing is still surprisingly long, and users might reasonably wonder why it is happening, and want to know how to speed it up.

  3. Log in to comment