Memory leak on creating savepoints

Issue #3931 resolved
Bartłomiej Biernacki created an issue

I found a small memory leak when using many savepoints to the same DB.

Consider this code example:

session = create_session()

for _ in range(0, 5000):
    session.begin_nested()
    session.execute(
        sa.insert(Test).values(name='test'))
    session.commit()

session.commit()
session.close()

After this we can observe 5000 strings with savepoints names in memory.

Diging deeply into it I found that function sql/compiler.py:format_savepoint saves savepoint name to self._strings. By default savepoint names are unique for each nested transaction so saving them for future use is redundant and each new savepoint name stays in memory.

My proposition is to change the function, so savepoint name won't be saved:

    def format_savepoint(self, savepoint, name=None):
        value = name or savepoint.ident
        if self._requires_quotes(value):
            return self.quote_identifier(value)
        return value

Complete code example for reproducing bug in attachment.

Comments (5)

  1. Mike Bayer repo owner

    ah. savepoint_seq is local to a Connection. Was wondering why the world was not crashing down with this one. OK, this makes this a much less impactful issue, will hit it for 1.1 for now.

  2. Mike Bayer repo owner

    Don't cache savepoint identifiers

    Fixed bug in compiler where the string identifier of a savepoint would be cached in the identifier quoting dictionary; as these identifiers are arbitrary, a small memory leak could occur if a single :class:.Connection had an unbounded number of savepoints used, as well as if the savepoint clause constructs were used directly with an unbounded umber of savepoint names. The memory leak does not impact the vast majority of cases as normally the :class:.Connection, which renders savepoint names with a simple counter starting at "1", is used on a per-transaction or per-fixed-number-of-transactions basis before being discarded.

    The savepoint name in virtually all cases does not require quoting at all, however to support potential third party use cases the "check for quotes needed" logic is retained, at a small performance cost. Uncondtionally quoting the name is another option, but this would turn the name into a case sensitive name which runs the risk of poor interactions with existing deployments that may be looking at these names in other contexts.

    Change-Id: I6b53c96abf7fdf1840592bbca5da81347911844c Fixes: #3931

    → <<cset f4c4f784cde8>>

  3. Log in to comment