Write a C extension to enhance performance

Issue #1323 resolved
Former user created an issue

Hello,

I have a ~100000-row table against which I am doing large queries. I have noticed that object creation is very slow (I use the declarative layer), so for large queries I have gotten into the habit of querying specific columns rather than whole classes.

But even then, a code like:

q = db_session.query(Logement.id_groupe, Logement.nom_voie, Logement.cp)
q = q.enable_eagerloads(False)
r = q.all()

ends up 8x as slow (more than 8s. as opposed to a mere 1s.) as the following equivalent code:

r = db_session.execute(select([Logement.id_groupe, Logement.nom_voie, Logement.cp]))

Yet they both generate the same SQL:

# ORM query
SELECT logement.id_groupe AS logement_id_groupe, logement.nom_voie AS logement_nom_voie, logement.cp AS logement_cp 
FROM logement
# ORM-less query
SELECT logement.id_groupe, logement.nom_voie, logement.cp 
FROM logement

I am really astonished that there is so much overhead. The query above is not difficult to convert to an ORM-less query, but more complicated ones will be tedious to convert...

(when executed from the MySQL prompt, the query takes 0.47s.)

Comments (33)

  1. Former user Account Deleted

    Hmm, actually my measurements seem wrong, in that the ORM-less query returns an object which yields results lazily. It means that in both cases it takes 8s. to fetch all results, while at the MySQL prompt the displayed query execution time is 0.47s. The MySQL server runs locally and listens on a Unix socket.
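
    To compare like with like, I would need to fully fetch the Core result as well. A minimal sketch of that measurement, reusing the select() from the description (db_session and Logement as defined there):

    from sqlalchemy import select

    r = db_session.execute(select([Logement.id_groupe, Logement.nom_voie, Logement.cp]))
    rows = r.fetchall()   # materializes all ~100000 rows, like the ORM query's .all()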

  2. Former user Account Deleted

    If I use MySQLdb directly, results return in roughly 1s.:

    time python -c "import MySQLdb; c = MySQLdb.connect(host='localhost', user='XXX', passwd='XXX', db='XXX').cursor(); c.execute('select nom_voie, cp, id_groupe from logement'); c.fetchall()"
    0.73user 0.12system 0:01.41elapsed 61%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+10185minor)pagefaults 0swaps
    
  3. Mike Bayer repo owner

    Since there are no test cases or profiling data attached to this ticket, and since it appears the problem has resolved itself as you are observing similar speeds between ORM and non-ORM fetches, I don't see any issue here. 8 seconds sounds high but then again there may be overhead in fetching columns (your MySQLdb test doesn't do that) as well as any character encoding or other conversions being applied to the rows based on your Table definitions.

    The expression language is quite fast; please see http://techspot.zzzeek.org/?p=17.

    As always if you can provide specifics, feel free to reopen.

  4. Former user Account Deleted

    What do you mean exactly by "overhead in fetching columns"? The MySQLdb timing does fetchall(), which returns all the ~100000 results in a list.

    My initial diagnosis (that the overhead is in the ORM layer) was false, but it remains that taking 8s for a task that takes 1s with the MySQLdb layer (for the same results: a list of tuples of strings and ints) is quite worrying to me. I was expecting to be able to use SQLAlchemy for moderately large queries. 100000 does not sound like an insane number of results as long as only tuples are created. The SQL expression layer gives me an abstraction over the specifics of each database engine, and I certainly don't want to lose that ;)

    By the way, here is the relevant snippet of class definition:

    class Logement(DeclarativeBase):
        id_groupe = Column(Integer, ForeignKey('groupe.id_groupe'), nullable=False)
        nom_voie = Column(Unicode(255), nullable=False)
        cp = Column(Unicode(5), nullable=False)
    
  5. Former user Account Deleted

    (by the way, I can't find any way of having ticket updates e-mailed to me. I assume this is because of the "guest/guest" convention, and it doesn't seem I can register anywhere)

  6. Mike Bayer repo owner

    Replying to guest:

    What do you mean exactly by "overhead in fetching columns"? The MySQLdb timing does fetchall(), which returns all the ~100000 results in a list.

    that does not necessarily fetch all columns from each row. SQLAlchemy may be applying translations to each column, which depends on your Table setup.

    My initial diagnosis (that the overhead is in the ORM layer) was false, but it remains that taking 8s for a task that takes 1s with the MySQLdb layer (for the same results: a list of tuples of strings and ints) is quite worrying to me. I was expecting to be able to use SQLAlchemy for moderately large queries. 100000 does not sound like an insane number of results as long as only tuples are created. The SQL expression layer gives me an abstraction over the specifics of each database engine, and I certainly don't want to lose that ;)

    Try fetching each column from your MySQLdb result set and converting each string to a Python unicode object, to get a better sense of the work SQLAlchemy is doing.

    class Logement(DeclarativeBase):
        id_groupe = Column(Integer, ForeignKey('groupe.id_groupe'), nullable=False)
        nom_voie = Column(Unicode(255), nullable=False)
        cp = Column(Unicode(5), nullable=False)

    Each Unicode column there will result in a fetch of the column's data, which is then decoded via value.decode('utf-8'). Your MySQLdb test does not do this.
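
    A rough sketch of that fairer comparison (connection parameters are placeholders, as in your timing above): touch every column of every row and decode each byte string, which is roughly the per-value work the Unicode type's processor does:

    import MySQLdb

    conn = MySQLdb.connect(host='localhost', user='XXX', passwd='XXX', db='XXX')
    cur = conn.cursor()
    cur.execute('select nom_voie, cp, id_groupe from logement')

    decoded = []
    for row in cur.fetchall():
        # decode each string column to unicode; pass other values through unchanged
        decoded.append(tuple(
            col.decode('utf-8') if isinstance(col, str) else col
            for col in row
        ))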

    (by the way, I can't find any way of having ticket updates e-mailed to me. I assume this is because of the "guest/guest" convention, and it doesn't seem I can register anywhere)

    Your email address can be placed as a "CC" on this ticket, just paste it here and I can add it to that field (you were able to add it when you first created the ticket, but after that I don't think you get the privs for that).

  7. Former user Account Deleted
    • removed status
    • changed status to open

    I finally found the time to produce a profile of the query. Here it is:

    Fri Mar  6 19:39:36 2009    select.prof
    
             3357583 function calls (3357534 primitive calls) in 14.791 CPU seconds
    
       Ordered by: internal time
       List reduced from 301 to 20 due to restriction <20>
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       324807    2.051    0.000    7.440    0.000 base.py:1607(_get_col)
       108270    1.651    0.000   13.871    0.000 query.py:1286(instances)
       324807    1.636    0.000    9.076    0.000 base.py:1347(__getitem__)
       324807    1.251    0.000   10.327    0.000 query.py:2061(proc)
       433076    1.231    0.000   11.559    0.000 query.py:1343(<genexpr>)
       324810    1.165    0.000    1.612    0.000 util.py:187(__hash__)
       216538    1.132    0.000    3.778    0.000 types.py:468(process)
       216541    0.952    0.000    2.223    0.000 {method 'decode' of 'str' objects}
       216541    0.698    0.000    1.270    0.000 utf_8.py:15(decode)
       216541    0.573    0.000    0.573    0.000 {_codecs.utf_8_decode}
       324810    0.447    0.000    0.447    0.000 {hash}
       216627    0.423    0.000    0.423    0.000 {isinstance}
            1    0.422    0.422    0.661    0.661 base.py:1633(fetchall)
            1    0.409    0.409    0.409    0.409 {built-in method fetch_row}
            1    0.260    0.260    0.260    0.260 {method 'store_result' of '_mysql.connection' objects}
       108269    0.239    0.000    0.239    0.000 base.py:1306(__init__)
            1    0.177    0.177   14.736   14.736 query.py:1180(all)
            1    0.055    0.055   14.791   14.791 <string>:1(<module>)
            1    0.006    0.006    0.006    0.006 {method 'query' of '_mysql.connection' objects}
            1    0.004    0.004    0.005    0.005 connections.py:44(__init__)
    

    As you see, UTF-8 decoding accounts for roughly 1s. out of the 14s. total (I suppose profiling adds its own overhead, which would explain why it's slower than the original 8s.). The strings fetched from the database are rather small (generally between 5 and 30 characters).

  8. Former user Account Deleted

    By the way, could you please add <pitrou at free.fr> as cc to the issue? Thanks in advance.

  9. Former user Account Deleted

    I misread the profile a bit. UTF-8 decoding amounts to 2.2s total.

    A way of reducing the decoding overhead would be to look up the raw decode function in advance (the same encoding is used for all rows):

    python -m timeit -s "s='x'*10" "x = s.decode('utf8')"
    100000 loops, best of 3: 2.83 usec per loop
    python -m timeit -s "s='x'*10;import codecs;f=codecs.getdecoder('utf8')" "x, _ = f(s)"
    1000000 loops, best of 3: 1.69 usec per loop
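
    Applied to a per-type processor, that could look something like this (illustrative names; the actual process() function is quoted in the next comment):

    import codecs

    def make_unicode_processor(encoding):
        decode = codecs.getdecoder(encoding)   # looked up once, not once per value
        def process(value):
            if value is not None and not isinstance(value, unicode):
                return decode(value)[0]
            return value
        return process

    process = make_unicode_processor('utf-8')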
    
  10. Mike Bayer repo owner

    It's just the overhead of Python, versus that of native MySQLdb code. If you examine the code for the areas taking the most time:

    324807    1.636    0.000    9.076    0.000 base.py:1347(__getitem__)
    
    def __getitem__(self, key):
       return self.__parent._get_col(self.__row, key)
    
    324807    2.051    0.000    7.440    0.000 base.py:1607(_get_col)
    
    def _get_col(self, row, key):
       try:
           type_, processor, index = self._props[key]
       except TypeError:
           # exception case that isnt happening here

       if processor:
           return processor(row[index])
       else:
           return row[index]
    
    216538    1.132    0.000    3.778    0.000 types.py:468(process)
    
    def process(value):
       if value is not None and not isinstance(value, unicode):
           return value.decode(dialect.encoding)
       else:
           return value
    
    324807    1.251    0.000   10.327    0.000 query.py:2061(proc)
    
    def proc(context, row):
        return row[column]
    
    
    433076    1.231    0.000   11.559    0.000 query.py:1343(<genexpr>)
        rows = [rowtuple(proc(context, row) for proc in process)
                for row in fetch]
    
    the rowtuple callable is:
    
        rowtuple = type.__new__(type, "RowTuple", (tuple,), labels)
    

    The call stack for fetching individual columns is less than ten functions deep. Within each function, I don't see any unreasonable operations there. Do you?
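
    For a rough sense of scale (an illustrative micro-benchmark, not taken from the profile above), pushing the ~325,000 column fetches from the profile through just three small Python functions already costs a noticeable amount of time before any real work is done:

    import timeit

    def process(value):
        return value.decode('utf-8') if isinstance(value, str) else value

    def get_col(row, index):              # stand-in for ResultProxy._get_col
        return process(row[index])

    def getitem(row, index):              # stand-in for RowProxy.__getitem__
        return get_col(row, index)

    row = ('foo', 'bar', 'baz')
    n = 324807                            # __getitem__ call count from the profile
    print timeit.timeit(lambda: getitem(row, 1), number=n)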

    The answer here would be to rewrite ResultProxy and some of the key TypeEngine classes in C.

  11. Former user Account Deleted

    It seems it's just bytecode interpretation and method dispatch overhead, yes. Perhaps not all of those classes have to be reimplemented in C, only the critical methods? The rest could remain implemented in Python in a derived class. (I'm not saying it's easy, of course.)
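
    As a minimal sketch of that split (purely illustrative: the module name _cresultproxy and the class names here are hypothetical), the hot __getitem__ path lives in a base class that a C extension can replace, while everything else stays in Python:

    class PyBaseRowProxy(object):
        """Pure-Python fallback implementing the hot __getitem__ path."""
        def __init__(self, row, keymap):
            # keymap: column key -> (processor-or-None, index into the raw row)
            self._row = row
            self._keymap = keymap

        def __getitem__(self, key):
            processor, index = self._keymap[key]
            value = self._row[index]
            return processor(value) if processor is not None else value

    try:
        from _cresultproxy import BaseRowProxy   # hypothetical C accelerator
    except ImportError:
        BaseRowProxy = PyBaseRowProxy            # fall back to pure Python

    class RowProxy(BaseRowProxy):
        """Non-critical conveniences stay in Python, whichever base is used."""
        def has_key(self, key):
            try:
                self[key]
                return True
            except KeyError:
                return False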

  12. Former user Account Deleted

    (original author: ged) Hi. You might want to test the latest trunk, or 0.6 when it's out. I've put some time into optimizing all this. It should already be much faster. Unfortunately, MySQL doesn't seem to be among the databases that return native unicode strings, which would bypass the decode step altogether.

  13. Former user Account Deleted

    (original author: ged) This patch contains a work-in-progress implementation of a C extension, containing a C base class for RowProxy, a C base class for ResultProxy, and C processors for the most common types.

  14. Former user Account Deleted

    For what it's worth, I'm certainly unable to work on a C extension myself (I know nothing about sqlalchemy internals), but I can help if you have problems with the Python C API side of things. I'll also be at the next PyCon in Atlanta.

    Antoine Pitrou.

  15. Former user Account Deleted

    (original author: ged) Well, the C code is complete. If you want to help, it would be great if you could double-check the C file I have written for obvious mistakes like reference count errors (this is my first extension, so they are likely), or for better ways to do anything.

  16. Former user Account Deleted

    Could you upload the patch to something like http://codereview.appspot.com/ ? It makes reviewing easier.

  17. Former user Account Deleted

    (original author: ged) The current version (but I am actively working on it): http://codereview.appspot.com/196062/show

    What I would like to know is why pickle does not work. It currently segfaults whenever I try to access an unpickled instance of RowProxy (which inherits from the C type "BaseRowProxy"). The pickling/unpickling itself works fine.
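
    One angle I may try (just a guess at this point, not a confirmed diagnosis): if default pickling recreates the C instance without running its constructor, the C-level fields stay uninitialized and the first attribute access would crash. Defining __reduce__ on the Python subclass would force unpickling back through a real construction step; rowproxy_reconstructor and __getstate__/__setstate__ here are illustrative names, not existing API:

    def rowproxy_reconstructor(cls, state):
        obj = cls.__new__(cls)
        obj.__setstate__(state)      # assumes the C type implements __setstate__
        return obj

    class RowProxy(BaseRowProxy):
        def __reduce__(self):
            return (rowproxy_reconstructor,
                    (self.__class__, self.__getstate__()))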

  18. Former user Account Deleted

    (original author: ged) Note that with the new patch all tests pass both with and without the extension, except two callcount-related ones with the extension; I didn't look into how to fix those, given the huge difference in counts between the two versions.

  19. Former user Account Deleted

    (original author: ged) One last comment before I leave: in the timings, do not compare times from one "type" test to another, as some of them were too slow and I reduced the number of rows for those.

    Happy testing all!

  20. Mike Bayer repo owner

    While we'd like to add more C extensions as time goes on, we now have a good start and it's working fully, so this ticket is complete. For new C extensions, such as ORM elements, let's open new tickets with specific goals.
