[PATCH] Optimize time and datetime types on SQLite
(original reporter: ged) AFAICT, the format within the database should be always the same, hence there is no need for the conditional parts of the regexps. This patch provides a ~4% total speedup for 10 fiels, 1000 records, which is appreciable IMO. The question is: was it already the same format in previous SA versions?
Comments (17)
-
repo owner -
Account Deleted (original author: ged) Replying to zzzeek:
I wouldn't assume that necesssarily, as people may have been trying to read date/time objects that were placed in the sqlite database using str(date) as well.
Well, in that case, I think we should introduce an option to the type, or even a new type for that. I don't want to have to "pay" for legacy databases even in ones which were only inserted into via SQLAlchemy.
wouldn't it be faster to write just one C module for RowProxy instead of squeezing dry lemons throughout the source ? they're not that hard to write, just a little tedious.
Possibly for someone who knows how to do it, not for me (I didn't take that much time on those little optimizations). Besides, I doubt the types will ever be translated to C, so the savings I got there will be kept after a C rewrite of RowProxy. IMO those lemons were not that dry (don't forget the timings are total time and on top of most of the other optimizations)... And FWIW, now my current optimization pass is over: I've "audited for speed " all the result_processors of SQLite, PostgreSQL, MySQL and Oracle. Finally, as I said, I will have a go at the C RowProxy, just not now.
-
repo owner - changed milestone to 0.6.0
OK, well I don't want to leave SQLite users high and dry in case they need flexibility here, so can you organize the patch such that a. the
__legacy_microseconds__
attribute is removed, since that is "legacy" anyway and b. the string template/regular expressions used for thebind_processor
andresult_processor
may be overridden in the constructor for each of the three types. We also need a note in CHANGES and the wiki page for this. -
Account Deleted - assigned issue to
(original author: ged) Replying to zzzeek:
OK, well I don't want to leave SQLite users high and dry in case they need flexibility here, so can you organize the patch such that a. the
__legacy_microseconds__
attribute is removed, since that is "legacy" anyway and b. the string template/regular expressions used for thebind_processor
andresult_processor
may be overridden in the constructor for each of the three types. We also need a note in CHANGES and the wiki page for this.Sure. That will have to wait for somewhere during next week though (Tuesday at the earliest).
-
Account Deleted (original author: ged) Here is a patch (hopefully) containing what you asked for. I'll update the wiki after the change is commited (whether this iteration or a subsequent one).
A few thoughts on the patch: - As I said on IRC, I'm not sure the argument approach is practical because if someone wants a custom format he's likely to want it for all his fields of that type. A type inheriting from _SLDateTime seem more practical. Given the way I've implemented the patch, this should also be possible. The question is whether to recommend using that or the argument approach. - the storage_format might not be any more useful than the legacy_microseconds argument because it only allows for limited customization as all values must be used. Using a dict to format the string would solve that but is much slower. Is it worth it since nobody has asked for it yet? - I'm not sure what I've done in types.py is the way to go. - The patch also includes a small speed optimization on the result_processor
-
repo owner Replying to ged:
Here is a patch (hopefully) containing what you asked for. I'll update the wiki after the change is commited (whether this iteration or a subsequent one).
looks good so far
A few thoughts on the patch: - As I said on IRC, I'm not sure the argument approach is practical because if someone wants a custom format he's likely to want it for all his fields of that type.
they should just use a function:
def mydate(): return DATETIME(...args...)
- the storage_format might not be any more useful than the legacy_microseconds argument because it only allows for limited customization as all values must be used. Using a dict to format the string would solve that but is much slower. Is it worth it since nobody has asked for it yet?
I think its fine for now to assume people want to store all the fields in each case but i see your point.
- I'm not sure what I've done in types.py is the way to go.
oh, yeah. it's probably not. you'd want to put an adapt() method on the SQLite type itself.
-
Account Deleted (original author: ged) > > - I'm not sure what I've done in types.py is the way to go.
oh, yeah. it's probably not. you'd want to put an adapt() method on the SQLite type itself.
I'm probably missing something (I didn't dig very deep into the type adaptation mechanism) but if I do that, I can't use the generic types anymore if when I want to use some of those custom arguments since the init and adapt methods do not handle the new arguments. This seems logical so far, but if I use the SQLite-specific types directly, the adapt method on the specific type never gets called. So I don't see the point.
-
repo owner Right now we don't support "the generic type passes through **kw to the DB specific types". If you have DB-specific args, then you use the DB-specific type, and in 0.6 we've made this very easy:
from sqlalchemy.dialects.sqlite import DATE, DATETIME, TIME, INTEGER, VARCHAR
i.e. some of those come from types.py and some from base.py, you don't have to care.
the idea that "well I'd like to be DB agnostic, but also have my DB-specific arguments" really doesn't work here, since then you have this
**kw
that you'd like to define a namespace over any number of backends, and then you have name collision issues (once people demand that this feature work in every possible edge case). We haven't gone with that wholesqlite_timeformat
,mysql_collation
naming scheme like we do inTable
. There are easy receipes usingTypeDecorator
if you want to build up a type that does different things on different backends. -
Account Deleted (original author: ged) Replying to zzzeek:
Right now we don't support "the generic type passes through **kw to the DB specific types".
Yes, I figured that, and it makes sense.
the idea that "well I'd like to be DB agnostic, but also have my DB-specific arguments" really doesn't work here
Never said it should, just pointed out that the adapt method on the specific type is never called AFAICT, hence I don't understand the point in defining it. Besides, using DATETIME does not work either without the changes I have done in types.py...
from sqlalchemy import * metadata = MetaData('sqlite://') # this works from sqlalchemy.dialects.sqlite.base import _SLDateTime test_table = Table('test1', metadata, Column('id', Integer, primary_key=True), Column("dt", _SLDateTime(storage_format="%04d-%02d-%02d %02d:%02d:%02d.%06d")) ) # and this does not from sqlalchemy.dialects.sqlite.base import DATETIME test_table = Table('test2', metadata, Column('id', Integer, primary_key=True), Column("dt", DATETIME(storage_format="%04d-%02d-%02d %02d:%02d:%02d.%06d")) )
adding
DATETIME = _SLDateTime
somewhere in sqlite/base.py seems to fix the problem. Is it the correct solution?
-
repo owner err no if we are adding explicit args then we have to change the name of
_SLDateTime
and similar to beDATETIME
,DATE
,TIME
, and ensure those are the types that are exported instead of those oftypes
. The patch should also remove all modifications totypes.py
. -
Account Deleted - attached sqlite_datetimeopt2.patch
(original author: ged) New version with no modification in types.py
-
Account Deleted (original author: ged) Does this new version looks good to you?
-
repo owner looks fantastic.
-
Account Deleted - changed status to resolved
(original author: ged) committed in 1bca0c42a3f4626f0273e61cf1b59ece37c3bd26.
-
Account Deleted - removed status
- changed status to open
It might be a good idea to parse sqlite's "SELECT DATETIME('now')" (returns '2009-09-09 09:09:09' (no .00) as a valid datetime.
-
Account Deleted - changed status to resolved
(original author: ged) reverted part of the change in 26edef4f2483f68d6a527684a6d0aed37554534d.
-
repo owner - removed milestone
Removing milestone: 0.6.0 (automated comment)
- Log in to comment
I wouldn't assume that necesssarily, as people may have been trying to read date/time objects that were placed in the sqlite database using str(date) as well.
wouldn't it be faster to write just one C module for RowProxy instead of squeezing dry lemons throughout the source ? they're not that hard to write, just a little tedious.