[PATCH] Optimize time and datetime types on SQLite

Issue #1599 resolved
Former user created an issue

(original reporter: ged) AFAICT, the format within the database should be always the same, hence there is no need for the conditional parts of the regexps. This patch provides a ~4% total speedup for 10 fiels, 1000 records, which is appreciable IMO. The question is: was it already the same format in previous SA versions?

Comments (17)

  1. Mike Bayer repo owner

    I wouldn't assume that necesssarily, as people may have been trying to read date/time objects that were placed in the sqlite database using str(date) as well.

    wouldn't it be faster to write just one C module for RowProxy instead of squeezing dry lemons throughout the source ? they're not that hard to write, just a little tedious.

  2. Former user Account Deleted

    (original author: ged) Replying to zzzeek:

    I wouldn't assume that necesssarily, as people may have been trying to read date/time objects that were placed in the sqlite database using str(date) as well.

    Well, in that case, I think we should introduce an option to the type, or even a new type for that. I don't want to have to "pay" for legacy databases even in ones which were only inserted into via SQLAlchemy.

    wouldn't it be faster to write just one C module for RowProxy instead of squeezing dry lemons throughout the source ? they're not that hard to write, just a little tedious.

    Possibly for someone who knows how to do it, not for me (I didn't take that much time on those little optimizations). Besides, I doubt the types will ever be translated to C, so the savings I got there will be kept after a C rewrite of RowProxy. IMO those lemons were not that dry (don't forget the timings are total time and on top of most of the other optimizations)... And FWIW, now my current optimization pass is over: I've "audited for speed " all the result_processors of SQLite, PostgreSQL, MySQL and Oracle. Finally, as I said, I will have a go at the C RowProxy, just not now.

  3. Mike Bayer repo owner
    • changed milestone to 0.6.0

    OK, well I don't want to leave SQLite users high and dry in case they need flexibility here, so can you organize the patch such that a. the __legacy_microseconds__ attribute is removed, since that is "legacy" anyway and b. the string template/regular expressions used for the bind_processor and result_processor may be overridden in the constructor for each of the three types. We also need a note in CHANGES and the wiki page for this.

  4. Former user Account Deleted
    • assigned issue to

    (original author: ged) Replying to zzzeek:

    OK, well I don't want to leave SQLite users high and dry in case they need flexibility here, so can you organize the patch such that a. the __legacy_microseconds__ attribute is removed, since that is "legacy" anyway and b. the string template/regular expressions used for the bind_processor and result_processor may be overridden in the constructor for each of the three types. We also need a note in CHANGES and the wiki page for this.

    Sure. That will have to wait for somewhere during next week though (Tuesday at the earliest).

  5. Former user Account Deleted

    (original author: ged) Here is a patch (hopefully) containing what you asked for. I'll update the wiki after the change is commited (whether this iteration or a subsequent one).

    A few thoughts on the patch: - As I said on IRC, I'm not sure the argument approach is practical because if someone wants a custom format he's likely to want it for all his fields of that type. A type inheriting from _SLDateTime seem more practical. Given the way I've implemented the patch, this should also be possible. The question is whether to recommend using that or the argument approach. - the storage_format might not be any more useful than the legacy_microseconds argument because it only allows for limited customization as all values must be used. Using a dict to format the string would solve that but is much slower. Is it worth it since nobody has asked for it yet? - I'm not sure what I've done in types.py is the way to go. - The patch also includes a small speed optimization on the result_processor

  6. Mike Bayer repo owner

    Replying to ged:

    Here is a patch (hopefully) containing what you asked for. I'll update the wiki after the change is commited (whether this iteration or a subsequent one).

    looks good so far

    A few thoughts on the patch: - As I said on IRC, I'm not sure the argument approach is practical because if someone wants a custom format he's likely to want it for all his fields of that type.

    they should just use a function:

    def mydate():
        return DATETIME(...args...)
    
    • the storage_format might not be any more useful than the legacy_microseconds argument because it only allows for limited customization as all values must be used. Using a dict to format the string would solve that but is much slower. Is it worth it since nobody has asked for it yet?

    I think its fine for now to assume people want to store all the fields in each case but i see your point.

    • I'm not sure what I've done in types.py is the way to go.

    oh, yeah. it's probably not. you'd want to put an adapt() method on the SQLite type itself.

  7. Former user Account Deleted

    (original author: ged) > > - I'm not sure what I've done in types.py is the way to go.

    oh, yeah. it's probably not. you'd want to put an adapt() method on the SQLite type itself.

    I'm probably missing something (I didn't dig very deep into the type adaptation mechanism) but if I do that, I can't use the generic types anymore if when I want to use some of those custom arguments since the init and adapt methods do not handle the new arguments. This seems logical so far, but if I use the SQLite-specific types directly, the adapt method on the specific type never gets called. So I don't see the point.

  8. Mike Bayer repo owner

    Right now we don't support "the generic type passes through **kw to the DB specific types". If you have DB-specific args, then you use the DB-specific type, and in 0.6 we've made this very easy:

    from sqlalchemy.dialects.sqlite import DATE, DATETIME, TIME, INTEGER, VARCHAR
    

    i.e. some of those come from types.py and some from base.py, you don't have to care.

    the idea that "well I'd like to be DB agnostic, but also have my DB-specific arguments" really doesn't work here, since then you have this **kw that you'd like to define a namespace over any number of backends, and then you have name collision issues (once people demand that this feature work in every possible edge case). We haven't gone with that whole sqlite_timeformat, mysql_collation naming scheme like we do in Table. There are easy receipes using TypeDecorator if you want to build up a type that does different things on different backends.

  9. Former user Account Deleted

    (original author: ged) Replying to zzzeek:

    Right now we don't support "the generic type passes through **kw to the DB specific types".

    Yes, I figured that, and it makes sense.

    the idea that "well I'd like to be DB agnostic, but also have my DB-specific arguments" really doesn't work here

    Never said it should, just pointed out that the adapt method on the specific type is never called AFAICT, hence I don't understand the point in defining it. Besides, using DATETIME does not work either without the changes I have done in types.py...

    from sqlalchemy import *
    
    metadata = MetaData('sqlite://')
    
    # this works
    from sqlalchemy.dialects.sqlite.base import _SLDateTime
    
    test_table = Table('test1', metadata,
        Column('id', Integer, primary_key=True),
        Column("dt",
               _SLDateTime(storage_format="%04d-%02d-%02d %02d:%02d:%02d.%06d"))
    )
    
    # and this does not
    from sqlalchemy.dialects.sqlite.base import DATETIME
    
    test_table = Table('test2', metadata,
        Column('id', Integer, primary_key=True),
        Column("dt", DATETIME(storage_format="%04d-%02d-%02d %02d:%02d:%02d.%06d"))
    )
    

    adding

    DATETIME = _SLDateTime
    

    somewhere in sqlite/base.py seems to fix the problem. Is it the correct solution?

  10. Mike Bayer repo owner

    err no if we are adding explicit args then we have to change the name of _SLDateTime and similar to be DATETIME, DATE, TIME, and ensure those are the types that are exported instead of those of types. The patch should also remove all modifications to types.py.

  11. Former user Account Deleted
    • removed status
    • changed status to open

    It might be a good idea to parse sqlite's "SELECT DATETIME('now')" (returns '2009-09-09 09:09:09' (no .00) as a valid datetime.

  12. Log in to comment