bulk update of exchanges is slow with peewee

Issue #46 new
Tomas Navarrete Gutierrez
created an issue

If I want to update all the amounts of the inputs of an activity (with different values), I would have to iterate over the technosphere of the activity, update the amount field per exchange, and call "save()" on each exchange. With n technosphere exchanges, this requires n transactions, in current implementation.

I would like to group the updates, so that there is only one transaction, and hopefully less I/O to hardrive.

The idea from: http://docs.peewee-orm.com/en/latest/peewee/querying.html#atomic-updates

something in the line of:

"""
exchanges is a dictionary of oldExchange:newExchange
"""
def bulk_update_exchanges(activity, exchanges):
    with db.atomic as txn:
        for old, new in exchanges.items():
            old.update(data = new.data)

Comments (3)

  1. Tomas Navarrete Gutierrez reporter

    So, I found a quick hack to what I wanted to achieve, but I am not sure this is the right way to go. Specially, since this is not "generic" at all

    from brightway2 import *
    from bw2data.backends.peewee import sqlite3_lci_db as db 
    import random
    
    # ... project, db, activity finding
    act = Database('my_db').get('myActivity')
    
    with db.atomic() as txn:
        for e in act.technosphere():
            v = random.random()
            e_ds = e._document
            e_ds.data.update(amount = v)
            e.save()
    

    Of course the raw import of the sqlite_lci_db can be done in a more elegant way, depending on the type of backend.

    My doubt is specially on the need to recover the underlying document (ExchangeDataset) for the object (Exchange).

    It seems to work for now, but my request remains. ;)

  2. Chris Mutel repo owner

    Yes, this is a weakness of the current model of abstraction layer cake. Actually, I think your approach is quite reasonable, though you could do something directly with ActivityDataset objects using normal Peewee methods, e.g. what actually happens when you call .get().

    The problem with ActivityDataset and ExchangeDataset is that there actually isn't any foreign keys or other automatic relationships between them. So you will have to manage these yourself, and make sure you don't create mismatches between the tables. As you have seen, you can also gain some speed by dropping down to straight SQL from Python, but this only really makes sense in special circumstances.

  3. Log in to comment