Adding cache to time_series

Issue #2 (new)

David Larlet created an issue

Here is my proposal; I did this for my project:

    # requires: from django.core.cache import cache
    # requires: from dateutil.relativedelta import relativedelta
    while dt < end_date:
        CACHE_KEY = "CACHE_%s_%s_%s" % (dt, end_date, interval)
        result = cache.get(CACHE_KEY)
        if result is None:
            result = method(dt, date_field=date_field,
                            aggregate_field=aggregate_field,
                            aggregate_class=aggregate_class)
            if interval == "months":
                cache_timeout = 60*60*24*30  # 1 month
            elif interval == "weeks":
                cache_timeout = 60*60*24*7  # 1 week
            elif interval == "days":
                cache_timeout = 60*60*24  # 1 day
            elif interval == "hours":
                cache_timeout = 60*60  # 1 hour
            else:
                cache_timeout = 60  # guard: avoid a NameError for other intervals
            cache.set(CACHE_KEY, result, cache_timeout)
        stat_list.append((dt, result))
        dt = dt + relativedelta(**{interval: 1})
    return stat_list


This way it doesn't refresh all graphs on each request. Note that the queryset should probably be part of the CACHE_KEY (in my use case I always work from the same base queryset, so I left it out).
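
For illustration, one way to mix the queryset into the key would be to hash the SQL it compiles to (a sketch; qs_cache_fragment is a hypothetical helper, not part of this patch):

    import hashlib

    def qs_cache_fragment(qs):
        # querysets that compile to the same SQL share the same cached series
        return hashlib.md5(str(qs.query).encode('utf-8')).hexdigest()

    CACHE_KEY = "CACHE_%s_%s_%s_%s" % (qs_cache_fragment(qs), dt, end_date, interval)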

David

Comments (5)

  1. Mikhail Korobov repo owner

    Hmm. I like the approach. Some thoughts:

    • this should be optional because it assumes that past data doesn't change;
    • the last interval shouldn't be cached, so that it's still possible to observe e.g. a growing monthly bar for the current month.
  2. David Larlet reporter

    Here is a new version; I'm not sure how you want to handle it in _fast_time_series:

        def time_series(self, start_date, end_date=None, interval='days', 
                        date_field=None, aggregate_field=None, aggregate_class=None, 
                        cached=False, engine='mysql'):
            end_date = end_date or self.today + datetime.timedelta(days=1)
            args = [start_date, end_date, interval, date_field, aggregate_field, aggregate_class, cached]
            try:
                #TODO: engine should be guessed
                return self._fast_time_series(*(args+[engine]))
            except (QuerySetStatsError, DatabaseError,):
                return self._slow_time_series(*args)
    
        def _slow_time_series(self, start_date, end_date, interval='days', 
                              date_field=None, aggregate_field=None, aggregate_class=None, 
                              cached=None):
            try:
                method = getattr(self, 'for_%s' % interval[:-1])
            except AttributeError:
                raise InvalidInterval('Interval not supported.')
    
            stat_list = []
            dt, end_date = _to_datetime(start_date), _to_datetime(end_date)
            while dt < end_date:
                CACHE_KEY = ("CACHE_%s_%s_%s" % (dt, end_date, interval)).replace(' ', '_')
                result = cache.get(CACHE_KEY) if cached else None
                next_dt = dt + relativedelta(**{interval : 1})
                if result is None:
                    result = method(dt, date_field=date_field,
                                    aggregate_field=aggregate_field,
                                    aggregate_class=aggregate_class)
                    # do not cache the latest value
                    if next_dt < end_date and cached:
                        if interval == "months":
                            cache_timeout = 60*60*24*30  # 1 month
                        elif interval == "weeks":
                            cache_timeout = 60*60*24*7  # 1 week
                        elif interval == "days":
                            cache_timeout = 60*60*24  # 1 day
                        elif interval == "hours":
                            cache_timeout = 60*60  # 1 hour
                        else:
                            cache_timeout = 60  # guard: avoid a NameError for other intervals
                        # result is stored as a string because there is a weird
                        # issue with locmem backend returning None when the value
                        # of the cache is 0
                        cache.set(CACHE_KEY, str(result), cache_timeout)
                else:
                    result = int(result)
                stat_list.append((dt, result,))
                dt = next_dt
            return stat_list
    
        def _fast_time_series(self, start_date, end_date, interval='days', 
                              date_field=None, aggregate_field=None, aggregate_class=None, 
                              cached=None, engine='mysql'):
    
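    For context, a call with caching enabled would look roughly like this (a sketch: the QuerySetStats setup mirrors the project's README, and cached=True is the new argument proposed here):

        import datetime
        from django.contrib.auth.models import User
        from qsstats import QuerySetStats

        qss = QuerySetStats(User.objects.all(), 'date_joined')
        start = datetime.date.today() - datetime.timedelta(days=30)
        # with caching on, every bucket except the current one is served
        # from the cache on subsequent calls
        series = qss.time_series(start, interval='days', cached=True)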

    Hope it helps!

  3. Mikhail Korobov repo owner

    Hi David,

    thanks for working on this!

    As you mentioned, the queryset has to be part of the cache key.

    What if, instead of cached=True, we pass something like cache_key='foo' and then mix this key into the cache key? I can't think of any reliable way to differentiate querysets automatically.
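
    A call site would then pick a key that identifies the queryset, e.g. (hypothetical example):

        qss.time_series(start, interval='days', cache_key='active_users')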

    As for _fast_time_series, I think it should cache all the data but replace the last item with the appropriate self.method call (as in _slow_time_series). This way the logic will be consistent (and I think 1 cache hit + 1 small SQL query will be significantly faster than 1 big SQL query in most cases).
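
    A rough sketch of that idea (an illustration, not code from this patch; names and call signatures are simplified):

        def _cached_fast_time_series(self, start_date, end_date, interval, cache_key):
            # everything except the last (still-growing) bucket comes from
            # the cache; the last bucket is recomputed on every call
            key = ("CACHE_%s_%s_%s_%s" % (cache_key, start_date, end_date, interval)).replace(' ', '_')
            series = cache.get(key)
            if series is None:
                series = self._fast_time_series(start_date, end_date, interval)[:-1]
                cache.set(key, series, 60*60*24)
            method = getattr(self, 'for_%s' % interval[:-1])
            next_dt = series[-1][0] + relativedelta(**{interval: 1}) if series else _to_datetime(start_date)
            return series + [(next_dt, method(next_dt))]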

  4. David Larlet reporter

    Hi,

    Very clever idea to pass a unique cache_key instead of a boolean!

    About _fast_time_series: the issue is that I won't be able to test it easily given my configuration. Can you do that part?

  5. David Larlet reporter

    Here is the new version:

        def time_series(self, start_date, end_date=None, interval='days', 
                        date_field=None, aggregate_field=None, aggregate_class=None, 
                        cache_key=None, engine='mysql'):
            end_date = end_date or self.today + datetime.timedelta(days=1)
            args = [start_date, end_date, interval, date_field, aggregate_field, aggregate_class, cache_key]
            try:
                #TODO: engine should be guessed
                return self._fast_time_series(*(args+[engine]))
            except (QuerySetStatsError, DatabaseError,):
                return self._slow_time_series(*args)
    
        def _slow_time_series(self, start_date, end_date, interval='days', 
                              date_field=None, aggregate_field=None, aggregate_class=None, 
                              cache_key=None):
            try:
                method = getattr(self, 'for_%s' % interval[:-1])
            except AttributeError:
                raise InvalidInterval('Interval not supported.')
    
            stat_list = []
            dt, end_date = _to_datetime(start_date), _to_datetime(end_date)
            while dt < end_date:
                CACHE_KEY = ("CACHE_%s_%s_%s_%s" % (cache_key, dt, end_date, interval)).replace(' ', '_')
                result = cache.get(CACHE_KEY) if cache_key is not None else None
                next_dt = dt + relativedelta(**{interval : 1})
                if result is None:
                    result = method(dt, date_field=date_field,
                                    aggregate_field=aggregate_field,
                                    aggregate_class=aggregate_class)
                    # do not cache the latest value
                    if next_dt < end_date and cache_key is not None:
                        if interval == "months":
                            cache_timeout = 60*60*24*30  # 1 month
                        elif interval == "weeks":
                            cache_timeout = 60*60*24*7  # 1 week
                        elif interval == "days":
                            cache_timeout = 60*60*24  # 1 day
                        elif interval == "hours":
                            cache_timeout = 60*60  # 1 hour
                        else:
                            cache_timeout = 60  # guard: avoid a NameError for other intervals
                        # result is stored as a string because there is a weird
                        # issue with locmem backend returning None when the value
                        # of the cache is 0
                        cache.set(CACHE_KEY, str(result), cache_timeout)
                else:
                    result = int(result)
                stat_list.append((dt, result,))
                dt = next_dt
            return stat_list
    
        def _fast_time_series(self, start_date, end_date, interval='days', 
                              date_field=None, aggregate_field=None, aggregate_class=None, 
                              cache_key=None, engine='mysql'):
    
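    As a side note on the #TODO: engine should be guessed above, a minimal sketch (assuming Django >= 1.2's DATABASES setting; not part of this patch):

        from django.conf import settings

        def _guess_engine():
            # e.g. 'django.db.backends.mysql' -> 'mysql'
            return settings.DATABASES['default']['ENGINE'].rsplit('.', 1)[-1]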