1. David Larlet
  2. django-storages
  3. Issues

Issues

Issue #167 new

collectstatic is not overwriting changed images

Anonymous created an issue

Everytime I replace an image with another with the same name, collectstatic does not replace it on my S3, no matter how I configure my Django settings. Here's my current related attributes:

AWS_S3_FILE_OVERWRITE = True
AWS_QUERYSTRING_AUTH = False
AWS_PRELOAD_METADATA = True

INSTALLED_APPS += (
   'storages',
)

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

AWS_STORAGE_BUCKET_NAME = 'my_statics'
STATIC_URL = "http://%s.s3.amazonaws.com/" % AWS_STORAGE_BUCKET_NAME

AWS_ACCESS_KEY_ID = 'FOO'
AWS_SECRET_ACCESS_KEY = 'BAR'

DAY = 86400  # 24h
AWS_HEADERS = {
   'Cache-Control': 'max-age=%d' % (7 * DAY), 
}

Comments (17)

  1. Ian Lewis

    I think this is actually not the best solution as there are a number of timezones that are in play. i.e. The system timezone, the TIME_ZONE setting, the timezone as returned from S3.

    The PR uses the TIME_ZONE setting which can be different from the system timezone which is what the FileSystemStorage uses. The django docs explicitly state that it the system timezone and TIME_ZONE setting could be different (they do need to be the same on Windows though).

    I sent a message to the django-users mailing list to see if there is any consensus on which timezone to use. I have a feeling that the FileSystemStorage is going to be a de facto standard and that the system timezone is actually what we should be using.

    https://groups.google.com/forum/#!topic/django-users/pEIKxsHYN74

  2. gentro

    Okay here's another wrinkle. I assume many people run collectstatic after pushing to a production environment with Git. If there are collaborators on a project in different time zones, the modified time on the post-merge files will reflect the system time on the computer of whoever made the latest commit. As the Git wiki says, "Git sets the current time as the timestamp on every file it modifies". So even if you convert the S3 UTC time to system time on the production machine, that may still not tell you whether the source is newer than the target because you don't know what time zone the source file was created in.

    It's only an issue if the collaborators are in different time zones. I don't know how often that happens on a Django project, but it's not a weird use case.

  3. gentro

    Actually, I was thinking more about this and I take back what I said. File metadata will always have the modified time in epoch seconds, so it'll always be accurate even if there are different versions getting passed around and merged in a vcs.

    That said, the system time based pull request Bob Spryn has now seems like the right solution. It'd be interesting to hear if the mailing list users have an opinion on where the TIME_ZONE setting should be used, but in this particular case turning the S3 timestamp to system time is the correct fix given that FileSystemStorage converts the source file's modified time to system time using datetime.fromtimestamp().

    Hopefully we can get this merged soon. Having to set --clear on collectstatic to get files to upload makes the process unreasonably long.

  4. jaylett

    I'm currently working on a documentation patch for Django (checked with a core dev) that will clarify the Storage API behaviour as returning naive datetime objects based on the "system" (os.environ['TZ']) timezone. (Ideally we'll also come up with a better API for the future with a good BC story.)

  5. Tobias McNulty

    There is a related issue here which is that modified_time() returns the result in UTC rather than the local timezone, which it should to remain compatible with other storage backends.

    Also, it will need to be modified to return timezone-aware datetimes in case USE_TZ is set to True (while leaving them naive if False, the default if it's not set).

    Here's a workaround I'm using in the meantime:

    import pytz
    from django.utils import timezone
    
    
    class TimezoneFixMixin(object):
    
        def modified_time(self, name):
            time = super(TimezoneFixMixin, self).modified_time(name)
            # make timezone-aware (S3 returns UTC)
            time = pytz.utc.localize(time)
            # convert to Django's timezone
            time = time.astimezone(timezone.get_current_timezone())
            # convert back to a naive datetime (what Django is expecting, at least
            # so long as USE_TZ remains False)
            return time.replace(tzinfo=None)
    
  6. jaylett

    Tobias — that's what should be the situation, but right now the core storage engines don't work like that. Compatibility with core storage backends right now requires a naive result in the local timezone; other storage backends are going against the documentation as of 1.8. Of course that means that really we need a patch here that changes all the django-storages backends rather than just one of them.

    The core behaviour with USE_TZ isn't particularly helpful (it still uses naive datetimes), and I'm hoping to get an approach agreed to this with the core developers so we can move to using aware datetimes when timezone support is enabled, and naive otherwise. Getting a sensible BC story that isn't incredibly complicated has proved challenging :-(

  7. jaylett

    Note that with Django 1.10, the Storage API is changing to support timezones (if USE_TZ=True). It will be bc (and with normal a deprecation timeline) with anything that currently returns naive datetimes in the local timezone, but at that point the implementation in storages should change. (I am hoping I'll have time to provide a patch, at least for -redux, if no one else gets there first.)

  8. Log in to comment