storage.exists() with S3Boto is making a ton of requests and downloading large amounts of XML

Issue #178 new
Brent O'Connor created an issue

When using storage.exists() with S3Boto, I ran into a big issue with easy-thumbnails. When saving several different thumbnails for just one file, my Python process's memory would grow to 2.75 GB. I tracked this down to storage.exists() making tons of requests and downloading large amounts of data, because my S3 bucket contains over 300K objects.

I submitted a patch to easy-thumbnails here.

Is there a way to optimize Storages so that storage.exists() works faster and doesn't transfer so much data?

Comments (5)

  1. Ian Lewis

    If your files don't change, you can set AWS_PRELOAD_METADATA = True in your settings to keep a cache of the file names, so storages can check whether a file exists without hitting S3.

    Currently the cache never updates, so it's not suitable for buckets where you are constantly adding/updating/deleting files. :-/

  2. Brent O'Connor reporter


    Good to know. However, that doesn't really apply here, because items are getting updated by the second, so it's not really a solution.

  3. Ian Lewis

    Yeah, I think until we update storages to keep a local cache of the files and file names, it's not going to get any better.

    Are you updating files via Django/storages or via something else?
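The preload approach above is just `AWS_PRELOAD_METADATA = True` in Django settings; the updatable local cache Ian describes does not exist in storages yet, but it could look roughly like this backend-agnostic sketch (CachedExistsStorage and its invalidate() hook are hypothetical names, not part of django-storages):

```python
class CachedExistsStorage:
    """Hypothetical wrapper: memoize exists() answers so repeated
    thumbnail saves don't re-query S3 for the same names."""

    def __init__(self, backend_exists):
        self._backend_exists = backend_exists  # callable: name -> bool
        self._cache = {}

    def exists(self, name):
        # Only the first lookup for a name hits the backend.
        if name not in self._cache:
            self._cache[name] = self._backend_exists(name)
        return self._cache[name]

    def invalidate(self, name):
        # Must be called after save/delete, otherwise the cache goes
        # stale -- the exact limitation of AWS_PRELOAD_METADATA above.
        self._cache.pop(name, None)
```

Unlike the preload setting, invalidate() lets the cache stay correct for buckets whose contents change constantly.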
