
Issue #186 new

The size property is eating gigs of RAM

Brent O'Connor
created an issue

If you access a models.FileField's size property when the file lives in a large S3 bucket, with boto==2.9.5 and django-storages==1.1.8, it takes over 6 minutes and uses more than 7.5 GB of RAM.

My model:

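from django.db import models
from easy_thumbnails.fields import ThumbnailerImageField

# BaseMediaModel, get_file_upload_path and the MEDIA_* constants come
# from the reporter's project (thelonious) and are not shown here.

class Document(BaseMediaModel):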

    def get_thumbnail_path(instance, filename):
        return get_file_upload_path(
            instance, filename,
            MEDIA_DOCUMENT_THUMBNAIL_BASE_DIR, instance.slug)

    def get_doc_path(instance, filename):
        return get_file_upload_path(
            instance, filename, MEDIA_DOCUMENT_BASE_DIR, instance.slug)

    thumbnail = ThumbnailerImageField(
        upload_to=get_thumbnail_path, blank=True)
    document = models.FileField(upload_to=get_doc_path)
    size = models.IntegerField(blank=True, null=True)

My test script doc_test.py:

from thelonious.apps.media.models import Document

doc = Document.objects.get(id=16)
doc.document.size  # this single attribute access is what takes 6+ minutes

How long it takes to run:

$ time python doc_test.py 

real    6m6.696s
user    2m48.631s
sys     0m17.765s
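The shell's time reports wall-clock and CPU time but not memory. To capture both figures the report cites (minutes of runtime and gigabytes of RAM) from inside one Python process, a small wrapper along these lines could be used. This is a sketch assuming Python 3.4+ (for tracemalloc and time.perf_counter); measure is a hypothetical helper name, and tracemalloc only counts memory allocated through Python's allocator, so it can understate the process's total resident memory.

```python
import time
import tracemalloc


def measure(fn):
    """Call fn() once; return (result, elapsed_seconds, peak_traced_bytes)."""
    # Assumes nothing else is using tracemalloc: it is started and stopped here.
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
    finally:
        elapsed = time.perf_counter() - start
        # Peak bytes allocated through Python's allocator since start().
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Stand-in workload. In the report's setting this would be
    # measure(lambda: doc.document.size) inside a configured Django process.
    value, seconds, peak_bytes = measure(lambda: sorted(range(100000), reverse=True))
    print("elapsed: %.3fs, peak traced memory: %.1f MB" % (seconds, peak_bytes / 1e6))
```

The same wrapper could then be put around a direct boto get_key lookup for the same file, so both access paths are measured the same way in the same process.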

Comments

  1. Ian Lewis

    This is probably an issue with boto (or with how we use boto), since django-storages is just calling boto's native methods.

    If you do something like the following, how long does it take?

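    from boto.s3.connection import S3Connection
    
    # AWS_ID, AWS_SECRET, BUCKET_NAME and FILE_NAME are placeholders.
    conn = S3Connection(AWS_ID, AWS_SECRET)
    bucket = conn.get_bucket(BUCKET_NAME)
    k = bucket.new_key(FILE_NAME)  # builds a Key object locally; no request yet
    print(k.exists())              # asks S3 about this one key only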
    
  2. Brent O'Connor (reporter)

    I put the following in a test.py file on my laptop.

    from django.conf import settings
    from boto.s3.connection import S3Connection
    from thelonious.apps.media.models import Document
    
    doc = Document.objects.get(id=16)
    conn = S3Connection(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
    bucket = conn.get_bucket(settings.AWS_STORAGE_BUCKET_NAME)
    file_name = 'media/{}'.format(doc.document.name)
    k = bucket.get_key(file_name)
    print(k.exists())
    print(k.size)
    

    Which outputs ...

    $ time python test.py
    True
    512243
    python test.py  0.80s user 0.21s system 69% cpu 1.442 total
    
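The two runs differ by more than two orders of magnitude (about 1.4 seconds total versus over 6 minutes) for what should be the same question about one key. One way such a gap could arise, which is an assumption that nothing in this thread confirms for django-storages 1.1.8, is a size lookup that enumerates the bucket instead of requesting the single key. The toy model below contrasts those two strategies using a hypothetical in-memory FakeBucket (not boto) that counts simulated round trips; size_direct and size_via_listing are likewise hypothetical names.

```python
class FakeKey:
    """Hypothetical stand-in for an S3 key: just a name and a size."""

    def __init__(self, name, size):
        self.name = name
        self.size = size


class FakeBucket:
    """Hypothetical in-memory bucket that counts simulated round trips."""

    PAGE_SIZE = 1000  # S3 list requests return at most 1,000 keys per page

    def __init__(self, sizes):
        self._keys = {name: FakeKey(name, size) for name, size in sizes.items()}
        self.requests = 0

    def get_key(self, name):
        # One request for a single key, regardless of bucket size.
        self.requests += 1
        return self._keys.get(name)

    def list(self, prefix=""):
        # Paginated listing: one request per page of up to PAGE_SIZE keys.
        names = sorted(n for n in self._keys if n.startswith(prefix))
        for i in range(0, len(names), self.PAGE_SIZE):
            self.requests += 1
            for name in names[i:i + self.PAGE_SIZE]:
                yield self._keys[name]


def size_direct(bucket, name):
    """Strategy A: ask for the one key, as the reporter's test.py does."""
    return bucket.get_key(name).size


def size_via_listing(bucket, name):
    """Strategy B: index every key in the bucket, then look one up."""
    index = {key.name: key for key in bucket.list()}  # holds all keys in memory
    return index[name].size


if __name__ == "__main__":
    bucket = FakeBucket({"media/doc-%d.pdf" % i: 1000 + i for i in range(50000)})
    before = bucket.requests
    print(size_direct(bucket, "media/doc-16.pdf"), bucket.requests - before)
    before = bucket.requests
    print(size_via_listing(bucket, "media/doc-16.pdf"), bucket.requests - before)
```

With 50,000 keys both strategies return the same size, but the direct lookup makes one simulated request while the listing makes 50 and holds all 50,000 keys in memory at once; both costs of the listing strategy keep growing with the bucket.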