Unicode Filename causes S3Storage to crash

Issue #19 wontfix
Anonymous created an issue

I suspect S3 doesn't support Unicode filenames. When using S3Storage on a file field and uploading a file with a Unicode character in its name, we observe a crash; otherwise it works as expected.

Sorry, I don't have a crash log or time to look into it at the moment. I just thought I would flag this up, and I will try to send in a patch when I get a moment. Our current plan is to use the Django slugify function to remove all Unicode characters from the file name prior to sending it to S3.

Mat

Comments (17)

  1. Mat Clayton

    Turns out S3 does support Unicode; it's an issue with S3.py and urllib.quote_plus not supporting Unicode. I don't have time at the moment to do any more work on this, but this info might help anyone who knows urllib/urllib2 to sort it out.

    Mat

  2. Kyle MacFarlane

    For me, django-storages has worked fine with unicode on S3. One of the tests in sorl-thumbnail creates the file "sorl-thumbnail-ążśź_source.jpg" and it works fine.

    However, the standard Django storage automatically removes any unicode but it's not very good and filenames in Japanese can end up being merely "____.jpg". I've had situations where the file saved to the disk has had all unicode replaced but the path in the database has not. Something is wrong in the core of Django I believe.

  3. Anonymous

    I also got this problem. Here is the traceback:

    Traceback (most recent call last):
    
     File "/usr/lib/python2.5/site-packages/django/core/handlers/base.py", line 99, in get_response
       response = callback(request, *callback_args, **callback_kwargs)
    
     File "/usr/lib/python2.5/site-packages/django/utils/decorators.py", line 36, in __call__
       return self.decorator(self.func)(*args, **kwargs)
    
     File "/usr/lib/python2.5/site-packages/django/contrib/auth/decorators.py", line 24, in _wrapped_view
       return view_func(request, *args, **kwargs)
    
     File "/home/admin/django_projects/myproject/apps/myapp/views.py", line 516, in account_edit.save()
    
     File "/home/admin/django_projects/myproject/apps/myapp/models.py", line 346, in save
       super(Profile, self).save(force_insert, force_update)
    
     File "/usr/lib/python2.5/site-packages/django/db/models/base.py", line 419, in save
       self.save_base(force_insert=force_insert, force_update=force_update)
    
     File "/usr/lib/python2.5/site-packages/django/db/models/base.py", line 482, in save_base
       values = [(f, None, (raw and getattr(self, f.attname) or f.pre_save(self, False))) for f in non_pks]
    
     File "/usr/lib/python2.5/site-packages/django/db/models/fields/files.py", line 251, in pre_save
       file = super(FileField, self).pre_save(model_instance, add)
    
     File "/usr/lib/python2.5/site-packages/django/db/models/fields/__init__.py", line 181, in pre_save
       return getattr(model_instance, self.attname)
    
     File "/usr/lib/python2.5/site-packages/django/db/models/fields/files.py", line 192, in __get__
       file_copy = self.field.attr_class(instance, self.field, file.name)
    
     File "/home/admin/reusable_apps/django_thumbs/fields.py", line 29, in __init__
       setattr(self, 'url_%sx%s' % (w,h), get_size(self, size))
    
     File "/home/admin/reusable_apps/django_thumbs/fields.py", line 23, in get_size
       split = self.url.rsplit('.',1)
    
     File "/usr/lib/python2.5/site-packages/django/db/models/fields/files.py", line 69, in _get_url
       return self.storage.url(self.name)
    
     File "/home/admin/reusable_apps/storages/S3Storage.py", line 101, in url
       return self.generator.make_bare_url(self.bucket, name)
    
     File "/home/admin/reusable_apps/S3.py", line 386, in make_bare_url
       full_url = self.generate_url(self, bucket, key)
    
     File "/home/admin/reusable_apps/S3.py", line 398, in generate_url
       canonical_str = canonical_string(method, bucket, key, query_args, headers, expires)
    
     File "/home/admin/reusable_apps/S3.py", line 66, in canonical_string
       buf += "/%s" % urllib.quote_plus(key)
    
     File "/usr/lib/python2.5/urllib.py", line 1213, in quote_plus
       return quote(s, safe)
    
     File "/usr/lib/python2.5/urllib.py", line 1205, in quote
       res = map(safe_map.__getitem__, s)
    
    KeyError: u'\xf1'
    
  4. Anonymous

    Solved!

    Just add the following get_valid_name method to the S3Storage class. It uses django's slugify to convert/remove non-standard characters.

    def get_valid_name(self, name):
        from django.template.defaultfilters import slugify
        from django.utils.encoding import smart_str
        n = name.rsplit('.',1)[0]
        ext = name.rsplit('.',1)[1]
        n = smart_str(slugify(n).replace('-', '_'))
        return '%s.%s' % (n, ext)
    
  5. Antonio Melé

    Solved!

    Just add the following get_valid_name method to the S3Storage class. It uses django's slugify to convert/remove non-standard characters.

    def get_valid_name(self, name):
        from django.template.defaultfilters import slugify
        from django.utils.encoding import smart_str
        n = name.rsplit('.',1)[0]
        ext = name.rsplit('.',1)[1]
        n = smart_str(slugify(n).replace('-', '_'))
        return '%s.%s' % (n, ext)
    
  6. David Larlet repo owner

    I don't have time to test this with all possible cases (what if there is no '.', etc.).

    Is there anybody following this issue who can verify the patch?

  7. Anonymous

    This needs some more testing ... slugify is not a good candidate here because it will replace / characters which are used to denote folders on S3.
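
To illustrate the two concerns raised above (names without a '.', and '/' separators being replaced), here is a minimal sketch that sanitizes each path segment separately. It is not the project's code: `ascii_segment` and `sanitize_key` are hypothetical helpers, and `ascii_segment` is only a rough stdlib stand-in for Django's slugify, written for Python 3.

```python
import posixpath
import re
import unicodedata


def ascii_segment(segment):
    """Reduce one path segment to ASCII (hypothetical stand-in for slugify)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize('NFKD', segment)
    text = text.encode('ascii', 'ignore').decode('ascii')
    # Collapse runs of characters other than letters, digits, '_', '.', '-'.
    return re.sub(r'[^\w.-]+', '_', text).strip('_')


def sanitize_key(name):
    """Sanitize a key segment by segment so '/' separators survive."""
    parts = name.split('/')
    directories, filename = parts[:-1], parts[-1]
    # splitext returns an empty extension when the name has no '.'.
    stem, ext = posixpath.splitext(filename)
    cleaned = [ascii_segment(part) for part in directories]
    cleaned.append(ascii_segment(stem) + ascii_segment(ext))
    return '/'.join(cleaned)
```

Note that this sketch still discards characters with no ASCII decomposition, so a name written entirely in Japanese would be reduced to just its extension; it only addresses the '.' and '/' cases.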

  8. Michel Sabchuk

    I don't use S3, but I use Rackspace Cloudfiles (mosso) and have the same problem. As kylemacfarlane noted, the problem is that urllib's quote function doesn't handle unicode values; they must be encoded as UTF-8 before use.

    You can check it out at http://bugs.python.org/issue1712522 and http://mail.python.org/pipermail/python-dev/2006-July/067248.html; both links discuss this urllib issue.

    The Cloudfiles API uses the standard urllib quote function, and it seems that the S3 API uses it too (I didn't check).

    A fix is to pass the strings through smart_str before sending them to the API. I can't fix it for S3 because I don't have an S3 account to test with :) Anyway, I intend to fix it for Rackspace Cloudfiles; someone could fix it for S3 after that.

    Best regards.
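
The encode-before-quoting idea described above could look roughly like the following. This is a sketch under assumptions, not the actual S3.py or Cloudfiles code: `quote_key` is a hypothetical helper, and the explicit `encode('utf-8')` plays the role that smart_str would play in a Django project.

```python
try:
    from urllib import quote_plus  # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3


def quote_key(key):
    """Percent-encode a storage key, encoding text as UTF-8 first."""
    # Python 2's quote_plus raises KeyError on non-ASCII unicode input,
    # so hand it UTF-8 bytes instead of a unicode string.
    if not isinstance(key, bytes):
        key = key.encode('utf-8')
    return quote_plus(key)
```

With this helper, a key such as u'ma\xf1ana.jpg' is quoted as 'ma%C3%B1ana.jpg' instead of raising the KeyError shown in the traceback.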

  9. Anonymous

    I ran into this issue.

    @tomwys's fix worked for me.

    The offending file name for me was something like:

    loremipsum_112_©Lorem_Copy.jpg
