Trying to parse csv file from django-storages causes error

Issue #80 new
Former user created an issue

I am trying to parse CSV files that have been uploaded to Amazon S3 using django-storages. I keep getting the error "Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?". The usual workaround is to open the file with mode "rU", but that does not seem to work with django-storages. If I put the file directly on the server and open it from there, it works, but I want to avoid storing the files on the server if possible. Here is the code I am using:

{{{
#!python
import csv
from django.core.files.storage import default_storage as s3_storage

n = 'csvdumps/130331548894.csv'
csvf = s3_storage.open(n, "rU")
csvReader = csv.reader(csvf)
for item in csvReader:
    print item
}}}

Comments (9)

  1. rgcarrasqueira

    Hi, I'm trying to do the same thing, and I also want to read the first line to check whether the column labels are there.

    I'm doing: header = reader.next()

    But I get a StopIteration exception.

    Any ideas?

  2. Ian Lewis

    @rgcarrasqueira Is it possible the file is empty? next() causing a StopIteration exception is expected behavior when there are no more lines to read.

  3. rgcarrasqueira

    Hi Ian,

    Thanks for your reply. The problem is that a CSV file read through django-storages comes back as one single line, for example:

    'nome;e-mail\r\nale martins 1;aleze2004@hotmail.com\r\nale martins 2;alessandro@zemoleza.com.br\r\nale alves 1;alessandro@lvinterativa.com.br\r\nale alves 2;alessandro@xleads.com.br'

    The read from S3 does not distinguish line breaks. Any ideas?

    Thanks!

    Rogério Carrasqueira


  4. Ian Lewis

    Hmm, that seems like it might be some kind of problem with our sub-class of Django's File class. Will need to take a look.

  5. rgcarrasqueira

    Hi, I found a solution for this using the code below:

    import csv

    instance = MyObjectFromDb()


    def detect_delimiter(csv_file):
        """Guess the delimiter from the first line of the file."""
        header = csv_file.readline()

        if header.find(",") != -1:
            return ","
        elif header.find("\t") != -1:
            return "\t"
        else:
            return ";"


    def get_csv_reader(instance):
        """Return a csv reader for the upload, or None if it cannot be parsed as CSV."""
        file_ = instance.spreadsheet
        file_.seek(0)

        # Excel workbooks are not CSV; str.endswith accepts a tuple of suffixes.
        if file_.name.endswith(('.xls', '.xlsm', '.xlsx')):
            return None

        try:
            delimiter_char = detect_delimiter(file_)
            file_.seek(0)
            content = file_.read()
            # splitlines() hands the csv module one record per item, whatever the line ending.
            return csv.reader(content.splitlines(), dialect=csv.excel, delimiter=delimiter_char)
        except csv.Error:
            return None


    reader = get_csv_reader(instance)

    if reader is not None:
        for row in reader:
            pass  # process each row here
    
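For readers on Python 3, the two points raised in this thread (splitting the content into lines before handing it to the csv module, and reading the header without hitting StopIteration on an empty file) can be sketched roughly as below. This is only an illustration, not django-storages code: `rows_from_content` is a hypothetical helper, the sample data is made up, and the UTF-8 encoding and `;` delimiter are assumptions. It expects `content` to be whatever the storage file's `read()` returned, either bytes or text.

```python
import csv


def rows_from_content(content, delimiter=";", encoding="utf-8"):
    """Build a csv reader over content that was read into memory in one piece.

    `content` may be bytes or text; the encoding is an assumption.
    """
    if isinstance(content, bytes):
        content = content.decode(encoding)
    # splitlines() breaks on \r\n, \r and \n alike, so the csv module
    # receives one record per item regardless of the line ending used.
    return csv.reader(content.splitlines(), delimiter=delimiter)


# Made-up sample in the same shape as the string quoted in the thread.
sample = b"nome;e-mail\r\nale martins 1;a@example.com\r\nale alves 1;b@example.com"

reader = rows_from_content(sample)
header = next(reader, None)  # returns None instead of raising StopIteration
if header is None:
    print("file is empty")
else:
    print(header)
    for row in reader:
        print(row)
```

One caveat with this approach: because the content is split on every line break up front, a quoted field that itself contains a newline would be cut in two, so it only suits files without embedded line breaks.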