Issue #80 new

Trying to parse csv file from django-storages causes error

Anonymous created an issue

I am trying to parse CSV files that have been uploaded to Amazon S3 using django-storages. I keep getting the error "Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?". The usual workaround is to open the file with "rU", but that does not seem to work with django-storages. If I drop the file directly on the server and open it from there, it works, but I want to avoid storing the files on the server if possible. Here is the code I am using:

{{{
#!python

import csv
from django.core.files.storage import default_storage as s3_storage

n = 'csvdumps/130331548894.csv'
csvf = s3_storage.open(n, "rU")
csvReader = csv.reader(csvf)
for item in csvReader:
    print item
}}}
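
One way to avoid depending on how the storage backend treats the "rU" mode is to open the file in binary mode and decode it yourself. The following is a minimal Python 3 sketch, not django-storages code: `read_rows` is a hypothetical helper, and an `io.BytesIO` object stands in for whatever `s3_storage.open(n, "rb")` returns, on the assumption that it behaves like a standard binary stream.

```python
import csv
import io

# Stand-in for the object returned by s3_storage.open(n, "rb");
# an assumption so the sketch runs without Django or S3.
remote_file = io.BytesIO(
    b"nome;e-mail\r\nale martins 1;aleze2004@hotmail.com\r\n"
)


def read_rows(binary_file, delimiter=";", encoding="utf-8"):
    """Decode a binary file-like object and return its CSV rows as lists."""
    # newline="" leaves line endings untranslated so the csv module
    # handles them itself, as the csv documentation recommends.
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    return list(csv.reader(text_file, delimiter=delimiter))


rows = read_rows(remote_file)
for row in rows:
    print(row)
```

The delimiter and encoding defaults here are guesses based on the sample data later in this thread; adjust them to match the actual files.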

Comments (9)

  1. rgcarrasqueira

    Hi, I'm trying to do the same thing, but I'm also trying to read the first line to check whether the column labels are there.

    I'm doing: header = reader.next()

    But I get a StopIteration exception.

    Any ideas?

  2. rgcarrasqueira

    Hi Ian,

    Thanks for your reply. The problem is that a CSV file read through django-storages comes back as one single line, for example:

    'nome;e-mail\r\nale martins 1;aleze2004@hotmail.com\r\nale martins 2;alessandro@zemoleza.com.br\r\nale alves 1;alessandro@lvinterativa.com.br\r\nale alves 2;alessandro@xleads.com.br'

    The read operation from S3 does not distinguish line breaks. Any ideas?

    Thanks!

    Rogério Carrasqueira


  3. rgcarrasqueira

    Hi, I found a solution to this using the code below:

    import csv

    instance = MyObjectFromDb()


    def detect_delimiter(csv_file):
        # Guess the delimiter from the first line of the file
        header = csv_file.readline()

        if header.find(",") != -1:
            return ","
        elif header.find("\t") != -1:
            return "\t"
        else:
            return ";"


    def get_csv_reader(instance):
        instance.spreadsheet.seek(0)

        file_ = instance.spreadsheet

        # Excel workbooks are not CSV; str.endswith accepts a tuple of suffixes
        if file_.name.endswith(('.xls', '.xlsm', '.xlsx')):
            return None

        try:
            delimiter_char = detect_delimiter(file_)
            instance.spreadsheet.seek(0)
            content = instance.spreadsheet.read()
            # splitlines() breaks the single string on \r\n, \r or \n,
            # so csv.reader receives one line per item
            csv_reader = csv.reader(content.splitlines(), dialect=csv.excel, delimiter=delimiter_char)
            return csv_reader

        except csv.Error:
            return None


    reader = get_csv_reader(instance)

    for row in reader:
        pass  # do my read file
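
For readers on Python 3, the same idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the comment above: `read_csv_rows` and `SAMPLE` are hypothetical names, the bytes literal stands in for `instance.spreadsheet.read()`, and the standard library's `csv.Sniffer` is swapped in for the hand-written `detect_delimiter`. The header is fetched with `next(reader, None)`, which returns `None` instead of raising `StopIteration` when the reader has no rows.

```python
import csv

# Hypothetical stand-in for instance.spreadsheet.read(); the content is
# taken from the single-line string quoted earlier in this thread.
SAMPLE = (
    b"nome;e-mail\r\n"
    b"ale martins 1;aleze2004@hotmail.com\r\n"
    b"ale martins 2;alessandro@zemoleza.com.br"
)


def read_csv_rows(raw_bytes, encoding="utf-8"):
    """Return (header, rows) parsed from the raw bytes of a CSV file."""
    # splitlines() splits on \r\n, \r and \n alike
    lines = raw_bytes.decode(encoding).splitlines()
    if not lines:
        return None, []
    try:
        # Let the standard library guess among the three candidate delimiters
        dialect = csv.Sniffer().sniff("\n".join(lines[:10]), delimiters=",;\t")
        delimiter = dialect.delimiter
    except csv.Error:
        delimiter = ";"  # same fallback as detect_delimiter above
    reader = csv.reader(lines, delimiter=delimiter)
    # next() with a default avoids StopIteration on an empty reader
    header = next(reader, None)
    return header, list(reader)


header, rows = read_csv_rows(SAMPLE)
print(header)
for row in rows:
    print(row)
```

Reading the whole file into memory keeps the sketch short; for large files a streaming approach may be a better fit.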
    