UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128) File "/usr/lib/python2.7/codecs.py", line 369, in write.

Issue #267 open
Erin Browning
created an issue

While creating reports in the list, xlsx, and csv modules, the following error was thrown:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)
File "/usr/lib/python2.7/codecs.py", line 369, in write.

I'm trying to export a contacts list from the bing_linkedin_cache module. I have names in English, Chinese, Korean and Arabic.

As a side note, the HTML report does not fail with this error; it succeeds.

From what I can tell, it's failing on the following name:

Adodoración Faomá

I created a contacts table with just this name in it and that's where it fails.
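A minimal sketch of the failure itself, assuming the name is stored as UTF-8 bytes: the first byte of an accented character is 0xc3, which the ASCII codec cannot decode. In Python 2 this decode happens implicitly when byte strings and unicode objects are mixed; here it is made explicit so the snippet behaves the same on Python 2 and 3. The helper `try_ascii_decode` is hypothetical and not part of the framework.

```python
# The name from the report, written with escapes so the source stays ASCII.
name = u'Adodoraci\xf3n Faom\xe1'
raw = name.encode('utf-8')  # UTF-8 bytes, as they might sit in the database


def try_ascii_decode(data):
    """Return the UnicodeDecodeError raised by an ASCII decode, or None."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        return exc
    return None


err = try_ascii_decode(raw)
# bytearray indexing yields an int on both Python 2 and Python 3.
first_bad = bytearray(raw)[err.start]
print(hex(first_bad))  # 0xc3
print(err.reason)      # ordinal not in range(128)
```

The reported position depends on where the first non-ASCII character falls in the string being decoded, so it will not necessarily match the position in the traceback above.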

Comments (7)

  1. Brian King

    Thanks for the details, @Erin Browning - Very cool of you to show the issue so clearly!

    I think there may be something I can do in the bing_linkedin_cache module to fix this. I'm updating that for the new APIs over Christmas break, and will look at this then, too.

  2. Erin Browning reporter

    Hi @Brian King, I've done a bit more research, and a large part of the problem is Python 2's use of ASCII-based byte strings instead of Unicode strings (Python 3 solves this). A talk that describes the problem is at: http://farmdev.com/talks/unicode/

    I created a temporary fix locally so I could export my results using the reporting/csv module.

        import csv

        def module_run(self):
            filename = self.options['filename']
            # codecs module not used because Python 2's csv module converts to ascii
            with open(filename, 'w') as outfile:
                # create the writer once, not once per row
                csvwriter = csv.writer(outfile, quoting=csv.QUOTE_ALL)
                table = self.options['table']
                rows = self.query('SELECT * FROM "%s" ORDER BY 1' % (table))
                cnt = 0
                for row in rows:
                    # replace empty/None values with empty strings
                    row = [x if x else '' for x in row]
                    if any(row):
                        cnt += 1
                        # decode byte strings to unicode (assumes they are UTF-8)
                        row = [v.decode('utf-8') if isinstance(v, str) else v for v in row]
                        # re-encode every value as UTF-8 bytes, which csv can write
                        row = [s.encode('utf-8') for s in row]
                        csvwriter.writerow(row)
            self.output('%d records added to \'%s\'.' % (cnt, filename))
    

    Another potential fix is using the unicodecsv library: https://github.com/jdunck/python-unicodecsv

    Neither of these fixes corrects the problem in the other reporting modules; they only cover reporting/csv.

  3. Tim Tomes repo owner

    I HIGHLY encourage using ./recon-web to do exporting, reporting, analysis, etc. It solves all of these problems. The way Python 2 handles unicode is the bane of my existence. I believe I am handling everything properly on the framework side. @Brian King are you placing data directly in the database? Or using framework API calls? That could be the issue.

  4. Brian King

    I'm using framework APIs, @Tim Tomes. I thought the cause might have been my not paying attention to the content-type in the Bing responses, but the research that @Erin Browning did makes that doubtful.

    I'm going to make some time this week to look at this. And yes, I think #262 and #267 have the same root cause. I'll verify that this week, too.
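For comparison with the Python 2 workaround in comment 2, here is a rough sketch of why Python 3 avoids the problem: its csv module writes text (Unicode) strings directly, and the encoding is applied once at the file or buffer boundary. This is only an illustration under the assumption of Python 3; `rows_to_csv_bytes` is a hypothetical helper, not part of the framework, and the sample rows are made up.

```python
import csv
import io


def rows_to_csv_bytes(rows):
    """Serialize rows of text values to UTF-8 encoded CSV (Python 3 only)."""
    # newline='' lets the csv module control line endings itself.
    buf = io.StringIO(newline='')
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    for row in rows:
        # Mirror the workaround above: empty values become empty strings.
        row = [v if v else '' for v in row]
        if any(row):
            writer.writerow(row)
    # Encode once, at the boundary, instead of per value.
    return buf.getvalue().encode('utf-8')


data = rows_to_csv_bytes([
    ('Adodoraci\xf3n', 'Faom\xe1'),  # accented Latin characters
    ('\uae40', None),                # a Korean character and an empty value
    (None, None),                    # fully empty row, skipped
])
print(data.decode('utf-8'))
```

On Python 2, the unicodecsv library mentioned in comment 2 targets the same gap, though it is not used here.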
