'ascii' codec can't decode byte 0xdb in position 942: ordinal not in range(128) SQLAlchemy

Issue #3659 resolved
Ramin Farajpour Cami created an issue

I use some Arabic script in my query and Python can't decode Arabic characters to ASCII and Python 2 uses bytestrings (ASCII) strings by default, which support only Latin characters. Python 3 uses Unicode strings by default.

query :

query = """SELECT * FROM (
               SELECT p.ID AS 'persons_ID', 
                 p.FirstName AS 'persons_FirstName', 
                 p.LastName AS 'persons_LastName',
                 p.NationalCode AS 'persons_NationalCode', 
                 p.CityID AS 'persons_CityID', 
                 p.Mobile AS 'persons_Mobile',
                 p.Address AS 'persons_Address', 
                 cities_1.ID AS 'cities_1_ID', 
                 cities_1.Name AS 'cities_1_Name',
                 cities_1.ParentID AS 'cities_1_ParentID', 
                 cities_2.ID AS 'cities_2_ID', 
                 cities_2.Name AS 'cities_2_Name',
                 cities_2.ParentID AS 'cities_2_ParentID' , 
                 cast(@row := @row + 1 as unsigned) as 'persons_row_number'                                                                              
               FROM Persons p 
               LEFT OUTER JOIN cities AS cities_2 
               ON cities_2.ID = p.CityID 
               LEFT OUTER JOIN cities AS cities_1 
               ON cities_1.ID = cities_2.ParentID , (
                 select @row := 0) as init WHERE 1=1
                 AND p.FirstName LIKE N'{}%'""".format('رامین')

i use format('رامین') i get error UnicodeEncodeError from MysqlDB,

Comments (6)

  1. Mike Bayer repo owner

    Hello -

    there are many programming errors here and one configurational error contributing towards the failure as well as some others:

    When using non-ASCII strings in Python 2, string literals must be specified as unicode literals:

    query = u"""SELECT * FROM (
                   ...
                     AND p.FirstName LIKE N'{}%'""".format(u'رامین')
    

    When using non-ascii characters with the Python-MySQL driver, a "charset" parameter must be passed to the engine URL; see the documentation at http://docs.sqlalchemy.org/en/rel_1_0/dialects/mysql.html#mysql-unicode for details:

    e = create_engine("mysql://scott:tiger@localhost/test?charset=utf8")
    

    Your SQL contains a literal percent sign. Depending on how this query is to be invoked, you may need to escape it as a double percent, because the driver is considering the string as including Python format specifiers:

    query = u"""SELECT * FROM (
                   ...
                     AND p.FirstName LIKE N'{}%%'""".format(u'رامین')
    

    without the above you'd get a "not enough arguments for format string" error from the driver.

    Embedding the literal parameter in the query itself is not the best practice here as it does not accommodate for formatting and in the general sense is a very significant security hazard. In general you should use a parameterized query:

    query = u"select * from some_table where value=%s"
    
    e.execute(query, (u'رامین', ))
    

    an example of showing the execution proceeding for a simpler query looks like:

    #!coding: utf8
    
    from sqlalchemy import create_engine
    
    e = create_engine("mysql://scott:tiger@localhost/test?charset=utf8")
    
    query = u"select %s"
    
    print e.execute(query, [u'رامین']).fetchall()
    
  2. Ramin Farajpour Cami reporter

    again i get error :

    from sqlalchemy import create_engine
    e = create_engine('mysql://root:@localhost/tutorial-ramin?charset=utf8')
    query = u"SELECT * FROM raminoooooooooo name=%s"
    e.execute(query, (u'رامین', ))
    # print e.execute(query, [u'love']).fetchall()
    query = u"""select * from raminoooooooooo where name=N'{}%%'""".format(u'رامین')
                            result = e.execute(query)
    

    Error :

    ProgrammingError: (_mysql_exceptions.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '='\xd8\xb1\xd8\xa7\xd9\x85\xdb\x8c\xd9\x86'' at line 1") [SQL: u'SELECT * FROM raminoooooooooo name=%s'] [parameters: (u'\u0631\u0627\u0645\u06cc\u0646',)]
    
  3. Mike Bayer repo owner

    I'm not sure what that "N" syntax is, but I'm able to run that last query as long as the table exists:

    #!coding: utf8
    
    from sqlalchemy import create_engine
    
    e = create_engine("mysql://scott:tiger@localhost/test?charset=utf8")
    
    e.execute("create table if not exists raminoooooooooo (name varchar(20))")
    e.execute("insert into raminoooooooooo (name) values (%s)", [u'رامین'])
    
    query = u"""select * from raminoooooooooo where name like N'{}%%'""".format(u'رامین')
    print e.execute(query).fetchall()
    
    #!
    
    
     python test.py
    /home/classic/dev/sqlalchemy/lib/sqlalchemy/engine/default.py:462: Warning: Table 'raminoooooooooo' already exists
      cursor.execute(statement, parameters)
    [(u'\u0631\u0627\u0645\u06cc\u0646',)]
    
  4. Ramin Farajpour Cami reporter

    work now. but i see result to RowProxy how to acess to this data ?

    [(u'\u0631\u0627\u0645\u06cc\u0646',), (u'\u0631\u0627\u0645\u06cc\u0646',), (u'\u0631\u0627\u0645\u06cc\u0646',), (u'\u0631\u0627\u0645\u06cc\u0646',)]
    
  5. Ramin Farajpour Cami reporter

    big thanks @zzzeek , i use this code for access data,

           for v in result:
                   for column, value in v.items():
                      print "%s" % value.encode("utf-8")
    
  6. Log in to comment