ValueError: string data too large when using cx_Oracle and executemany

Issue #2561 resolved
Former user created an issue

This is not really a bug in SQLAlchemy but I think SQLAlchemy could work around this (most easily by not using executemany for cx_Oracle).

There is a known problem in cx_Oracle about string buffers in combination with executemany: If the first row of a string/text column is smaller than 4000 bytes, cx_Oracle will allocate a vt_String variable to hold the column data. However, if a following row is above that limit, cx_Oracle will throw a ValueError("string data too large").

This is discussed here for example:

This appeared to be a regression in SQLAlchemy 0.7.x vs. 0.6.x because it seems like SQLAlchemy will now optimize inserts by using executemany. It is hard to trace this error, especially as the ORM will only use executemany (so it seems to me) if the primary key is already known as is the case for joined table inheritance mappers.

Here is an example using SA core how to trigger this problem:

import logging
from sqlalchemy import *

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("sqlalchemy.engine").setLevel(logging.DEBUG)

engine = create_engine("oracle://lexikon:lexikon@oraclevm:1521/lexikon")
connection = engine.connect()

metadata = MetaData()

example_table = Table("example", metadata,
        Column("id", Integer, Sequence("id_seq"), primary_key=True),
        Column("info", UnicodeText))

example_table.create(connection, checkfirst=True)

values = [content" * 4000))
values.append(dict(info=u"short"))

# This works: First row has large data, second is smaller.
connection.execute(example_table.insert(), values)

# This doesn't ===> ValueError: string data too large, because initial row leads to small buffer.
values.reverse()
connection.execute(example_table.insert(), values)

example_table.drop(connection, checkfirst=True)

I actually ran into this using the ORM and was initially unable to reproduce. The following example does trigger the problem here, but relies on the insert order - I don't know if this it deterministic in SQLAlchemy.

import logging
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import *

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("sqlalchemy.engine").setLevel(logging.DEBUG)

Base = declarative_base()

class ExampleBase(Base):
    __tablename__ = "example_base"
    id = Column(Integer, Sequence("id_seq"), primary_key=True)
    discriminator = Column('type', String(50))
    __mapper_args__ = {'polymorphic_on': discriminator}

class Example(ExampleBase):
    __tablename__ = "example"
    id = Column(Integer, ForeignKey(ExampleBase.id), primary_key=True)
    info = Column(UnicodeText)
    __mapper_args__ = {'polymorphic_identity': 'example'}


engine = create_engine("oracle://lexikon:lexikon@oraclevm:1521/lexikon")
Base.metadata.create_all(engine)

Session = sessionmaker(engine)
session = Session()

instances = [
    Example(info="short"),
    Example(info="less short " * 4000)
](]
values.append(dict(info=u"long)

for t in instances:
    session.add(t)

session.commit()

The latter example throws up because both rows to the child table are inserted using one executemany call.

I would like to have an option in SQLAlchemy to disable executemany and have SQLAlchemy dissect an executemany request into individual execute calls.

Comments (8)

  1. Mike Bayer repo owner

    first off, congrats on getting Anthony to respond to you. I'm not able to get a response from him on a different issue (#2469).

    in this case the gmane thread seems to suggest that setinputsizes() should be clearing up the issue. We call setinputsizes for all types except STRING, UNICODE, NCLOB, CLOB. this has been like this for a long time as there were various issues observed, however I wonder if by now we can just either remove that altogether or at least remove NCLOB, CLOB from the excluded list. Can you try this against 0.7 please:

    diff -r 90ad3c5e16ffe4f972144eebaf7a7e2a0397c9c0 lib/sqlalchemy/dialects/oracle/cx_oracle.py
    --- a/lib/sqlalchemy/dialects/oracle/cx_oracle.py   Fri Aug 31 20:03:57 2012 -0400
    +++ b/lib/sqlalchemy/dialects/oracle/cx_oracle.py   Wed Sep 05 17:18:11 2012 -0400
    @@ -337,7 +337,7 @@
                 # on String, including that outparams/RETURNING
                 # breaks for varchars
                 self.set_input_sizes(quoted_bind_names,
    -                                 exclude_types=self.dialect._cx_oracle_string_types
    +                                 exclude_types=self.dialect._cx_oracle_exclude_setinputsizes
                                     )
    
             # if a single execute, check for outparams
    @@ -507,6 +507,7 @@
                             getattr(self.dbapi, name, None) for name in names
                         ]).difference([None](None))
    
    +        self._cx_oracle_exclude_setinputsizes = types("STRING", "UNICODE")
             self._cx_oracle_string_types = types("STRING", "UNICODE", "NCLOB", "CLOB")
             self._cx_oracle_unicode_types = types("UNICODE", "NCLOB")
             self._cx_oracle_binary_types = types("BFILE", "CLOB", "NCLOB", "BLOB")
    
  2. Mike Bayer repo owner

    I'm able to get tests to pass when NCLOB and CLOB are allowed through setinputsizes(), so I've removed them from the default list and also made the "exclude" list configurable in 0.8, this is in d3e60f805d573206f68c0eb7e7187c5a1c54a731. then again I'm not sure I have CLOB/NCLOB covered enough to know.

  3. Former user Account Deleted

    Replying to zzzeek:

    first off, congrats on getting Anthony to respond to you. I'm not able to get a response from him on a different issue (#2469).

    Eh, where did I say he did respond to me? The linked gmane thread was not started by me and is from 2010.

    in this case the gmane thread seems to suggest that setinputsizes() should be clearing up the issue. We call setinputsizes for all types except STRING, UNICODE, NCLOB, CLOB. this has been like this for a long time as there were various issues observed, however I wonder if by now we can just either remove that altogether or at least remove NCLOB, CLOB from the excluded list. Can you try this against 0.7 please:

    {{{ #!diff diff -r 90ad3c5e16ffe4f972144eebaf7a7e2a0397c9c0 lib/sqlalchemy/dialects/oracle/cx_oracle.py --- a/lib/sqlalchemy/dialects/oracle/cx_oracle.py Fri Aug 31 20:03:57 2012 -0400 +++ b/lib/sqlalchemy/dialects/oracle/cx_oracle.py Wed Sep 05 17:18:11 2012 -0400 @@ -337,7 +337,7 @@ # on String, including that outparams/RETURNING # breaks for varchars self.set_input_sizes(quoted_bind_names, - exclude_types=self.dialect._cx_oracle_string_types + exclude_types=self.dialect._cx_oracle_exclude_setinputsizes )

         # if a single execute, check for outparams
    

    @@ -507,6 +507,7 @@ getattr(self.dbapi, name, None) for name in names ]).difference(None)

    • self._cx_oracle_exclude_setinputsizes = types("STRING", "UNICODE") self._cx_oracle_string_types = types("STRING", "UNICODE", "NCLOB", "CLOB") self._cx_oracle_unicode_types = types("UNICODE", "NCLOB") self._cx_oracle_binary_types = types("BFILE", "CLOB", "NCLOB", "BLOB") }}}

    I would have tried earlier but I can only run this on the continuous integration system and I did not want to break our deployment for the sake of testing that patch.

    With the patch applied, the above sample scripts both pass (and they fail without the patch). Thank you!

  4. Log in to comment