ValueError: string data too large when using cx_Oracle and executemany

Issue #2561 resolved

Former user created an issue 2012-09-05

This is not really a bug in SQLAlchemy but I think SQLAlchemy could work around this (most easily by not using executemany for cx_Oracle).

There is a known problem in cx_Oracle about string buffers in combination with executemany: If the first row of a string/text column is smaller than 4000 bytes, cx_Oracle will allocate a vt_String variable to hold the column data. However, if a following row is above that limit, cx_Oracle will throw a ValueError("string data too large").

This is discussed here for example:

This appeared to be a regression in SQLAlchemy 0.7.x vs. 0.6.x because it seems like SQLAlchemy will now optimize inserts by using executemany. It is hard to trace this error, especially as the ORM will only use executemany (so it seems to me) if the primary key is already known as is the case for joined table inheritance mappers.

Here is an example using SA core how to trigger this problem:

import logging
from sqlalchemy import *

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("sqlalchemy.engine").setLevel(logging.DEBUG)

engine = create_engine("oracle://lexikon:lexikon@oraclevm:1521/lexikon")
connection = engine.connect()

metadata = MetaData()

example_table = Table("example", metadata,
        Column("id", Integer, Sequence("id_seq"), primary_key=True),
        Column("info", UnicodeText))

example_table.create(connection, checkfirst=True)

values = [content" * 4000))
values.append(dict(info=u"short"))

# This works: First row has large data, second is smaller.
connection.execute(example_table.insert(), values)

# This doesn't ===> ValueError: string data too large, because initial row leads to small buffer.
values.reverse()
connection.execute(example_table.insert(), values)

example_table.drop(connection, checkfirst=True)

I actually ran into this using the ORM and was initially unable to reproduce. The following example does trigger the problem here, but relies on the insert order - I don't know if this it deterministic in SQLAlchemy.

import logging
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import *

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("sqlalchemy.engine").setLevel(logging.DEBUG)

Base = declarative_base()

class ExampleBase(Base):
    __tablename__ = "example_base"
    id = Column(Integer, Sequence("id_seq"), primary_key=True)
    discriminator = Column('type', String(50))
    __mapper_args__ = {'polymorphic_on': discriminator}

class Example(ExampleBase):
    __tablename__ = "example"
    id = Column(Integer, ForeignKey(ExampleBase.id), primary_key=True)
    info = Column(UnicodeText)
    __mapper_args__ = {'polymorphic_identity': 'example'}


engine = create_engine("oracle://lexikon:lexikon@oraclevm:1521/lexikon")
Base.metadata.create_all(engine)

Session = sessionmaker(engine)
session = Session()

instances = [
    Example(info="short"),
    Example(info="less short " * 4000)
](]
values.append(dict(info=u"long)

for t in instances:
    session.add(t)

session.commit()

The latter example throws up because both rows to the child table are inserted using one executemany call.

I would like to have an option in SQLAlchemy to disable executemany and have SQLAlchemy dissect an executemany request into individual execute calls.

Comments (8)

Former user Account Deleted
If you are facing the same problem, you may want to apply this patch to cx_Oracle which seems to fix the problem for me:

https://github.com/Bluehorn/cx_Oracle/commit/db38a7aee5180b3549499f80e7899055a9321410

Not sure if this has any downside like increased memory consumption.
- 2012-09-05T14:54:36+00:00

Mike Bayer repo owner

first off, congrats on getting Anthony to respond to you. I'm not able to get a response from him on a different issue (~~#2469~~).

in this case the gmane thread seems to suggest that setinputsizes() should be clearing up the issue. We call setinputsizes for all types except STRING, UNICODE, NCLOB, CLOB. this has been like this for a long time as there were various issues observed, however I wonder if by now we can just either remove that altogether or at least remove NCLOB, CLOB from the excluded list. Can you try this against 0.7 please:

diff -r 90ad3c5e16ffe4f972144eebaf7a7e2a0397c9c0 lib/sqlalchemy/dialects/oracle/cx_oracle.py
--- a/lib/sqlalchemy/dialects/oracle/cx_oracle.py   Fri Aug 31 20:03:57 2012 -0400
+++ b/lib/sqlalchemy/dialects/oracle/cx_oracle.py   Wed Sep 05 17:18:11 2012 -0400
@@ -337,7 +337,7 @@
             # on String, including that outparams/RETURNING
             # breaks for varchars
             self.set_input_sizes(quoted_bind_names,
-                                 exclude_types=self.dialect._cx_oracle_string_types
+                                 exclude_types=self.dialect._cx_oracle_exclude_setinputsizes
                                 )

         # if a single execute, check for outparams
@@ -507,6 +507,7 @@
                         getattr(self.dbapi, name, None) for name in names
                     ]).difference([None](None))

+        self._cx_oracle_exclude_setinputsizes = types("STRING", "UNICODE")
         self._cx_oracle_string_types = types("STRING", "UNICODE", "NCLOB", "CLOB")
         self._cx_oracle_unicode_types = types("UNICODE", "NCLOB")
         self._cx_oracle_binary_types = types("BFILE", "CLOB", "NCLOB", "BLOB")

2012-09-05T17:18:25+00:00

Mike Bayer repo owner
- marked as major
- changed milestone to 0.7.9
- marked as bug
- 2012-09-05T17:18:53+00:00
Mike Bayer repo owner
I'm able to get tests to pass when NCLOB and CLOB are allowed through setinputsizes(), so I've removed them from the default list and also made the "exclude" list configurable in 0.8, this is in d3e60f805d573206f68c0eb7e7187c5a1c54a731. then again I'm not sure I have CLOB/NCLOB covered enough to know.
- 2012-09-05T18:13:17+00:00
Former user Account Deleted
Replying to zzzeek:

first off, congrats on getting Anthony to respond to you. I'm not able to get a response from him on a different issue (~~#2469~~).

Eh, where did I say he did respond to me? The linked gmane thread was not started by me and is from 2010.
in this case the gmane thread seems to suggest that setinputsizes() should be clearing up the issue. We call setinputsizes for all types except STRING, UNICODE, NCLOB, CLOB. this has been like this for a long time as there were various issues observed, however I wonder if by now we can just either remove that altogether or at least remove NCLOB, CLOB from the excluded list. Can you try this against 0.7 please:

{{{ #!diff diff -r 90ad3c5e16ffe4f972144eebaf7a7e2a0397c9c0 lib/sqlalchemy/dialects/oracle/cx_oracle.py --- a/lib/sqlalchemy/dialects/oracle/cx_oracle.py Fri Aug 31 20:03:57 2012 -0400 +++ b/lib/sqlalchemy/dialects/oracle/cx_oracle.py Wed Sep 05 17:18:11 2012 -0400 @@ -337,7 +337,7 @@ # on String, including that outparams/RETURNING # breaks for varchars self.set_input_sizes(quoted_bind_names, - exclude_types=self.dialect._cx_oracle_string_types + exclude_types=self.dialect._cx_oracle_exclude_setinputsizes )
# if a single execute, check for outparams
@@ -507,6 +507,7 @@ getattr(self.dbapi, name, None) for name in names ]).difference(None)
- self._cx_oracle_exclude_setinputsizes = types("STRING", "UNICODE") self._cx_oracle_string_types = types("STRING", "UNICODE", "NCLOB", "CLOB") self._cx_oracle_unicode_types = types("UNICODE", "NCLOB") self._cx_oracle_binary_types = types("BFILE", "CLOB", "NCLOB", "BLOB") }}}
I would have tried earlier but I can only run this on the continuous integration system and I did not want to break our deployment for the sake of testing that patch.

With the patch applied, the above sample scripts both pass (and they fail without the patch). Thank you!
- 2012-10-05T06:04:21+00:00
Mike Bayer repo owner
- marked as critical
- 2012-10-05T10:34:09+00:00
Mike Bayer repo owner
- changed status to resolved
3fbbecbf16a385f46674fb41db25ab58844dcc2b
- 2012-10-10T13:27:16+00:00
Mike Bayer repo owner
- removed milestone
Removing milestone: 0.7.10 (automated comment)
- 2014-02-23T18:14:47+00:00
Log in to comment

Assignee: Mike Bayer

Type: bug

Priority: critical

Status: resolved

Component: oracle

Milestone: –

Votes: 0

Watchers: 0