LIMIT for Oracle should render as ROWNUM <= [text] instead of ROWNUM <= :bindparam

Issue #2116 resolved
Former user created an issue

Our limit queries were taking far longer than they should have been for a production database using Oracle 9i.

After much troubleshooting, I determined that the source of the problem is the "ROWNUM <= :parameter" clause that is rendered for Oracle's implementation of "LIMIT".

I took the sqlalchemy rendered query and plugged it into SQLPLUS, using bind parameters.

My query took consistently over 40 seconds.

There were several bind parameters. Replacing the single ROWNUM limit parameter with a literal of the same value consistently cut my query to about 4 seconds. ''I did reset oracle's cache between runs.''

This also changed Oracle's execution plan itself, along with these numbers:

With bind parameter: Cost over 5 million.

With literal text: Cost of 57,000

(I recognize that the numbers are relative only between different execution strategies for the same query, but I've given them anyway to give an idea of the order of magnitude difference.)

Oracle uses a STOPKEY mechanism to optimize ordered rownum limited queries. My hypothesis is that if this rownum is limited by a bind parameter, oracle's engine must query all matching records regardless of the presence of the limiting ordered stopkey, while given an actual value, it can only then limit the records it fetches.

It is feasible that oracle 10 or higher has fixed this problem, so you may decide this ticket is ultimately a recipe, but the evidence I'm seeing would convince me to seriously consider this as an improvement for sqlalchemy.

If this is not something for the project, can you point me in the rough direction of fixing this for my project?

Thanks

Comments (17)

  1. Mike Bayer repo owner

    so here's a patch, if you want to write tests for me, mainly one or two in test_oracle:CompileTest

    diff -r eb4a843318b4fa76d238c43a293d64af64ca1148 lib/sqlalchemy/dialects/oracle/base.py
    --- a/lib/sqlalchemy/dialects/oracle/base.py    Wed Mar 30 14:22:41 2011 -0400
    +++ b/lib/sqlalchemy/dialects/oracle/base.py    Thu Mar 31 11:56:35 2011 -0400
    @@ -524,6 +524,8 @@
                         max_row = select._limit
                         if select._offset is not None:
                             max_row += select._offset
    +                    if not self.dialect.use_binds_for_limits:
    +                        max_row = literal_column("%d" % max_row)
                         limitselect.append_whereclause(
                                 sql.literal_column("ROWNUM")<=max_row)
    
    @@ -542,8 +544,11 @@
                         offsetselect._oracle_visit = True
                         offsetselect._is_wrapper = True
    
    +                    offset_value = select._offset
    +                    if not self.dialect.use_binds_for_limits:
    +                        offset_value = literal_column("%d" % offset_value)
                         offsetselect.append_whereclause(
    -                             sql.literal_column("ora_rn")>select._offset)
    +                             sql.literal_column("ora_rn")>offset_value)
    
                         offsetselect.for_update = select.for_update
                         select = offsetselect
    @@ -635,10 +640,12 @@
         def __init__(self, 
                     use_ansi=True, 
                     optimize_limits=False, 
    +                use_binds_for_limits=True,
                     **kwargs):
             default.DefaultDialect.__init__(self, **kwargs)
             self.use_ansi = use_ansi
             self.optimize_limits = optimize_limits
    +        self.use_binds_for_limits = use_binds_for_limits
    
         def initialize(self, connection):
             super(OracleDialect, self).initialize(connection)
    
  2. Former user Account Deleted

    Replying to zzzeek:

    so here's a patch, if you want to write tests for me, mainly one or two in test_oracle:CompileTest

    Each time I write tests I wish I could actually run them. Do you have instructions for how to run these tests?

  3. Former user Account Deleted

    Few items: * we needed sql.literal_column instead of literal_column * at least for my case, max_row was sometimes unicode, depending on the source of the limit, so I needed "%s" instead of "%d" * am I using limit incorrectly? It apparently works with unicode numbers, but I could see problems with offset, so what is best choice? * Whatever the answer is may affect the oracle hint several lines earlier ("/*+ FIRST_ROWS(%d) */" % select._limit)

    --- SQLAlchemy-0.6.4.3kb/lib/sqlalchemy/dialects/oracle/base.py 2010-10-20 19:47:35.000000000 -0400
    +++ SQLAlchemy-0.6.4.4kb/lib/sqlalchemy/dialects/oracle/base.py 2011-03-31 13:06:25.000000000 -0400
    @@ -515,40 +515,45 @@
                         limitselect = limitselect.prefix_with("/*+ FIRST_ROWS(%d) */" % select._limit)
    
                     limitselect._oracle_visit = True
                     limitselect._is_wrapper = True
    
                     # If needed, add the limiting clause
                     if select._limit is not None:
                         max_row = select._limit
                         if select._offset is not None:
                             max_row += select._offset
    +                    if not self.dialect.use_binds_for_limits:
    +                        max_row = sql.literal_column("%s" % max_row)
                         limitselect.append_whereclause(
                                 sql.literal_column("ROWNUM")<=max_row)
    
                     # If needed, add the ora_rn, and wrap again with offset.
                     if select._offset is None:
                         limitselect.for_update = select.for_update
                         select = limitselect
                     else:
                         limitselect = limitselect.column(
                                  sql.literal_column("ROWNUM").label("ora_rn"))
                         limitselect._oracle_visit = True
                         limitselect._is_wrapper = True
    
                         offsetselect = sql.select(
                                  [for c in limitselect.c if c.key!='ora_rn'](c))
                         offsetselect._oracle_visit = True
                         offsetselect._is_wrapper = True
    
    +                    offset_value = select._offset
    +                    if not self.dialect.use_binds_for_limits:
    +                        offset_value = sql.literal_column("%s" % offset_value)
                         offsetselect.append_whereclause(
    -                             sql.literal_column("ora_rn")>select._offset)
    +                             sql.literal_column("ora_rn")>offset_value)
    
                         offsetselect.for_update = select.for_update
                         select = offsetselect
    
             kwargs['iswrapper']('iswrapper') = getattr(select, '_is_wrapper', False)
             return compiler.SQLCompiler.visit_select(self, select, **kwargs)
    
         def limit_clause(self, select):
             return ""
    
    @@ -626,24 +631,26 @@
         ddl_compiler = OracleDDLCompiler
         type_compiler = OracleTypeCompiler
         preparer = OracleIdentifierPreparer
         execution_ctx_cls = OracleExecutionContext
    
         reflection_options = ('oracle_resolve_synonyms', )
    
         def __init__(self,
                     use_ansi=True,
                     optimize_limits=False,
    +                use_binds_for_limits=True,
                     **kwargs):
             default.DefaultDialect.__init__(self, **kwargs)
             self.use_ansi = use_ansi
             self.optimize_limits = optimize_limits
    +        self.use_binds_for_limits = use_binds_for_limits
    
         def initialize(self, connection):
             super(OracleDialect, self).initialize(connection)
             self.implicit_returning = self.__dict__.get(
                                     'implicit_returning',
                                     self.server_version_info > (10, )
                                 )
    
             if self._is_oracle_8:
                 self.colspecs = self.colspecs.copy()
    
  4. Former user Account Deleted

    So my real question is: * Do we cater to people who might put unicode/string values for limit/offset and make sqlalchemy convert them or do we say "sorry, you need to use some form of an integer" ?

  5. Mike Bayer repo owner

    the use case is, someone says limit("5") ? how is that happening ? is that on your end ? ( no its not valid)

  6. Former user Account Deleted

    Replying to zzzeek:

    the use case is, someone says limit("5") ? how is that happening ? is that on your end ? ( no its not valid)

    Yes, that is effectively what is happening, but more subtle, since it is ultimately being passed as a http parameter so it isn't obvious it isn't a number. I'll fix my code.

    I'll put it back to "%d".

    DBSession.query(Order).limit('10').offset('15').all()
    

    Happily does max_row += select._offset:

    WHERE ROWNUM <= 1015)
    WHERE ora_rn > 15) anon_1
    

    Is it something that warrants an exception being thrown?

    I'll also get a couple test cases.

  7. Former user Account Deleted

    This is a tangent, but related: The comment for this code is:

    # See http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html
    #
    # Generalized form of an Oracle pagination query:
    #   select ... from (
    #     select /*+ FIRST_ROWS(N) */ ...., rownum as ora_rn from (
    #         select distinct ... where ... order by ...
    #     ) where ROWNUM <= :limit+:offset
    #   ) where ora_rn > :offset
    # Outer select and "ROWNUM as ora_rn" can be dropped if limit=0
    

    Notice the "distinct" in the comment. This is something I've programmatically added when the limit also contains a join. If you don't then your limit is not a limit of entities retrieved but rather a limit of joined rows fetched (which is obviously not what is intended).
    I had felt sqlalchemy's engine should be adding the distinct, does this warrant another trac ticket?

    (I feel the answer forthcoming: "Sure, Kent, do you want to implement it...")

  8. Mike Bayer repo owner

    I don't know that the distinct is part of the general contract to simulate what LIMIT/OFFSET does. LIMIT OFFSET should be applying a window to the full rowset that the query would return in any case. If we are getting dupe ROWNUM values (didn't think it worked that way), then maybe.

  9. Mike Bayer repo owner

    also yes limit()/offset(), on the select() construct (as well as in its constructor) in theory should raise for a non-integer.

  10. Former user Account Deleted

    Here is my inclusive patch. (Sorry the diff is from 0.6.4, but I imagine no conflicts)

    I've also included functionality for offset() and limit() to raise for non-integer(like) parameters (the util.integer_or_raise will hopefully be helpful elsewhere):

    diff -U10 -r SQLAlchemy-0.6.4.3kb/lib/sqlalchemy/dialects/oracle/base.py SQLAlchemy-0.6.4.4kb/lib/sqlalchemy/dialects/oracle/base.py
    --- SQLAlchemy-0.6.4.3kb/lib/sqlalchemy/dialects/oracle/base.py 2010-10-20 19:47:35.000000000 -0400
    +++ SQLAlchemy-0.6.4.4kb/lib/sqlalchemy/dialects/oracle/base.py 2011-03-31 14:12:13.000000000 -0400
    @@ -515,40 +515,45 @@
                         limitselect = limitselect.prefix_with("/*+ FIRST_ROWS(%d) */" % select._limit)
    
                     limitselect._oracle_visit = True
                     limitselect._is_wrapper = True
    
                     # If needed, add the limiting clause
                     if select._limit is not None:
                         max_row = select._limit
                         if select._offset is not None:
                             max_row += select._offset
    +                    if not self.dialect.use_binds_for_limits:
    +                        max_row = sql.literal_column("%d" % max_row)
                         limitselect.append_whereclause(
                                 sql.literal_column("ROWNUM")<=max_row)
    
                     # If needed, add the ora_rn, and wrap again with offset.
                     if select._offset is None:
                         limitselect.for_update = select.for_update
                         select = limitselect
                     else:
                         limitselect = limitselect.column(
                                  sql.literal_column("ROWNUM").label("ora_rn"))
                         limitselect._oracle_visit = True
                         limitselect._is_wrapper = True
    
                         offsetselect = sql.select(
                                  [for c in limitselect.c if c.key!='ora_rn'](c))
                         offsetselect._oracle_visit = True
                         offsetselect._is_wrapper = True
    
    +                    offset_value = select._offset
    +                    if not self.dialect.use_binds_for_limits:
    +                        offset_value = sql.literal_column("%d" % offset_value)
                         offsetselect.append_whereclause(
    -                             sql.literal_column("ora_rn")>select._offset)
    +                             sql.literal_column("ora_rn")>offset_value)
    
                         offsetselect.for_update = select.for_update
                         select = offsetselect
    
             kwargs['iswrapper']('iswrapper') = getattr(select, '_is_wrapper', False)
             return compiler.SQLCompiler.visit_select(self, select, **kwargs)
    
         def limit_clause(self, select):
             return ""
    
    @@ -626,24 +631,26 @@
         ddl_compiler = OracleDDLCompiler
         type_compiler = OracleTypeCompiler
         preparer = OracleIdentifierPreparer
         execution_ctx_cls = OracleExecutionContext
    
         reflection_options = ('oracle_resolve_synonyms', )
    
         def __init__(self,
                     use_ansi=True,
                     optimize_limits=False,
    +                use_binds_for_limits=True,
                     **kwargs):
             default.DefaultDialect.__init__(self, **kwargs)
             self.use_ansi = use_ansi
             self.optimize_limits = optimize_limits
    +        self.use_binds_for_limits = use_binds_for_limits
    
         def initialize(self, connection):
             super(OracleDialect, self).initialize(connection)
             self.implicit_returning = self.__dict__.get(
                                     'implicit_returning',
                                     self.server_version_info > (10, )
                                 )
    
             if self._is_oracle_8:
                 self.colspecs = self.colspecs.copy()
    diff -U10 -r SQLAlchemy-0.6.4.3kb/lib/sqlalchemy/sql/expression.py SQLAlchemy-0.6.4.4kb/lib/sqlalchemy/sql/expression.py
    --- SQLAlchemy-0.6.4.3kb/lib/sqlalchemy/sql/expression.py       2010-09-13 02:39:39.000000000 -0400
    +++ SQLAlchemy-0.6.4.4kb/lib/sqlalchemy/sql/expression.py       2011-03-31 15:33:29.000000000 -0400
    @@ -3539,22 +3539,22 @@
                 autocommit=None):
             self.use_labels = use_labels
             self.for_update = for_update
             if autocommit is not None:
                 util.warn_deprecated('autocommit on select() is '
                                      'deprecated.  Use .execution_options(a'
                                      'utocommit=True)')
                 self._execution_options = \
                     self._execution_options.union({'autocommit'
                         : autocommit})
    -        self._limit = limit
    -        self._offset = offset
    +        self._limit = util.integer_or_raise(limit, 'limit', False)
    +        self._offset = util.integer_or_raise(offset, 'offset', False)
             self._bind = bind
    
             self._order_by_clause = ClauseList(*util.to_list(order_by) or [        self._group_by_clause = ClauseList(*util.to_list(group_by) or [](])
    ))
    
         def as_scalar(self):
             """return a 'scalar' representation of this selectable, which can be
             used as a column expression.
    
             Typically, a select statement which has only one column in its columns
    @@ -3607,28 +3607,28 @@
             s = self.__class__.__new__(self.__class__)
             s.__dict__ = self.__dict__.copy()
             s._reset_exported()
             return s
    
         @_generative
         def limit(self, limit):
             """return a new selectable with the given LIMIT criterion
             applied."""
    
    -        self._limit = limit
    +        self._limit = util.integer_or_raise(limit, 'limit', False)
    
         @_generative
         def offset(self, offset):
             """return a new selectable with the given OFFSET criterion
             applied."""
    
    -        self._offset = offset
    +        self._offset = util.integer_or_raise(offset, 'offset', False)
    
         @_generative
         def order_by(self, *clauses):
             """return a new selectable with the given list of ORDER BY
             criterion applied.
    
             The criterion will be appended to any pre-existing ORDER BY
             criterion.
    
             """
    diff -U10 -r SQLAlchemy-0.6.4.3kb/lib/sqlalchemy/util.py SQLAlchemy-0.6.4.4kb/lib/sqlalchemy/util.py
    --- SQLAlchemy-0.6.4.3kb/lib/sqlalchemy/util.py 2010-09-13 02:39:39.000000000 -0400
    +++ SQLAlchemy-0.6.4.4kb/lib/sqlalchemy/util.py 2011-03-31 15:29:37.000000000 -0400
    @@ -589,20 +589,36 @@
         necessary.  If 'flexi_bool' is True, the string '0' is considered false
         when coercing to boolean.
         """
    
         if key in kw and type(kw[key](key)) is not type_ and kw[key](key) is not None:
             if type_ is bool and flexi_bool:
                 kw[key](key) = asbool(kw[key](key))
             else:
                 kw[key](key) = type_(kw[key](key))
    
    +def integer_or_raise(value, attribute, allow_negative=True):
    +    """ensure the value is integer like or raise TypeError with attribute as name"""
    +    if value is None:
    +        return value
    +    try:
    +        value + 0
    +    except TypeError:
    +        pass
    +    else:
    +        if int(value) == value:
    +            if allow_negative or value >= 0:
    +                return value
    +            else:
    +                raise ValueError("'%s' may not be negative" % attribute)
    +    raise TypeError("'%s' must be %s integer or None" % (attribute, allow_negative and 'an' or 'a non negative'))
    +
     def duck_type_collection(specimen, default=None):
         """Given an instance or class, guess if it is or is acting as one of
         the basic collection types: list, set and dict.  If the __emulates__
         property is present, return that preferentially.
         """
    
         if hasattr(specimen, '__emulates__'):
             # canonicalize set vs sets.Set to a standard: the builtin set
             if (specimen.__emulates__ is not None and
                 issubclass(specimen.__emulates__, set_types)):
    Only in SQLAlchemy-0.6.4.4kb/lib: SQLAlchemy.egg-info
    diff -U10 -r SQLAlchemy-0.6.4.3kb/test/dialect/test_oracle.py SQLAlchemy-0.6.4.4kb/test/dialect/test_oracle.py
    --- SQLAlchemy-0.6.4.3kb/test/dialect/test_oracle.py    2010-09-13 02:39:39.000000000 -0400
    +++ SQLAlchemy-0.6.4.4kb/test/dialect/test_oracle.py    2011-04-05 19:07:44.000000000 -0400
    @@ -124,21 +124,47 @@
             s = select([t](t),
                        for_update=True).limit(10).offset(20).order_by(t.c.col2)
             self.assert_compile(s,
                                 'SELECT col1, col2 FROM (SELECT col1, '
                                 'col2, ROWNUM AS ora_rn FROM (SELECT '
                                 'sometable.col1 AS col1, sometable.col2 AS '
                                 'col2 FROM sometable ORDER BY '
                                 'sometable.col2) WHERE ROWNUM <= '
                                 ':ROWNUM_1) WHERE ora_rn > :ora_rn_1 FOR '
                                 'UPDATE')
    -
    +
    +    def test_use_binds_for_limits(self):
    +        t = table('sometable', column('col1'), column('col2'))
    +        dialect = oracle.OracleDialect()
    +        dialect.use_binds_for_limits = False
    +        s = select([t](t)).limit(10)
    +        self.assert_compile(s,
    +                            'SELECT col1, col2 '
    +                            'FROM (SELECT sometable.col1 AS col1, '
    +                            'sometable.col2 AS '
    +                            'col2 FROM sometable) WHERE ROWNUM <= 10',
    +                            dialect=dialect)
    +        for s in (select([t](t)).limit(Decimal('10')).offset(20),
    +                    select([t](t)).limit(10.00).offset(20.0)):
    +            self.assert_compile(s,
    +                            'SELECT col1, col2 FROM (SELECT col1, '
    +                            'col2, ROWNUM AS ora_rn FROM (SELECT '
    +                            'sometable.col1 AS col1, sometable.col2 AS '
    +                            'col2 FROM sometable) WHERE ROWNUM <= '
    +                            '30) WHERE ora_rn > 20',
    +                            dialect=dialect)
    +        assert_raises(TypeError,select([t](t)).limit, u'55')
    +        assert_raises(TypeError,select([t](t)).offset, '55')
    +        assert_raises(TypeError,select([t](t)).limit, 55.01)
    +        assert_raises(TypeError,select([t](t)).offset, -55.01)
    +        assert_raises(ValueError,select([t](t)).limit, -55)
    +        assert_raises(ValueError,select([t](t)).offset, -5)
    
         def test_long_labels(self):
             dialect = default.DefaultDialect()
             dialect.max_identifier_length = 30
    
             ora_dialect = oracle.dialect()
    
             m = MetaData()
             a_table = Table(
                 'thirty_characters_table_xxxxxx',
    
  11. Mike Bayer repo owner

    OK the trick here is that we can't put the integer_or_raise thing across the board in 0.6 since it will break existing apps. ill try to split this out among 0.6/0.7.

  12. Log in to comment