Several small issues with Oracle recursive queries
Hi Mike, I'm trying to use the same CTE I have working for PG with Oracle (since Oracle now supports CTE), however there are several small issues that prevents the use of SQLAlchemy's CTE with Oracle as transparently as it is possible with PG (I need to make it work with SQL Server too but so far I have not tested it).
1- The first issue is regarding the RECURSIVE keyword. Here is a simple script that includes everything required to reproduce the problem.
from sqlalchemy.orm import aliased
from sqlalchemy import String, Integer, Column, func, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
engine = create_engine('oracle://scott:tiger@localhost:1521/xe', echo=True)
Base = declarative_base()
class Part(Base):
__tablename__ = 'part'
part = Column(String(200), primary_key=True)
sub_part = Column(String(200), primary_key=True)
quantity = Column(Integer)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
included_parts = session.query(
Part.sub_part,
Part.part,
Part.quantity).\
filter(Part.part=="our part").\
cte(name="included_parts", recursive=True)
incl_alias = aliased(included_parts, name="pr1")
parts_alias = aliased(Part, name="p")
included_parts = included_parts.union_all(
session.query(
parts_alias.sub_part,
parts_alias.part,
parts_alias.quantity).\
filter(parts_alias.part==included_parts.c.sub_part)
)
q = session.query(
included_parts.c.sub_part,
func.sum(included_parts.c.quantity).
label('total_quantity')
).\
group_by(included_parts.c.sub_part)
session.execute(q)
if you try to execute this version, you will get the following error (because Oracle does not like the word RECURSIVE):
sqlalchemy.exc.DatabaseError: (DatabaseError) ORA-00905: missing keyword
'WITH RECURSIVE included_parts(sub_part, part, quantity) AS \n(SELECT part.sub_part AS sub_part, part.part AS part, part.quantity AS quantity \nFROM part \nWHERE part.part = :part_1 UNION ALL SELECT p.sub_part AS p_sub_part, p.part AS p_part, p.quantity AS p_quantity \nFROM part p, included_parts \nWHERE p.part = included_parts.sub_part)\n SELECT included_parts.sub_part AS included_parts_sub_part, sum(included_parts.quantity) AS total_quantity \nFROM included_parts GROUP BY included_parts.sub_part' {'part_1': 'our part'}
so, of course, we change the CTE (even if we are doing a recursive query) from
included_parts = session.query(
Part.sub_part,
Part.part,
Part.quantity).\
filter(Part.part=="our part").\
cte(name="included_parts", recursive=True)
to
included_parts = session.query(
Part.sub_part,
Part.part,
Part.quantity).\
filter(Part.part=="our part").\
cte(name="included_parts", recursive=False)
but now we get
sqlalchemy.exc.DatabaseError: (DatabaseError) ORA-32039: recursive WITH clause must have column alias list
'WITH included_parts AS \n(SELECT part.sub_part AS sub_part, part.part AS part, part.quantity AS quantity \nFROM part \nWHERE part.part = :part_1 UNION ALL SELECT p.sub_part AS p_sub_part, p.part AS p_part, p.quantity AS p_quantity \nFROM part p, included_parts \nWHERE p.part = included_parts.sub_part)\n SELECT included_parts.sub_part AS included_parts_sub_part, sum(included_parts.quantity) AS total_quantity \nFROM included_parts GROUP BY included_parts.sub_part' {'part_1': 'our part'}
because we are now missing the list of columns required.
2- the second issue deals with some special keywords oracle has in order to provide ordering and to prevent cycle issues within a recursive query. The complete spec is available here and I found these two examples floating around. The cycle option is specially useful since it can prevent circular errors in the recursive relation. In PG you can do some magic with arrays, rows and any but in Oracle it is really hard to prevent circular issues unless you use the suggested features.
To tell you the thruth it looks like a pretty troublesome extension to add to sqlalchemy but I guess it could be great if at least we can input some text() condition with this .
Thanks for your patience reading this and I am available for anything you need me to test in case you think this issue is worth your time.
Mariano
Comments (44)
-
repo owner -
reporter I will test your suggestion right away. I will let you know how it was shortly.
Regarding the second part please do note that I am taking about clauses attached to the queries that use the WITH idiom and not the old hierarchical queries. Unless I am mistaken you are talking about options such as SYS_CONNECT_BY_PATH and related stuff. See the following example I extracted from those blogs:
with r(lvl,empno,ename,mgr,path) as ( select 1 as lvl, empno, ename, mgr,'/'||ename as path from emp where empno = 7839 union all select lvl+1, e.empno, e.ename, e.mgr,r.path||'/'||e.ename from emp e, r where e.mgr = r.empno ) search depth first by ename set ord1 cycle empno set y_cycle to 1 default 0 select lvl,empno,ename,mgr,path,y_cycle from r ;
The final lines of the query include the following idioms:
search depth first by ename set ord1
and
cycle empno set y_cycle to 1 default 0
those are the ones I am referring to and since they both are between the recursive query and the actual final query, I thought it was possible to inject them via some extra parameter at the end of the with part.
-
reporter I tested the patch for the first issue and it works now. Here's the output it produces now:
2014-10-03 16:54:34,346 INFO sqlalchemy.engine.base.Engine WITH included_parts(sub_part, part, quantity) AS (SELECT part.sub_part AS sub_part, part.part AS part, part.quantity AS quantity FROM part WHERE part.part = :part_1 UNION ALL SELECT p.sub_part AS p_sub_part, p.part AS p_part, p.quantity AS p_quantity FROM part p, included_parts WHERE p.part = included_parts.sub_part) SELECT included_parts.sub_part AS included_parts_sub_part, sum(included_parts.quantity) AS total_quantity FROM included_parts GROUP BY included_parts.sub_part 2014-10-03 16:54:34,346 INFO sqlalchemy.engine.base.Engine {'part_1': 'our part'}
Thanks for this fix!
-
repo owner yes i was suggesting that in cte() we add some more options of some kind that work similarly to "prefixes" or "hints" on a select().
-
reporter I believe it would work great.
-
repo owner OK how is this SQL:
WITH included_parts(sub_part, part, quantity) AS (SELECT part.sub_part AS sub_part, part.part AS part, part.quantity AS quantity FROM part WHERE part.part = :part_1 UNION ALL SELECT p.sub_part AS p_sub_part, p.part AS p_part, p.quantity AS p_quantity FROM part p, included_parts WHERE p.part = included_parts.sub_part) search depth first by ename set ord1 cycle empno set y_cycle to 1 default 0 SELECT included_parts.sub_part, sum(included_parts.quantity) AS total_quantity FROM included_parts GROUP BY included_parts.sub_part
?
-
repo owner you'd call it like this....:
included_parts = session.query( Part.sub_part, Part.part, Part.quantity).\ filter(Part.part=="our part").\ cte(name="included_parts", recursive=True).suffix_with( "search depth first by ename set ord1", "cycle empno set y_cycle to 1 default 0", dialect='oracle')
-
repo owner - attached 3220.patch
-
repo owner with the patch attached. this is an easy add if it works for you.
-
reporter Patch applied, worked like charm.
for completeness, here is the new version of the same script
from sqlalchemy.orm import aliased from sqlalchemy import String, Integer, Column, func, create_engine from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.orm import sessionmaker engine = create_engine('oracle://scott:tiger@localhost:49161/xe', echo=True) Base = declarative_base() class Part(Base): __tablename__ = 'part' part = Column(String(200), primary_key=True) sub_part = Column(String(200), primary_key=True) quantity = Column(Integer) Base.metadata.create_all(engine) Session = sessionmaker(bind=engine) session = Session() included_parts = session.query( Part.sub_part, Part.part, Part.quantity).\ filter(Part.part=="our part").\ cte(name="included_parts", recursive=True).suffix_with( "search depth first by part set ord1", "cycle part set y_cycle to 1 default 0", dialect='oracle') incl_alias = aliased(included_parts, name="pr1") parts_alias = aliased(Part, name="p") included_parts = included_parts.union_all( session.query( parts_alias.sub_part, parts_alias.part, parts_alias.quantity).\ filter(parts_alias.part==included_parts.c.sub_part) ) q = session.query( included_parts.c.sub_part, func.sum(included_parts.c.quantity). label('total_quantity') ).\ group_by(included_parts.c.sub_part) session.execute(q)
and here's the query I got back:
WITH included_parts(sub_part, part, quantity) AS (SELECT part.sub_part AS sub_part, part.part AS part, part.quantity AS quantity FROM part WHERE part.part = :part_1 UNION ALL SELECT p.sub_part AS p_sub_part, p.part AS p_part, p.quantity AS p_quantity FROM part p, included_parts WHERE p.part = included_parts.sub_part) search depth first by part set ord1 cycle part set y_cycle to 1 default 0 SELECT included_parts.sub_part AS included_parts_sub_part, sum(included_parts.quantity) AS total_quantity FROM included_parts GROUP BY included_parts.sub_part
I still need to test it with my real life code (that is a little more complex than this) but I am confident it will work: I am about to start with it right away. Thanks a lot for your time and hard work.
One last question if you don't mind: as I mentioned I need the same stuff to work with sql server. In case I have some stuff to append with suffix_with I assume I can chain them (of course, changing the dialect option). Right?
-
repo owner they chain sure
-
reporter Mike, hi. So sorry to hack this thread but I have one strange issue that is preventing me from solving my own CTE:
As you can see in the example, we aliased included_parts as pr1, however such alias in never used in the resulting query. I am doing the same thing with my own query but the alias appears in the resulting query this time and Oracle complains because it doesn't like the table AS alias construction.
e.g.:
DatabaseError: (DatabaseError) ORA-00907: missing right parenthesis ... FROM tag_hierarchy t, tag_q AS tq1 WHERE ...
If I remove the AS from tag_q AS tq1, it works like charm. Any ideas how can I achieve this?
-
repo owner the alias() construct when compiled with the Oracle dialect omits the "AS". Not sure what the specifics are that an "AS" is getting in there, would need to see that.
-
reporter Let me see if I can strip all noise from my working code and have a minimal reproducible example to share.
-
reporter Looks like the same example that comes with the docs causes the error:
from sqlalchemy.orm import aliased from sqlalchemy import String, Integer, Column, func, create_engine from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.orm import sessionmaker engine = create_engine('oracle://scott:tiger@localhost:1521/xe', echo=True) Base = declarative_base() class Part(Base): __tablename__ = 'part' part = Column(String(200), primary_key=True) sub_part = Column(String(200), primary_key=True) quantity = Column(Integer) Base.metadata.create_all(engine) Session = sessionmaker(bind=engine) session = Session() included_parts = session.query( Part.sub_part, Part.part, Part.quantity).\ filter(Part.part=="our part").\ cte(name="included_parts", recursive=True).suffix_with( "search depth first by part set ord1", "cycle part set y_cycle to 1 default 0", dialect='oracle') incl_alias = aliased(included_parts, name="pr1") parts_alias = aliased(Part, name="p") included_parts = included_parts.union_all( session.query( parts_alias.sub_part, parts_alias.part, parts_alias.quantity).\ filter(parts_alias.part==incl_alias.c.sub_part) ) q = session.query( included_parts.c.sub_part, func.sum(included_parts.c.quantity). label('total_quantity') ).\ group_by(included_parts.c.sub_part) session.execute(q)
sqlalchemy.exc.DatabaseError: (DatabaseError) ORA-00907: missing right parenthesis 'WITH included_parts(sub_part, part, quantity) AS \n(SELECT part.sub_part AS sub_part, part.part AS part, part.quantity AS quantity \nFROM part \nWHERE part.part = :part_1 UNION ALL SELECT p.sub_part AS p_sub_part, p.part AS p_part, p.quantity AS p_quantity \nFROM part p, included_parts AS pr1 \nWHERE p.part = pr1.sub_part) search depth first by part set ord1 cycle part set y_cycle to 1 default 0 \n SELECT included_parts.sub_part AS included_parts_sub_part, sum(included_parts.quantity) AS total_quantity \nFROM included_parts GROUP BY included_parts.sub_part' {'part_1': 'our part'}
If instead of using incl_alias in the union_all query, you use directly included_parts, it does not happen. Of course, if you do so, incl_alias was created with no purpose.
-
repo owner - attached 3220.patch
updated patch
-
repo owner new patch, try it out
-
reporter Tested both with oracle and PG and it's working ok.
-
repo owner so there's some new feature in here and some dialect rearrangement, are you OK waiting for 1.0 for a release of this?
-
repo owner - changed milestone to 1.0
hoping 1.0 is OK on this.
-
reporter I am ok with it (now I have to convince my boss we switch to 1.0 when the time is right). I am about to test the same thing with SQL Server so I have news on it, I let you know.
As usual, thanks for your superb work.
-
reporter Mike, hi. I have been working with SQL Server's CTE and it seems to be working perfect out of the box with the current sqlalchemy implementation minus a little detail that deals with cycle recursion problems. As you can see in the following query:
WITH cte (EmployeeID, ManagerID, Title) as ( SELECT EmployeeID, ManagerID, Title FROM dbo.MyEmployees WHERE ManagerID IS NOT NULL UNION ALL SELECT cte.EmployeeID, cte.ManagerID, cte.Title FROM cte JOIN dbo.MyEmployees AS e ON cte.ManagerID = e.EmployeeID ) --Uses MAXRECURSION to limit the recursive levels to 2 SELECT EmployeeID, ManagerID, Title FROM cte OPTION (MAXRECURSION 2);
there is a final statement OPTION (MAXRECURSION 2) that tells the engine to stop when such circular iteration has been reached and prevent circular errors.
As you can see, the option is not attached to the proper CTE query but to the select that will return the final results. I tried to use suffix_with with the query we tested this to no avail:
q = session.query( included_parts.c.sub_part, func.sum(included_parts.c.quantity). label('total_quantity') ).\ group_by(included_parts.c.sub_part).\ suffix_with("OPTION (MAXRECURSION 2)", dialect='mssql')
the error I got -logically- was:
2014-10-10 10:40:09,626 INFO sqlalchemy.engine.base.Engine {'TABLE_SCHEMA_1': u'dbo', 'TABLE_NAME_1': 'part'} Traceback (most recent call last): File "/home/mariano/Code/satori/code/satori/lib/cte_test1.py", line 39, in <module> group_by(included_parts.c.sub_part).\ AttributeError: 'Query' object has no attribute 'suffix_with'
since the Query object has the prefix_with option, I assume it is ok that it has the suffix_with option too. Can it be possible?
-
repo owner yeah, we might as well add suffix_with() also. Some DB2 people also have some areas where this is useful for them, so will include that here.
-
reporter Awesome. Thanks!
-
repo owner - attached 3220.patch
this version adds with_suffixes() to query, etc.
-
reporter Does suffix_with accepts the dialect parameter for ORM queries?
-
repo owner it should?
-
repo owner ah not on query, as neither does prefix_with. it would have to be added to both.
-
reporter In the particular example I am trying the MAXRECURSION parameter I want to pass is mssql only. I guess it should break with other dialects if passed indiscriminately. With that said, I am ok checking the dialect while building the query and append the suffix_with if the engine requires it so the dialect parameter is not mandatory for me, really.
-
repo owner hey @marplatense - can you try out the existing http://docs.sqlalchemy.org/en/latest/core/selectable.html?highlight=with_statement_hint#sqlalchemy.sql.expression.Select.with_statement_hint method ? I think this is already an appropriate system, suffix() would be redundant here.
-
reporter I would love to. Let me set a proper environ for 1.0, I give it a try and I let you know.
-
reporter Hi @zzzeek, sorry for the delay. One quick question: in order to test what you asked, I have to pull branch rel_1_0, right? or master directly?
TIA, Mariano
-
repo owner either, they are the same
-
reporter Running the last version from master, if we replace the final query to:
q = session.query( included_parts.c.sub_part, func.sum(included_parts.c.quantity). label('total_quantity') ).\ group_by(included_parts.c.sub_part).\ with_statement_hint("OPTION (MAXRECURSION 2)")
it works ok, creating the following query:
WITH included_parts(sub_part, part, quantity) AS (SELECT part.sub_part AS sub_part, part.part AS part, part.quantity AS quantity FROM part WHERE part.part = %(part_1)s UNION ALL SELECT p.sub_part AS p_sub_part, p.part AS p_part, p.quantity AS p_quantity FROM part AS p, included_parts AS pr1 WHERE p.part = pr1.sub_part) SELECT included_parts.sub_part AS included_parts_sub_part, sum(included_parts.quantity) AS total_quantity FROM included_parts GROUP BY included_parts.sub_part OPTION (MAXRECURSION 2)
However, we have a previous suffix_with attached to the CTE (needed by Oracle):
included_parts = session.query( Part.sub_part, Part.part, Part.quantity).\ filter(Part.part=="our part").\ cte(name="included_parts", recursive=True).suffix_with( "search depth first by part set ord1", "cycle part set y_cycle to 1 default 0", dialect='oracle')
and now it does no longer works. If we replace it by with_statement_hint, it raises:
Traceback (most recent call last): File "cte_test1.py", line 20, in <module> cte(name="included_parts", recursive=True).with_statement_hint( AttributeError: 'CTE' object has no attribute 'with_statement_hint'
and if we leave the previous suffix_with:
Traceback (most recent call last): File "cte_test1.py", line 20, in <module> cte(name="included_parts", recursive=True).suffix_with( AttributeError: 'CTE' object has no attribute 'suffix_with'
-
repo owner CTE has suffix_with() if you apply the patch here, that is still needed. With the patch + master, on the CTE, you call suffix_with(). On the select() or Query(), you call with_statement_hint().
-
reporter Upps, my bad. sorry. let me try again and I let you know.
-
reporter With patch applied, keeps working ok with SQL Sever however we have a regression with oracle since it seems the alias for the included_parts as pr1 has re-appeared.
This is the script:
from sqlalchemy.orm import aliased from sqlalchemy import String, Integer, Column, func, create_engine from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.orm import sessionmaker engine = create_engine('oracle://scott:tiger@localhost/xe', echo=True) Base = declarative_base() class Part(Base): __tablename__ = 'part' part = Column(String(200), primary_key=True) sub_part = Column(String(200), primary_key=True) quantity = Column(Integer) Base.metadata.create_all(engine) Session = sessionmaker(bind=engine) session = Session() included_parts = session.query( Part.sub_part, Part.part, Part.quantity).\ filter(Part.part=="our part").\ cte(name="included_parts", recursive=True).suffix_with( "search depth first by part set ord1", "cycle part set y_cycle to 1 default 0", dialect='oracle') incl_alias = aliased(included_parts, name="pr1") parts_alias = aliased(Part, name="p") included_parts = included_parts.union_all( session.query( parts_alias.sub_part, parts_alias.part, parts_alias.quantity).\ filter(parts_alias.part==incl_alias.c.sub_part) ) q = session.query( included_parts.c.sub_part, func.sum(included_parts.c.quantity). label('total_quantity') ).\ group_by(included_parts.c.sub_part).\ with_statement_hint("OPTION (MAXRECURSION 2)", dialect_name="mssql") session.execute(q)
and this is the error:
Traceback (most recent call last): File "cte_test1.py", line 41, in <module> session.execute(q) File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 988, in execute File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 800, in execute File "build/bdist.linux-x86_64/egg/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 897, in _execute_clauseelement File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1029, in _execute_context File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1230, in _handle_dbapi_exception File "build/bdist.linux-x86_64/egg/sqlalchemy/util/compat.py", line 199, in raise_from_cause File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1022, in _execute_context File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/default.py", line 436, in do_execute sqlalchemy.exc.DatabaseError: (DatabaseError) ORA-00907: missing right parenthesis 'WITH included_parts(sub_part, part, quantity) AS \n(SELECT part.sub_part AS sub_part, part.part AS part, part.quantity AS quantity \nFROM part \nWHERE part.part = :part_1 UNION ALL SELECT p.sub_part AS p_sub_part, p.part AS p_part, p.quantity AS p_quantity \nFROM part p, included_parts AS pr1 \nWHERE p.part = pr1.sub_part) search depth first by part set ord1 cycle part set y_cycle to 1 default 0 \n SELECT included_parts.sub_part AS included_parts_sub_part, sum(included_parts.quantity) AS total_quantity \nFROM included_parts GROUP BY included_parts.sub_part' {'part_1': 'our part'}
If I remove the AS, it works:
WITH included_parts(sub_part, part, quantity) AS (SELECT part.sub_part AS sub_part, part.part AS part, part.quantity AS quantity FROM part WHERE part.part = 'part 1' UNION ALL SELECT p.sub_part AS p_sub_part, p.part AS p_part, p.quantity AS p_quantity FROM part p, included_parts pr1 WHERE p.part = pr1.sub_part) search depth first by part set ord1 cycle part set y_cycle to 1 default 0 SELECT included_parts.sub_part AS included_parts_sub_part, sum(included_parts.quantity) AS total_quantity FROM included_parts GROUP BY included_parts.sub_part
Did I miss something?
-
reporter Mike, hi there. Please let me know if I should be testing this a way other than the one I tried last time. Thanks!
-
repo owner under oracle I get this query:
WITH included_parts(sub_part, part, quantity) AS (SELECT part.sub_part AS sub_part, part.part AS part, part.quantity AS quantity FROM part WHERE part.part = :part_1 UNION ALL SELECT p.sub_part AS p_sub_part, p.part AS p_part, p.quantity AS p_quantity FROM part p, included_parts pr1 WHERE p.part = pr1.sub_part) search depth first by part set ord1 cycle part set y_cycle to 1 default 0 SELECT included_parts.sub_part AS included_parts_sub_part, sum(included_parts.quantity) AS total_quantity FROM included_parts GROUP BY included_parts.sub_part
no AS in there with the included_parts.
commit coming soon.
-
repo owner - changed status to resolved
- Added support for CTEs under Oracle. This includes some tweaks
to the aliasing syntax, as well as a new CTE feature
:meth:
.CTE.suffix_with
, which is useful for adding in special Oracle-specific directives to the CTE. fixes#3220
→ <<cset e46c71b4198e>>
-
repo owner so this is all in master, let me know if things still are not working there.
-
Hi I am a db2 user. I want to know whether this patch is effective to DB2 as well?
I notice
Mike
changedlib/sqlalchemy/dialects/oracle/base.py
to enforce the cte preamble in oracle. But there is nodialects/db2/base.py
. So i would like to know what the method to deal with DB2 in this case? -
repo owner HI Walt -
the DB2 folks would need to make a similar change in their dialect which is at https://code.google.com/p/ibm-db/. contact them on their mailing list.
-
It is a useful guidance, Mike. I will talk to them
- Log in to comment
the "RECURSIVE" keyword can be removed as such:
please test that for me.
as for the CYCLE stuff, the links you're pointing to seem to show that same old Oracle CONNECT BY garbage, which for years prevented me from adding CTE support to SQLAlchemy because I could not come up with a syntax that's transparent to Oracle's bizarre format (see
#1859).we have some options for direct input of text into strategic areas of a statement using prefixes and hints. so to get that stuff in there we would probably have to add some additional "cte_options" or something like that.