Hello, I am writing a next-generation forum package in Python (somebody needs to beat out vBulletin and those other PHP ones, they dominate the market!) and instead of writing another SQL query builder, I decided to check out SQLAlchemy because of its good reputation and seeming power.

While I am very pleased with what it can do, I feel there is room for improvement, which is why I am bringing up these issues. If you want, I can write the patches to do this stuff but I figured I'd see what the consensus is on these ideas before I spent time writing code just to see patches rejected!

For the majority (if not all) of these examples, I am assuming the following tables for my examples.

user: id - name - pass - email

post: id - user_id - subject - text

JOIN is too hard to do with a partial select

Partial selects seem to be overlooked in SQLAlchemy - which is odd, because they are extremely important for those of us who try to write extremely efficient code for redistribution.

Let's take the following example: I want to select all the posts we have and the corresponding username... but not the email and password because they are irrelevant for the task at hand. How would we do this?

  select([user.c.name, post](user.c.id,), user.c.id == post.c.user_id) 
  # Oh come on, we have the foreign key established. This is a waste of typing, especially with huge multi-table joins!
  select([user.c.name, post](user.c.id,)).join(user, post) 
  # Nah, this creates an incorrect query. Not only will it not execute, but it tries to add the join after the cartesian product.
  join(user,post).select([user.c.name, post](user.c.id,))
  # What? selects on a join don't let you define what columns you want to retrieve. Weird...
  select([user.c.name, post](user.c.id,), from_obj=[join(user,post)](join(user,post))) 
  # Here is how you do it, but man that's ugly.

I feel as if setting a from_obj ruins the "cleanliness" of the code.

Solution 1: Auto-joins

   # Hypothetical. Syntax is up for debate - idea is what counts.
   select([post.r.user.name, post](post.r.user.id,))

   # SQLAlchemy recognizes that you are trying to join because you are using the user table as a property of post, so it creates a JOIN instead of a cartesian product.

What on earth is post.r? Basically, it would stand for "relations", or what not - name isn't important. Of course, this is all related to table metadata... while we could detect basic joins with foreign keys, what if the default behavior of X is to do a LEFT JOIN?

I propose that relationships be added to the table metadata itself instead of an arbitrary ORM. Regardless of whether I use ORM, there IS a relationship between the tables; it doesn't go away when "business objects" are removed from the picture. Adding relationship data to the table metadata itself would allow built queries and partial selects to enjoy the power of relationships without being tied to using an ORM, which can be undesired. (I bring up ORMs later in this ticket)

Full example: Select all posts with the corresponding user id and user name. We had a few users who had alternate accounts and we deleted them but left the posts because they were informative. We can't use a regular JOIN here - it will skip over those posts!

   user = Table ( 'user', ... )
   post = Table ( 'post', ..., maybe_has ('user')) # Create a "maybe" link with all the defaults - i.e, the relation entry in post is called 'user', link using the foreign keys defined to the table 'user'.

  select([post.r.user.name, post](post.r.user.id,)).execute() # specifying post by itself only specifies post's columns - not it's relation tables. You could get all of posts' relation-joins by selecting post.r !

I don't think this is too high level for the query builder. As I said earlier, the relation exists regardless of whether I am using an ORM or not.

Solution 2: select_join?

This wouldn't require any relationship data to be stored.

   select_join([user.c.name, post](user.c.id,), join=[post](user,)) # Assume all columns matching with user and post are going to be joined with each other.
   select_join([user.c.name, post](user.c.id,), outerjoin=[post](user,)) # Oh my!

ORM issues

I like the idea of object -> relation mapping, I really do. However, I believe the one present in SQLAlchemy is way too bulky for its own good. It needs to be further partitioned.

Partial Selects?

You cannot partial select with an object in any sort of convenient way. This makes no sense whatsoever. It's not possible through the objects select(), it's not possible through providing a custom select() to the objects select() (SQLAlchemy will complain about missing fields!)

Complaining about missing fields is nonsense. It's commonplace to work on subsets of data - I don't need every possible field every time I use an object. If a field isn't there, don't set it. Big deal.

Sessions and Query

Too verbose and unnecessary for projects who just want a simple row -> object map. I should be able to get the benefits of relating rows to objects without having to jump through hurdles (I believe caching of objects is completely unnecessary for a well designed application that uses SQL efficiently... this should be delegated outside the scope of the basic ORM.)

use_label doesn't group tables.

Even if I don't use an ORM, I should be able to have separation of tables within my queries. There should be an option to segregate the result set into tables, IE:

   row = select([user](post,), use_labels=True).execute().fetchone()
   # { post_id, post_user_id, post_subject, post_text, user_id, user_name, user_pass, user_email }
   # Oh shit! I have a function that just operates on the post. Why can't I just pass the post table of the result to my function, so it will work regardless of use_labels or aliases?

   row = select([post,user](post,user), segregate_tables=True).execute().fetchone()
   { post : { id, user_id, subject, text}, user : { id, name, pass, email } }

   my_func(row.post) # Alright! Cool!

In conclusion

I hope you all find my ideas interesting. I would be willing to help implement any one of these ideas. I don't know the SQLAlchemy code by heart, but I'm a very experienced programmer and DBA and would love to be apart of this project in any way I can. I mean, nobody wants me to write my own SQL package... right? Better to help others.

Auto-Joins, Partial Selects and ORMs