Claim that 'column elements' are the most fundamental part of expression API seems contradictory

Issue #3793 resolved
keeyip chan created an issue

I am referring to the first paragraph of http://docs.sqlalchemy.org/en/latest/core/sqlelement.html

I may just be confused by the order of sentences, but it seems that the "most fundamental part" ought to be "clause elements" rather than "column elements". It just seems contradictory to state in one sentence that column elements are the most fundamental but then claim that the "core of all SQL expression constructs is the ClauseElement".

Proposed rewording: The expression API represents each part of a SQL expression as a subclass of ClauseElement. Subclasses are used to represent conjunctions, column selections, filters, and many other types of SQL clauses. The most basic of these clauses is the ColumnElement, which represents a column and its datatype. The expression API defines interfaces for combining ClauseElement objects and to generate SQL statements from those combinations.

Comments (9)

  1. Mike Bayer repo owner

    The link you posted is the docstring for the text() construct. The first paragraph there is: "Construct a new TextClause clause, representing a textual SQL string directly.". I don't think that's what you meant.

    The first paragraph of the whole page is:

    The most fundamental part of the SQL expression API are the “column elements”, which allow for basic SQL expression support. The core of all SQL expression constructs is the ClauseElement, which is the base for several sub-branches. The ColumnElement class is the fundamental unit used to construct any kind of typed SQL expression.

    <hit "comment" too soon, continued in next comment...>

  2. Mike Bayer repo owner

    So w/ your proposed rewording, let's start with the fact that the existing wording was confusing to you, in that it is not completely semantically accurate. By those rules, your proposed paragraph also should be improved:

    The expression API represents each part of a SQL expression as a subclass of ClauseElement. Subclasses are used to represent conjunctions, column selections, filters, and many other types of SQL clauses.

    To the new user, the use of the phrase "represents each part" is unclear, because "part" could be taken to mean an actual SQL expression instance, not the lexical concept. It sounds like I would have to create my own subclasses of ClauseElement in order to write a new SQL statement.

    The original paragraph which talks about "column elements" as the "most fundamental part" of the expression language is given from a compositional point of view. Any SQL expression that is passed to the database ultimately will have at least one ColumnElement at the base of it. The simplest SQL expression I can possibly think of is:

    SELECT 1
    

    Above, the "1" would need to be expressed in SQLAlchemy as a ColumnElement.

    Another example:

    SELECT * FROM table

    The "*" would again be a ColumnElement, and the table would be a Table that again represents a collection of ColumnElement objects, and when you get your result rows back, these ColumnElement objects can target individual columns in each row. So the ColumnElement is the most fundemantal unit of the SQL expression API in terms of its role in constructing a SQL statement. The ClauseElement is the basemost class, so this would be the most "fundmental" unit from a Python class structure perspective.

    In the search for a less ambiguous set of terminology, the overarching pattern of ClauseElement -> ColumnElement which is then an element of a larger object like a Table that is also ClauseElement is known in classical OO terms as the "composite" pattern (https://en.wikipedia.org/wiki/Composite_pattern). Here, we're looking to distinguish amongst the role of "Component" and "Leaf". I don't think those terms here are going to help people much since they often aren't going to immediately understand the tree-based nature of an expression construct (even though it is an "expression tree").

    I would propose rather that the word "fundamental" be appropriately qualified contextually and discussed after the top level class hierarchy is introduced:

    The expression API consists of a series of classes that, when instantiated, each represent a specific lexical element within a SQL string. These elements are then composable into a structure that may be compiled into a string representation that is passed to a database. The classes are organized into a hierarchy that begins at the basemost ClauseElement class. Key subclasses include ColumnElement, which represents the role of any column-based expression in a SQL statement, such as in the columns clause, WHERE clause, and ORDER BY clause, and FromClause, which represents the role of a token that is placed in the FROM clause of a SELECT statement.

    Since this is the API documentation, I'm fine with getting more wordy to reduce ambiguity. However, a big challenge of these documents is that if you spend too much time being as accurate as possible in your verbiage, you end up with way too much of it. I doubt that many people ever read the giant paragraphs at the top of the tutorials such as http://docs.sqlalchemy.org/en/latest/core/tutorial.html., and the documentation of other projects such as that of Pyramid are often criticized for being too "lawyerly" and hard to read. I try pretty hard to be accurate without getting into the "too lawyerly" category.

  3. keeyip chan reporter

    My goal when reading through the API is to understand the tools I need to work with. I think summarizing content at the top should state the api's purpose and give a little insight into the organization of classes and supporting functions so that I don't need to examine several class's docstrings to build that intuition.

    Let me list what I think would've helped me:

    • I see a bunch of top-level methods that seem to be factories for underlying classes; why should I use these functions rather than instantiating an object directly, what are these objects and do I ever need to know about them ? Do these objects have something in common such as being part of the same base class?

    • It would help to categorize the various expression functions such as ones dealing with conjunctions, ordering, column-building, etc..

    • Is there a general pattern for how these expression objects can be composed? Are there some general limitations for how these can be combined? For example, can I use 'text()' for a clause in a larger query? After looking at the docs for 'text()' I see that this is possible, but it would have grounded my intuition better to see something like 'most of these expressions can be used as part of larger queries, even plain SQL can be mixed in (see #text)'.

    • Knowing that I'm looking at the 'expression api', give an overview for how these objects are translated to sql. A link to detailed explanation would be useful, ie: 'For details on how these objects are translated to SQL, see X'.

    I don't think lengthier summarizing content at the top is a big deal if it helps the reader build intuition because the alternative is for the reader to build up inferences by examining each class and function. I think the "too lawyerly" argument has more to do with tone of voice than length of content. Here's a good article illustrating the importance of designing words and spacing content, https://medium.com/@jsaito/how-to-design-words-63d6965051e9#.e4kfofb9o, I think the same applies for documentation.

  4. Mike Bayer repo owner

    these seem like questions that are answered by the Core Tutorial, did you read that first?

  5. keeyip chan reporter

    Thanks for pointing that out, I skimmed through the tutorial and did not read the entire thing. It would've helped to have those explanations in the API docs or a link to the relevant tutorial section; I would not have expected the rather long tutorial to explain the design of the expression API.

    My intent for this ticket is just to get clarity about the seemingly contradictory claims that both 'column elements' and ClauseElement are 'most fundamental' to the API. Thank you for explaining the expression tree, I find the rewording you did earlier to be more insightful than what's currently written.

  6. Mike Bayer repo owner

    it's a long term goal for all API docs to be cross linked to tutorial section as well as for there to be very good examples in all of them. contributions welcome on that....

  7. Mike Bayer repo owner

    Remove the use of the word "fundamental" to describe ColumnElements

    The context is unclear if "fundamental" refers to the object's role within a SELECT composition or within the class hierarchy of ClauseElement classes.

    Change-Id: I10bbf324fc09034cc27398867b7eca6b6f127abe Fixes: #3793

    → <<cset f9432a0287c7>>

  8. Log in to comment