support range specificaiton in window function
e.g., add the "RANGE BETWEEN" syntax to our long existing over() syntax.
SELECT func(x) OVER (ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND 10 FOLLOWING)
see the syntax here and here, also might think of having the named window function.
so for this we have the option for strings:
func.foo(x).over(frame="RANGE BETWEEN UNBOUNDED PRECEDING AND 10 FOLLOWING)
or maybe an elaborate generative thing
func.foo(x).over().rows(start="unbounded preceding", end="current row")
Comments (11)
-
reporter -
reporter - changed milestone to 1.0.xx
-
assigned issue to
-
reporter - changed milestone to 1.2
-
reporter - edited description
-
@zzzeek What do you think about syntax like this, instead of what you're showing above:
func.sum(table.c.x).over(..., rows={'preceding': 40, 'following': 0}) # rows between 40 preceding and current row func.sum(table.c.x).over(..., rows={'preceding': None}) # rows between unbounded preceding and current row func.sum(table.c.x).over(..., range={'preceding': 2, 'following': None}) # range between 2 preceding and unbounded following
The SQL to Python mapping is as follows:
current row => 0 unbounded preceding => None unbounded following => None No preceding key specified => None No following key specified => current row
I think this is a more Pythonic API, and requires less work for the implementation. I'd be happy to put up a PR for this.
-
reporter I'm not sure what the rationale for the tuple was except that the documentation for the range at http://www.postgresql.org/docs/9.1/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS is unclear if there is ever a need to put "FOLLOWING" before "PRECEDING", or if other combinations are meaningful like "ROWS 40 PRECEDING" vs. "ROWS BETWEEN 40 PRECEDING AND CURRENT ROW". So the dictionary style here has to be validated that every possible meaningful value, per the SQL standard, not necessarily PG's behavior, can be represented.
-
reporter review at https://gerrit.sqlalchemy.org/#/c/69/
-
reporter why do we need the keys "preceding" and "following" and why is missing key for "preceding" different than "following"?
if this is just a range, let's do a range:
func.sum(table.c.x).over(..., rows=(-40, 0)) # rows between 40 preceding and current row func.sum(table.c.x).over(..., rows=(None, 0)) # rows between unbounded preceding and current row func.sum(table.c.x).over(..., range=(-2, None)) # range between 2 preceding and unbounded following
that works ?
-
Nice. I like that much better. I'll look into the standard to find what it says about the necessity of BETWEEN.
-
The reason I had the different meaning for missing keys is because in SQL, leaving out preceding has a different meaning than leaving out following. It's moot if we can use a tuple.
-
reporter - changed status to resolved
Support range specification in window function
Fixes:
#3049Co-authored-by: Mike Bayer mike_mp@zzzcomputing.com Change-Id: Ie572095c3e25f70a1e72e1af6858e5edd89fd25e Pull-request: https://github.com/zzzeek/sqlalchemy/pull/264→ <<cset d34063aa3221>>
- Log in to comment
OK here's the syntax:
We are shooting to make this somewhat complex syntax not a big deal to add to an over() clause. So things like calling more methods, or importing more clause constructs, trying to avoid that. Also, as this syntax accepts literal numerics, I don't want to make the same mistake we made for limit/offset, these should be rendered as bound parameters from the gate.
Two new kw params, "range" and "rows". These are mutually exclusive, an error is raised if both are present.
"range" and "rows" each accept a tuple.
The tuple then contains tokens that correspond to the values at http://www.postgresql.org/docs/9.1/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS:
that is, they are laid out freeform:
So some things to note above. We're actually going to use a tiny little "expect" algorithm here. that is, we iterate through the tokens. the first token has to be a number, "current [row]", or "[unbounded] preceding" (brackets denote an optional keyword). Based on whether that token is a number or a keyword determines what we "expect" for the next token.
More examples:
Here's the token parser:
so from that you get the frame specification, the the compiler for visit_over uses the frame spec to render the rest of the values.