`mask` (:class:`htsql.tr.code.Space`)

When assembling a new term, indicates that the term is going to be

- ~~tie_terms~~ed to a term that represents the `mask` space.

+ attached to a term that represents the `mask` space.

a new term is assembled. When not set, the current mask space

- A mask indicates that the new term is going to be ~~tie_terms~~ed

+ A mask indicates that the new term is going to be attached

to a term that represent the mask space. Therefore the

assembler could ignore any non-axis operations that are

already enforced by the mask space.

- Augments a term to make it capable of producing the given expression~~s~~.

+ Augments a term to make it capable of producing the given expression.

This is an interface adapter; see subclasses for implementations.

`expression` (:class:`htsql.tr.code.Expression`)

An expression node to inject.

- `term` (:class:`htsql.tr.term.Term`)

+ `term` (:class:`htsql.tr.term.RoutingTerm`)

A term node to inject into.

`state` (:class:`AssemblingState`)

# from inner to outer axes.

- # When shoot does not touch the trunk space, we attach it

+ # When the shoot does not touch the trunk space, we attach it

# using a series tie between the shoot baseline and its base.

# Note that we do not verify (and it is not required) that

# the trunk term export the base space. Apply `inject_ties()`

# Not really reachable since we never generate backward

- # ties in `tie_term()`. It is here for completeness.

+ # ties in `tie_terms()`. It is here for completeness.

axes.append(tie.space.base)

When assembling terms, the following optimizations are applied:

Removing unnecessary non-axis operations. The current `mask` space

- expresses a promise that the generated term will be ~~tie_terms~~ed to

+ expresses a promise that the generated term will be attached to

a term representing the mask space. Therefore the assembler

could skip any non-axis filters that are already enforced by

class InjectSpace(Inject):

+ Augments the term to make it produce the given space.

+ `space` (:class:`htsql.tr.code.Space`)

+ A space node to inject.

+ `term` (:class:`htsql.tr.term.RoutingTerm`)

+ A term node to inject into.

+ The term space must span the given space.

+ `state` (:class:`AssemblingState`)

+ The current state of the assembling process.

def __init__(self, space, term, state):

assert isinstance(space, Space)

+ # It is a bug if we get the `space` plural for the `term` here.

+ # It is a responsibility of `InjectUnit` to guard against unexpected

+ # plural expressions and to issue an appropriate HTSQL error.

assert term.space.spans(space)

super(InjectSpace, self).__init__(space, term, state)

+ # Note that this function works for all space classes universally.

+ # We start with checking for and handling several special cases;

+ # if none of them apply, we grow a shoot term for the given space

+ # and attach it to the main term.

+ # Check if the space is already exported.

if self.space in self.term.routes:

+ # Remove any non-axis filters that are enforced by the term space.

unmasked_space = self.space.prune(self.term.space)

+ # When converged with the term space, `space` and `unmasked_space`

+ # contains the same set of rows, therefore in the context of the

+ # given term, they could be used interchangeably.

+ # In particular, if `unmasked_space` is already exported, we could

+ # use the same route for `space`.

if unmasked_space in self.term.routes:

routes = self.term.routes.copy()

routes[self.space] = routes[unmasked_space]

return self.term.clone(routes=routes)

+ # A special case when the given space is an axis prefix of the term

+ # space. The fact that the space is not exported by the term means

+ # that the term tree is optimized by cutting all axes below some

+ # baseline. Now we need to grow these axes back.

if self.term.backbone.concludes(unmasked_space):

+ # Here we assemble a table term corresponding to the space and

+ # attach it to the axis directly above it using a backward

+ # Find the axis directly above the space. Note that here

+ # `unmasked_space` is the inflation of the given space.

next_axis = self.term.baseline

while next_axis.base != unmasked_space:

next_axis = next_axis.base

+ # It is possible that `next_axis` is also not exported by

+ # the term (specifically, when `next_axis` is below the term

+ # baseline). So we call `inject()` with `next_axis`, which

+ # should match the same special case and recursively add

+ # `next_axis` to the routing table. Bugs in the assembler

+ # and in the compare-by-value code often cause an infinite

+ # loop or recursion here!

lkid = self.state.inject(self.term, [next_axis])

+ # Injecting an axis prefix should never add any axes below

+ # (but will add all the axis prefixes above).

assert unmasked_space not in lkid.routes

+ # Assemble a term corresponding to the axis itself.

rkid = self.state.assemble(unmasked_space,

- mask=unmasked_space.scalar)

- assert unmasked_space.base not in rkid.routes

+ mask=self.state.scalar)

+ # We expect to get a table or a scalar term here.

+ # Join the terms using a backward series tie.

+ # FIXME: technically, nothing prevents us from joining `rkid`

+ # to `lkid` using a forward series tie -- and then we could

+ # remove support for backward ties since this is the only

+ # place where they are used. Currently, however, it is

+ # easier for the outliner to flatten the left child of a join

+ # term, so we put the more complex term to the left position.

tie = SeriesTie(next_axis, is_backward=True)

+ # Since we are expanding the term baseline, the join is always

+ # Re-use the old routing table, but add the new axis.

routes = lkid.routes.copy()

routes[unmasked_space] = rkid[unmasked_space]

routes[self.space] = rkid[unmasked_space]

- return JoinTerm(self.state.tag(), lkid, rkid, [tie], True,

+ # Assemble and return a join term.

+ return JoinTerm(self.state.tag(), lkid, rkid, [tie], is_inner,

+ # None of the special cases apply, so we use a general method:

+ # - grow a shoot term for the given space;

+ # - attach the shoot to the main term.

+ # Assemble a shoot term for the space.

space_term = self.assemble_shoot(self.space, self.term)

extra_routes[self.space] = space_term.routes[self.space]

extra_routes[unmasked_space] = space_term.routes[self.space]

+ # Join the shoot to the main term.

return self.join_terms(self.term, space_term, extra_routes)

# on the same space, but with a different baseline, so that it

# will hit the first special case and produce a table term.

rkid = self.state.assemble(self.space, baseline=self.backbone)

- # The tie ~~tie_terms~~ing the space to its base.

+ # The tie attaching the space to its base.

tie = SeriesTie(self.backbone)

# We use the routing table of the base term with extra routes

class InjectCode(Inject):

+ Augments the term to make it capable of producing the given expression.

+ # Inject all the units that compose the expression.

return self.state.inject(self.term, self.expression.units)

class InjectUnit(Inject):

+ Augments the term to make it produce the given unit.

+ `unit` (:class:`htsql.tr.code.Unit`)

+ `term` (:class:`htsql.tr.term.RoutingTerm`)

+ A term node to inject into.

+ `state` (:class:`AssemblingState`)

+ The current state of the assembling process.

+ `space` (:class:`htsql.tr.code.Space`)

+ An alias to `unit.space`.

assert isinstance(unit, Unit)

super(InjectUnit, self).__init__(unit, term, state)

+ # Extract the unit attributes.

+ # Normally, this should never be reachable. We raise an error here

+ # to prevent an infinite recursion via `InjectCode` in cases when

+ # `Inject` is not implemented for some unit type.

raise NotImplementedError("the inject adapter is not implemented"

" for a %r node" % self.unit)

class InjectColumn(Inject):

+ Injects a column unit into a term.

- if not self.unit.singular(self.term.space):

+ # We don't keep column units in the routing table (there are too

+ # many of them). Instead presence of a space node in the routing

+ # table indicates that all columns of the prominent table of the

+ # space are exported from the term.

+ # To avoid an extra `inject()` call, check if the unit space

+ # is already exported by the term.

+ if self.space in self.term.routes:

+ # Verify that the unit is singular on the term space.

+ if not self.term.spans(self.space):

raise AssembleError("expected a singular expression",

+ # Inject the unit space into the term.

return self.state.inject(self.term, [self.unit.space])

class InjectScalar(Inject):

+ Injects a scalar unit into a term.

+ # Injecting is already implemented for a batch of scalar units

+ # that belong to the same space. To avoid code duplication,

+ # we delegate injecting to a batch consisting of just one unit.

+ # Check if the unit is already exported by the term.

if self.unit in self.term.routes:

- group = ScalarBatchExpression(self.unit.space, [self.unit],

+ # Form a batch consisting of a single unit.

+ batch = ScalarBatchExpression(self.unit.space, [self.unit],

- return self.state.inject(self.term, [group])

+ # Delegate the injecting to the batch.

+ return self.state.inject(self.term, [batch])

class InjectAggregate(Inject):

+ Inject an aggregate unit into a term.

+ # Injecting is already implemented for a batch of aggregate units

+ # that share the same base and plural spaces. To avoid code

+ # duplication, we delegate injecting to a batch consisting of

+ # Check if the unit is already exported by the term.

if self.unit in self.term.routes:

- group = AggregateBatchExpression(self.unit.plural_space,

+ # Form a batch consisting of a single unit.

+ batch = AggregateBatchExpression(self.unit.plural_space,

self.unit.space, [self.unit],

- return self.state.inject(self.term, [group])

+ # Delegate the injecting to the batch.

+ return self.state.inject(self.term, [batch])

class InjectCorrelated(Inject):

+ Injects a correlated unit into a term.

- if not self.unit.singular(self.term.space):

+ # In the term tree, correlated subqueries are represented using

+ # a correlated term node. A correlated term is a binary node with

+ # the left operand representing the main query and the right operand

+ # representing the correlated subquery. Conditions that attach

+ # the correlated subquery to the main query are expressed as

+ # Check if the unit is already exported by the term.

+ if self.unit in self.term.routes:

+ # Verify that the unit is singular on the term space.

+ if not self.term.spans(self.space):

raise AssembleError("expected a singular expression",

- if self.unit in self.term.routes:

+ # The general chain of operations is as follows:

+ # - assemble a term for the unit space;

+ # - inject the unit into the unit term;

+ # - attach the unit term to the main term.

+ # However, when the unit space coincides with the term space,

+ # it could be reduced to:

+ # - inject the unit directly into the main term.

+ # We say that the unit is *native* to the term if the unit

+ # space coincides with the term space (or dominates over it).

+ # Note that currently the latter is always the case because

+ # all correlated units are wrapped with a scalar unit sharing

+ # Check if the unit is native to the term.

is_native = self.space.dominates(self.term.space)

+ # If so, we are going to inject the unit directly into the term.

- trunk_term = self.assemble_shoot(self.space, self.term)

+ # Otherwise, assemble a separate term for the unit space.

+ # Note: currently, not reachable.

+ unit_term = self.assemble_shoot(self.space, self.term)

+ # Assemble a term for the correlated subquery.

plural_term = self.assemble_shoot(self.unit.plural_space,

- trunk_term, [self.unit.code])

- ties = self.tie_terms(trunk_term, plural_term)

- trunk_term = self.inject_ties(trunk_term, ties)

- routes = lkid.routes.copy()

+ unit_term, [self.unit.code])

+ # The ties connecting the correlated subquery to the main query.

+ ties = self.tie_terms(unit_term, plural_term)

+ # Make sure that the unit term could export tie conditions.

+ unit_term = self.inject_ties(unit_term, ties)

+ # Generate a correlation term.

+ routes = unit_term.routes.copy()

routes[self.unit] = plural_term.tag

- unit_term = CorrelationTerm(self.state.tag(), lkid, rkid, ties,

+ unit_term = CorrelationTerm(self.state.tag(), unit_term, plural_term,

+ ties, lkid.space, routes)

+ # If we attached the unit directly to the main term, we are done.

+ # Otherwise, we need to attach the unit term to the main term.

extra_routes = { self.unit: plural_term.tag }

return self.join_terms(self.term, unit_term, extra_routes)

-class InjectScalarBatch(Inject):

- adapts(ScalarBatchExpression)

- def __init__(self, expression, term, state):

- super(InjectScalarBatch, self).__init__(expression, term, state)

- self.space = expression.space

- units = [unit for unit in self.collection

- if unit not in self.term.routes]

- if not self.term.space.spans(self.space):

- raise AssembleError("expected a singular expression",

- codes = [unit.code for unit in units]

- if self.space.dominates(self.term.space):

- term = self.state.inject(self.term, codes)

- term = WrapperTerm(self.state.tag(), term,

- term.space, term.routes)

- routes = term.routes.copy()

- routes[unit] = term.tag

- return term.clone(routes=routes)

- unit_term = self.assemble_shoot(self.space, self.term, codes)

- extra_routes = dict((unit, unit_term.tag) for unit in units)

- return self.join_terms(self.term, unit_term, extra_routes)

-class InjectAggregateBatch(Inject):

- adapts(AggregateBatchExpression)

- def __init__(self, expression, term, state):

- super(InjectAggregateBatch, self).__init__(expression, term, state)

- self.plural_space = expression.plural_space

- self.space = expression.space

- units = [unit for unit in self.collection

- if unit not in self.term.routes]

- if not self.term.space.spans(self.space):

- raise AssembleError("expected a singular expression",

- codes = [unit.code for unit in units]

- is_native = self.space.dominates(self.term.space)

- trunk_term = self.assemble_shoot(self.space, self.term)

- plural_term = self.assemble_shoot(self.plural_space,

- ties = self.tie_terms(trunk_term, plural_term)

- trunk_term = self.inject_ties(trunk_term, ties)

- projected_space = MaskedSpace(ties[-1].space, trunk_term.space,

- self.expression.binding)

- routes[tie.space] = plural_term.routes[tie.space]

- routes[projected_space] = routes[projected_space.base]

- projected_term = ProjectionTerm(self.state.tag(), plural_term,

- ties, projected_space, routes)

- extra_routes = dict((unit, projected_term.tag) for unit in units)

- unit_term = self.join_terms(trunk_term, projected_term, extra_routes)

- return self.join_terms(self.term, unit_term, extra_routes)

class InjectBatch(Inject):

+ Injects a batch of expressions into a term.

def __init__(self, expression, term, state):

super(InjectBatch, self).__init__(expression, term, state)

+ # Extract attributes of the batch.

self.collection = expression.collection

+ # The easiest way to inject a group of expressions is to inject

+ # them into the term one by one. However, it will not necessarily

+ # generate the most optimal term tree. We could obtain a better

+ # tree structure if we group all units of the same form and inject

+ # Here we group similar scalar and aggregate units into scalar

+ # and aggregate batch nodes and then inject the batches. We do not

+ # need to do the same for column units since injecting a column

+ # unit effectively injects the unit space making any column from

+ # the space exportable.

+ # We start with the given term, at the end, it will be capable of

+ # exporting all expressions from the given collection.

+ # Gather all the units from the given collection of expressions.

for expression in self.collection:

+ # Ignore spaces and other non-code expressions.

if isinstance(expression, Code):

for unit in expression.units:

+ # We are only interested in units that are not already

+ # exportable by the term.

if unit not in term.routes:

+ # Find all scalar units and group them by the unit space. We

+ # maintain a separate list of scalar spaces to ensure we process

+ # the batches in some deterministic order.

scalar_space_to_units = {}

scalar_spaces.append(space)

scalar_space_to_units[space] = []

scalar_space_to_units[space].append(unit)

+ # Form and inject batches of matching scalar units.

for space in scalar_spaces:

- group_units = scalar_space_to_units[space]

- group = ScalarBatchExpression(space, group_units,

+ batch_units = scalar_space_to_units[space]

+ batch = ScalarBatchExpression(space, batch_units,

- term = self.state.inject(term, [~~group~~])

+ term = self.state.inject(term, [batch])

+ # Find all aggregate units and group them by their plural and unit

+ # spaces. Maintain a list of pairs of spaces to ensure deterministic

+ # order of processing the batches.

aggregate_space_pairs = []

aggregate_space_pair_to_units = {}

aggregate_space_pairs.append(pair)

aggregate_space_pair_to_units[pair] = []

aggregate_space_pair_to_units[pair].append(unit)

+ # Form and inject batches of matching aggregate units.

for pair in aggregate_space_pairs:

plural_space, space = pair

group_units = aggregate_space_pair_to_units[pair]

term = self.state.inject(term, [group])

+ # Finally, just take and inject all the given expressions. We don't

+ # have to bother with filtering out duplicates or expressions that

+ # are already injected.

for expression in self.collection:

term = self.state.inject(term, [expression])

+class InjectScalarBatch(Inject):

+ Injects a batch of scalar units sharing the same space.

+ adapts(ScalarBatchExpression)

+ def __init__(self, expression, term, state):

+ super(InjectScalarBatch, self).__init__(expression, term, state)

+ # Extract attributes of the batch.

+ self.space = expression.space

+ # To inject a scalar unit into a term, we need to do the following:

+ # - assemble a term for the unit space;

+ # - inject the unit into the unit term;

+ # - attach the unit term to the main term.

+ # If we do this for each unit individually, we may end up with

+ # a lot of identical unit terms in our term tree. To optimize

+ # the term tree in this scenario, we collect all scalar units

+ # sharing the same space into a batch expression. Then, when

+ # injecting the batch, we use the same unit term for all units

+ # Get the list of units that are not already exported by the term.

+ units = [unit for unit in self.collection

+ if unit not in self.term.routes]

+ # If none, there is nothing to be done.

+ # Verify that the units are singular relative to the term.

+ # To report an error, we could point to any unit node.

+ if not self.term.space.spans(self.space):

+ raise AssembleError("expected a singular expression",

+ # Extract the unit expressions.

+ codes = [unit.code for unit in units]

+ # Handle the special case when the unit space is equal to the

+ # term space or dominates it. In this case, we could inject

+ # the units directly to the main term and avoid creating

+ # a separate unit term.

+ if self.space.dominates(self.term.space):

+ # Make sure the term could export all the units.

+ term = self.state.inject(self.term, codes)

+ # SQL does not allow evaluating expressions in a terminal

+ # (table or scalar) terms. If we got a terminal term,

+ # cover it with a no-op wrapper.

+ term = WrapperTerm(self.state.tag(), term,

+ term.space, term.routes)

+ # Update the routing table to add all the units and

+ routes = term.routes.copy()

+ routes[unit] = term.tag

+ return term.clone(routes=routes)

+ # The general case: assemble a term for the unit space.

+ unit_term = self.assemble_shoot(self.space, self.term, codes)

+ # And join it to the main term.

+ extra_routes = dict((unit, unit_term.tag) for unit in units)

+ return self.join_terms(self.term, unit_term, extra_routes)

+class InjectAggregateBatch(Inject):

+ Injects a batch of aggregate units sharing the same plural and unit spaces.

+ adapts(AggregateBatchExpression)

+ def __init__(self, expression, term, state):

+ super(InjectAggregateBatch, self).__init__(expression, term, state)

+ # Extract attributes of the batch.

+ self.plural_space = expression.plural_space

+ self.space = expression.space

+ # To inject an aggregate unit into a term, we do the following:

+ # - assemble a term for the unit space;

+ # - assemble a term for the plural space relative to the unit term;

+ # - inject the unit expression into the plural term;

+ # - project plural term into the unit space;

+ # - attach the projected term to the unit term;

+ # - attach the unit term to the main term.

+ # When the unit space coincides with the main term space, we could

+ # avoid assembling a separate unit term, and instead attach the

+ # projected term directly to the main term.

+ # In any case, if we perform this procedure for each unit

+ # individually, we may end up with a lot of identical unit terms

+ # in the final term tree. So when there are more than one aggregate

+ # unit with the same plural and unit spaces, it make sense to

+ # collect all of them into a batch expression. Then, when injecting

+ # the batch, we could reuse the same unit and plural terms for all

+ # aggregates in the batch.

+ # Get the list of units that are not already exported by the term.

+ units = [unit for unit in self.collection

+ if unit not in self.term.routes]

+ # If none, there is nothing to do.

+ # Verify that the units are singular relative to the term.

+ # To report an error, we could point to any unit node available.

+ if not self.term.space.spans(self.space):

+ raise AssembleError("expected a singular expression",

+ # Extract the aggregate expressions.

+ codes = [unit.code for unit in units]

+ # Check if the unit space coincides with or dominates the term

+ # space. In this case we could avoid assembling a separate unit

+ # term and instead attach the projected term directly to the main

+ is_native = self.space.dominates(self.term.space)

+ # Assemble a separate term for the unit space.

+ # Note: currently it is not reachable since we wrap every

+ # aggregate with a scalar unit sharing the same space.

+ unit_term = self.assemble_shoot(self.space, self.term)

+ # Assemble a term for the plural space against the unit space,

+ # and inject all the aggregate expressions into it.

+ plural_term = self.assemble_shoot(self.plural_space,

+ # Generate ties to attach the projected term to the unit term.

+ ties = self.tie_terms(unit_term, plural_term)

+ # Make sure the unit term could export the tie conditions.

+ unit_term = self.inject_ties(unit_term, ties)

+ # Now we are going to project the plural term onto the unit

+ # space. As the projection basis, we are using the ties.

+ # There are two kinds of ties we could get from `tie_terms()`:

+ # - a list of parallel ties;

+ # - or a single series tie.

+ # If we get a list of parallel ties, the projection basis

+ # comprises the primary keys of the tie spaces. Otherwise,

+ # the basis is the foreign key that joins the tie space to

+ # its base. These are also the columns connecting the

+ # projected term to the unit term.

+ # Determine the space of the projected term.

+ # FIXME: When we get a list of parralel ties from `tie_terms()`,

+ # we take the rightmost tie and assume that the space of the

+ # projected term is the tie space masked by the plural space,

+ # which is more or less correct.

+ # However, when `tie_terms()` returns a series tie, we also claim

+ # that the projected space is the masked tie space, which is not

+ # true at all. Indeed, in this case, the tie space is the baseline

+ # of the term. Since we project the plural term onto the foreign

+ # key joining the baseline to its base, the actual projected space

+ # is the base of the baseline, masked appropriately.

+ # Unfortunately, we cannot specify the real projected space because

+ # the term cannot export any values from this space. Currently,

+ # we maintain an assumption that the term is always able to

+ # export its own space. Perhaps we need to lift it?

+ # FIXME: when the unit space is the scalar space, the projected

+ # space should not be masked (this is one of the peculiarities of

+ projected_space = MaskedSpace(ties[-1].space, self.plural_space,

+ self.expression.binding)

+ # The routing table of the projected term.

+ # FIXME: the projected term should be able to export the tie

+ # conditions, so we add the tie spaces to the routing table.

+ # However we should never attempt to export any columns than

+ # those that form the tie condition -- it will generate invalid

+ # SQL. It is not clear how to fix this, perhaps the routing

+ # table should contain entries for each of the columns, or

+ # a special entry for just tie conditions?

+ routes[tie.space] = plural_term.routes[tie.space]

+ # The term space must always be in the routing table. Here,

+ # `projected_space.base` is `ties[-1].space`.

+ routes[projected_space] = routes[projected_space.base]

+ # Project the plural term onto the basis of the unit space.

+ projected_term = ProjectionTerm(self.state.tag(), plural_term,

+ ties, projected_space, routes)

+ # Attach the projected term to the unit term, add extra entries

+ # to the routing table for each of the unit in the collection.

+ extra_routes = dict((unit, projected_term.tag) for unit in units)

+ unit_term = self.join_terms(unit_term, projected_term, extra_routes)

+ # For native units, we are done since we use the main term as

+ # the unit term. Note: currently this condition always holds.

+ # Otherwise, attach the unit term to the main term.

+ return self.join_terms(self.term, unit_term, extra_routes)

def assemble(expression, state=None, baseline=None, mask=None):

Assembles a new term node for the given expression.