1. Konrad Rymczak
  2. htsql-firebird

Source

htsql-firebird / doc / introduction.rst

Introduction to HTSQL

What is HTSQL?

HTSQL is a query language and web service for relational databases.

HTSQL is a Web Service

HTSQL is a query language for the web. Queries are URLs that can be directly typed into a browser; the output could be returned in a variety of formats including HTML, CSV, JSON, etc.

HTSQL is a Relational Database Gateway

HTSQL wraps an existing relational database and translates incoming queries into SQL. The current version of HTSQL supports SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server.

HTSQL is an Advanced Query Language

HTSQL is a compact, high-level navigational query language designed for data analysts and web developers.

Why not SQL?

Relational algebra is frequently inadequate for encoding business inquiries --- elementary set operations do not correspond to meaningful data transformations. The SQL language itself is tedious, verbose, and provides poor means of abstraction. Yet, the relational database is an excellent tool for data modeling, storage and retrieval.

HTSQL reimagines what it means to query a database. The combination of a navigational model with data flows enables expressions that naturally reflect business inquiries. The HTSQL translator uses SQL as a target assembly language, which allows us to fix the query model and language while keeping current investment in relational systems.

To demonstrate this point, we walk through a set of business inquires expressed over a fictitious university schema.

This data model has two top-level tables, school and department, where department has an optional link to school. Subordinate tables, course and program, have mandatory links to their parents.

SQL Conflates Rows & Columns

"For each department, please show the department name and the corresponding school's campus."

This business inquiry clearly separates the requested rows (each department) and columns (department name and corresponding school's campus), but this separation is lost when the query is encoded in SQL:

SELECT d.name, s.campus
FROM ad.department AS d
LEFT JOIN ad.school AS s
       ON (d.school_code = s.code);

In this SQL query, the FROM clause not only picks target rows, but also includes extra tables required to produce output columns. This conflation makes it difficult to determine business entities represented by each row of the output.

The HTSQL translation separates the row definition from the column selection. The linking is implicit, and correct. The encoded query can be read aloud as a verbal inquiry.

Conflating Projection with Aggregation

"How many departments by campus?"

This business inquiry asks for rows corresponding to each campus, and for each row, the number of correlated departments. In the schema, there isn't a campus table, so we have to take distinct values of campus column from the school table. This operation is called projection.

SELECT s.campus, COUNT(d)
FROM ad.school AS s
LEFT JOIN ad.department AS d
  ON (s.code = d.school_code)
WHERE s.campus IS NOT NULL
GROUP BY s.campus;

For this SQL encoding, the GROUP BY clause combines two operations: projection and evaluating the aggregate COUNT(). This conflation causes a reader of the query some effort determining what sort of rows are returned and how the aggregate is related to those rows.

In the HTSQL query, we start with an explicit projection (the ^ operator), then we select correlated columns. This way, the aggregation is indicated separately as part of the column selector rather than being conflated with the row definition.

SQL Lacks Means of Encapsulation

"For each department, return the department name and the number of offered 100's, 200's, 300's and 400's courses."

In this business inquiry, we are asked to evaluate the same statistic across multiple ranges.

SELECT d.name,
       COUNT(CASE WHEN c.no BETWEEN 100 AND 199 THEN TRUE END),
       COUNT(CASE WHEN c.no BETWEEN 200 AND 299 THEN TRUE END),
       COUNT(CASE WHEN c.no BETWEEN 300 AND 399 THEN TRUE END),
       COUNT(CASE WHEN c.no BETWEEN 400 AND 499 THEN TRUE END)
FROM ad.department AS d
LEFT JOIN ad.course AS c
       ON (c.department_code = d.code)
GROUP BY d.name;

This query is tedious to write and error prone to maintain since SQL provides no way to factor the repetitive expression COUNT(...).

The HTSQL translation avoids this duplication by defining a calculated attribute count_courses($level) on the department table and then evaluating it for each course level.

In SQL, Modest Complexity is Painful

"For each school with a degree program, return the school's name, and the average number of high-credit (>3) courses its departments have."

This business inquiry asks us to do the following:

  • pick records from the school table
  • keep only those with an associated degree program
  • for each school record, compute average of:
    • for each associated department, count:
      • associated courses with credits>3
SELECT s.name, o.avg_over_3
FROM ad.school AS s
JOIN ad.program AS p ON (p.school_code = s.code)
LEFT JOIN (
    SELECT d.school_code, AVG(COALESCE(i.over_3,0)) AS avg_over_3
    FROM ad.department d
    LEFT JOIN (
        SELECT c.department_code, COUNT(c) AS over_3
        FROM ad.course AS c WHERE c.credits > 3
        GROUP BY c.department_code
    ) AS i ON (i.department_code = d.code)
    GROUP BY d.school_code
) AS o ON (o.school_code = s.code)
GROUP BY s.name, o.avg_over_3;

Not only is this SQL encoding is hard to read, it took several passes to get right --- without the COALESCE you get results that look correct, but aren't.

Each syntactic component of the HTSQL query is self-contained; when assembled, they form a cohesive translation of the business inquiry.

HTSQL in a Nutshell

HTSQL was designed from the ground up as a self-serve reporting tool for data analysts. With HTSQL, the easy stuff is truly easy; and, the complex stuff is easy too.

In this section we introduce the fundamentals of HTSQL syntax and semantics. For a more incremental approach, please read the :doc:`tutorial`.

Scalar Expressions

Literal values:

Algebraic expressions:

Predicate expressions:

Navigation

A table name by itself produces all records from that table:

In the scope of school table, department is a link to associated records from department table. The following query returns department records via navigation though school:

This query works as follows:

  • school generates all records from school table;
  • for each school record, department generates associated department records;

Filtering

Sieve operator produces records satisfying the specified condition:

Sorting operator reorders records:

Truncating operator takes a slice from the record sequence:

Selection & Definition

Selection specifies output columns:

Title decorator defines the title of an output column:

Calculated attributes factor out repeating expressions:

References carry over values across nested scopes:

Aggregation

Aggregates convert plural expressions to singular values.

Scalar aggregates:

Nested aggregates:

Various aggregation operations:

Projection

Projection operator returns distinct values. This example returns distinct campus values from the school table:

In the scope of the projection, school refers to all records from school table having the same value of campus attribute:

Linking

Even though HTSQL provides automatic links inferred from foreign key constraints, arbitrary linking is also allowed:

This query returns schools with the number of departments above average among all schools.

What's up Next?

We intend to add to HTSQL many more features in the future.

Hierarchical Output

HTSQL should not be limited to tabular output.

/school{name,
        /program{title},
        /department{name}}

This query is to generate a tree-shaped output: for each school, it produces the school name, a list of titles of associated programs, and a list of names of associated departments.

Analytical Processing

HTSQL should support OLAP cube operations.

/rollup(school^campus){campus, count(school.department)}

This query is to produce the number of departments per school's campus followed by a total value for all campuses.

Recursive Queries

HTSQL should be able to construct hierarchies from parent-child relationships.

/program{title, /recurse(part_of){title}}

This query is to return programs together with a list of all dependent subprograms.