Source

htsql / doc / showcase.rst

Full commit

HTSQL Showcase

What is HTSQL?

HTSQL is a query language and web service for relational databases.

HTSQL is a Web Service

HTSQL is a query language for the web. Queries are URLs that can be directly typed into a browser.

HTSQL is a Relational Database Gateway

HTSQL wraps an existing relational database. Queries are translated into SQL.

HTSQL is an Advanced Query Language

HTSQL is a compact, high-level navigational query language designed for data analysts and web developers.

Why not SQL?

Relational algebra has proven to be inadequate for encoding business inquiries -- elementary set operations simply do not correspond to meaningful data transformations. The SQL language itself is tedious, verbose, and provides poor means of abstraction. Yet, the relational database has proven to be an excellent tool for data modeling, storage and retrival.

HTSQL reimagines what it means to query a database. The combination of a navigational model with data flows enables expressions that naturally reflect business inquiries. The HTSQL translator uses SQL as a target assembly language, which allows us to fix the query model and language, while keeping current investment in relational systems.

To demonstrate this point, we walk through a set of business inquires expressed over a fictitious university schema.

This data model has two top-level tables, school and department, where department has an optional link to school. Subordinate tables, course and program, have mandatory links to their parents.

SQL conflates Rows & Columns

"Please list departments; for each department, show the corresponding school's campus."
SELECT d.code, d.name, s.campus
FROM ad.department AS d
LEFT JOIN ad.school AS s
       ON (d.school_code = s.code);

The business inquiry asks for a specific set of rows, and then correlated columns. The SQL encoding returns a subset of a cross product making it difficult to ensure what each row represents. The FROM clause doesn't just pick rows, it also plays and auxiliary role in choosing columns.

/department{code, name, school.campus}

The navigational translation separates the row definition from the column selection. The linking is implicit, and correct. The encoded query can be read aloud as a verbal inquiry.

Conflating Projection with Aggregation

"How many departments by campus?"
SELECT s.campus, COUNT(d)
FROM ad.school AS s
LEFT JOIN ad.department AS d
  ON (s.code = d.school_code)
WHERE s.campus IS NOT NULL
GROUP by s.campus;

In the schema there isn't a campus table, you have to take distinct values from the school table. In this SQL query its not clear if the GROUP BY is used only to produce an aggregate, you have to examine primary key columns to know for sure.

/(school^campus) {campus, count(school.department)}

In a navigational approach, you first construct the projection explicitly (using ^ operator). Then, you select from it. In this way the aggregation is indicated separately as part of the column selector rather than being confused with the row definition.

For SQL, Modest Complexity is Painful

"For each school with a degree program, return the school's name, and the average number of high-credit (>3) courses its departments have."
SELECT s.name, o.avg_over_3 FROM ad.school AS s
JOIN ad.program AS p ON (p.school_code = s.code)
LEFT JOIN (
    SELECT d.school_code, AVG(COALESCE(i.over_3,0)) AS avg_over_3
    FROM ad.department d
    LEFT JOIN (
        SELECT c.department_code, COUNT(c) AS over_3
         FROM ad.course AS c WHERE c.credits > 3
         GROUP BY c.department_code
    ) AS i ON (i.department_code = d.code)
    GROUP BY d.school_code
) AS o ON (o.school_code = s.code)
GROUP BY s.name, o.avg_over_3;

Not only is this query hard to read, it took several passes to get correct -- without the COALESCE you get results that look correct, but arn't.

/(school?exists(program))
  {name, avg(department.count(course?credits>3))}

How do I use HTSQL?

HTSQL can be used with any number of higher-level tools.

HTSQL is a Reporting Toolkit

The dashboard above (using the JQuery-based HTRAF toolkit) shows a 3-level drill down (school, department and course) for a university schema. The live demo for this dashboard is at http://htraf.htsql.org/.

HTSQL is a Communication Tool

HTSQL reduces the number of meetings in your organization:

From: "Anne (data analyst)" <ann@example.com>
To: "Dave (product manager)" <dave@example.com>
Cc: jess@example.com, tim@example.com, jose@example.com
Subject: do we need to meet?

Does the HTSQL below return, for each school, the average
number of courses offered in its departments? If so, then
Tim and I don't need to meet with you tomorrow.

http://demo.htsql.org/school{name,avg(department.count(course))}

- A

HTSQL is a common language usable by software developers, data analysts, database administrators, and even business users.

Show me this HTSQL!

HTSQL was designed from the ground up as a self-serve reporting tool for data analysts. With HTSQL, the easy stuff is truly easy; and, the complex stuff is easy too.

Database Introspection

On startup, HTSQL examines tables, primary keys, and foreign keys to construct a navigational graph of your database. For example:

This university schema is used in the examples below. The data model has two top-level tables, school and department, where department has an optional link to school. Subordinate tables, having mandatory foreign key references, are course and program.

Choosing a Table

HTSQL queries typically start with a table.

This query returns all departments.

Selecting Columns

Output columns are selected with curly brackets {}; the :as decorator sets the title.

This query returns, for each department, the name of the associated school and the name of the department.

Filtering Rows

HTSQL lets you filter results with arbitrary predicates.

This query returns courses from the school of engineering having more than 3 credits.

Paging and Sorting

Table operations such as sorting and paging could be freely combined.

This query returns courses 21 to 30 in the course catalog as sorted by number of credits.

Aggregating Data

In HTSQL, aggregates aren't a reason to run to the DBA.

This query returns, for each school having a MS program, the average number of courses offered across its departments.

What's up Next?

Over the next few months we'll be adding more features (some of them are already implemented in our internal 1.X branch).

Projections

HTSQL will support complex grouping operations.

This query returns the number of programs per degree.

Hierarchical Output

HTSQL is not to be limited to tabular output.

This query will return all schools with associated programs and departments.

More Backends

The current release of HTSQL supports PostgreSQL and SQLite. Subsequent releases will add support for MySQL, Oracle and Microsoft SQL Server.

The challenge here is providing consistent function definitions and semantics that work across various SQL database systems.