Commits

Kirill Simonov committed 43fdbca

Showcase: renamed to Introduction to HTSQL, rewritten the last two sections, other updates.

Comments (0)

Files changed (1)

-******************
-  HTSQL Showcase
-******************
+*************************
+  Introduction to HTSQL
+*************************
+
 
 What is HTSQL?
 ==============
       :target: http://demo.htsql.org/school
 
 HTSQL is a query language for the web.  Queries are URLs that can be
-directly typed into a browser.  
-
+directly typed into a browser; the output could be returned in a variety
+of formats including HTML, CSV, JSON, etc.
 
 HTSQL is a Relational Database Gateway
 --------------------------------------
       FROM "ad"."school" AS "school"
       ORDER BY 1 ASC
 
-HTSQL wraps an existing relational database.  Queries are translated
-into SQL.
+HTSQL wraps an existing relational database and translates incoming
+queries into SQL.  The current version of HTSQL supports *SQLite*,
+*PostgreSQL*, *MySQL*, *Oracle*, and *Microsoft SQL Server*.
 
 HTSQL is an Advanced Query Language
 -----------------------------------
 
    .. sourcecode:: htsql
 
-      /school{name, count(program),
+      /school{name,
+              count(program),
               count(department)}
 
    .. sourcecode:: sql
 ============
 
 Relational algebra has proven to be inadequate for encoding business
-inquiries -- elementary set operations simply do not correspond to
+inquiries --- elementary set operations simply do not correspond to
 meaningful data transformations.  The SQL language itself is tedious,
 verbose, and provides poor means of abstraction.  Yet, the relational
-database has proven to be an excellent tool for data modeling, storage
-and retrival.
+database is an excellent tool for data modeling, storage and retrival.
 
 HTSQL reimagines what it means to query a database.  The combination of
 a *navigational model* with *data flows* enables expressions that
 naturally reflect business inquiries.  The HTSQL translator uses SQL as
 a target assembly language, which allows us to fix the query model and
-language, while keeping current investment in relational systems.
+language while keeping current investment in relational systems.
 
 To demonstrate this point, we walk through a set of business inquires
 expressed over a fictitious university schema.
 
 This data model has two top-level tables, ``school`` and ``department``,
 where ``department`` has an optional link to ``school``.  Subordinate
-tables, ``course`` and ``program``, have mandatory links to their parents.
+tables, ``course`` and ``program``, have mandatory links to their
+parents.
 
-SQL conflates Rows & Columns
+SQL Conflates Rows & Columns
 ----------------------------
 
-  "Please list departments; for each department,
-  show the corresponding school's campus." 
+    *"For each department, please show the department name and the
+    corresponding school's campus."*
 
 .. sourcecode:: sql
 
-     SELECT d.code, d.name, s.campus
-     FROM ad.department AS d
-     LEFT JOIN ad.school AS s
-            ON (d.school_code = s.code);
+    SELECT d.name, s.campus
+    FROM ad.department AS d
+    LEFT JOIN ad.school AS s
+           ON (d.school_code = s.code);
 
-The business inquiry asks for a specific set of rows, and then
-correlated columns.  The SQL encoding returns a subset of a cross
-product making it difficult to ensure what each row represents. 
-The ``FROM`` clause doesn't just pick rows, it also plays and auxiliary
-role in choosing columns.
+The business inquiry clearly separates the requested rows (*each
+department*) and columns (*department name* and *corresponding school's
+campus*), but this separation is lost when the query is encoded in SQL.
 
-.. sourcecode:: htsql
+In this SQL query, the ``FROM`` clause not just picks target rows, but
+also includes extra tables required to produce output columns, making it
+difficult to determine business entities represented by each row of the
+output.
 
-    /department{code, name, school.campus}
+.. htsql::
+   :cut: 4
+   :hide:
 
-The navigational translation separates the row definition from the
-column selection.  The linking is implicit, and correct.  The encoded
-query can be read aloud as a verbal inquiry.
+    /department{name, school.campus}
 
+The HTSQL translation separates the row definition from the column
+selection.  The linking is implicit, and correct.  The encoded query can
+be read aloud as a verbal inquiry.
 
 SQL Conflates Filters & Links
 -----------------------------
 
-  "For each department, return the department's
-  name and number of courses having more than
-  3 credit hours."
+    *"For each department, return the department's name and number of
+    courses having more than 2 credit hours."*
 
 .. sourcecode:: sql
 
-     SELECT d.name, COUNT(SELECT TRUE FROM ad.course AS c
-                          WHERE c.department_code = d.code
-                            AND c.credits > 3  )
-     FROM ad.department AS d;
+    SELECT d.name, COUNT(SELECT TRUE FROM ad.course AS c
+                         WHERE c.department_code = d.code
+                           AND c.credits > 2)
+    FROM ad.department AS d;
 
 For the SQL encoding of this inquiry we use a subquery to avoid row and
 column conflation.  However, ``WHERE`` clause in the subquery conflates
-logic filter with the glue linking department and course.
+logic filter with the glue linking ``department`` and ``course``.
 
 .. sourcecode:: sql
 
-     SELECT d.name, count(c)
-     FROM ad.department AS d
-     LEFT JOIN ad.course AS c
-            ON (c.department_code = d.code
-                AND c.credits > 3)
-     GROUP BY d.name;
+    SELECT d.name, COUNT(c)
+    FROM ad.department AS d
+    LEFT JOIN ad.course AS c
+           ON (c.department_code = d.code
+               AND c.credits > 2)
+    GROUP BY d.name;
 
 In a common optimization of this query, we replace the correlated
 subquery with a ``GROUP BY`` projection.  This gives us both row/column
 and link/filter conflation, further obfuscating the business inquiry.
 
-.. sourcecode:: htsql
+.. htsql::
+   :cut: 4
+   :hide:
 
-     /department{name, count(course?credits>3)}
+    /department{name, count(course?credits>2)}
 
-The navigational translation keeps the business logic separate from the
-link and the row definition separate from output columns.  The encoded
-query corresponds to the original inquiry.
-
+The HTSQL translation keeps the business condition separate from linking
+and the row definition separate from output columns.  The query adheres
+the form of the original business inquiry.
 
 Conflating Projection with Aggregation
 --------------------------------------
 
-  "How many departments by campus?"
+    *"How many departments by campus?"*
 
 .. sourcecode:: sql
 
-   SELECT s.campus, COUNT(d)
-   FROM ad.school AS s 
-   LEFT JOIN ad.department AS d
-     ON (s.code = d.school_code)
-   WHERE s.campus IS NOT NULL
-   GROUP by s.campus;
+    SELECT s.campus, COUNT(d)
+    FROM ad.school AS s
+    LEFT JOIN ad.department AS d
+      ON (s.code = d.school_code)
+    WHERE s.campus IS NOT NULL
+    GROUP BY s.campus;
 
-In the schema there isn't a ``campus`` table, you have to take
-*distinct* values from the school table.  In this SQL query its not
-clear if the ``GROUP BY`` is used only to produce an aggregate, you have
-to examine primary key columns to know for sure.
+In the schema, there isn't a ``campus`` table, so you have to take
+*distinct* values of ``campus`` column from the ``school`` table.  The
+``GROUP BY`` clause combines two functions: choosing output row set and
+evaluating the aggregate ``COUNT()``.
 
-.. sourcecode:: htsql
+.. htsql::
+   :cut: 4
+   :hide:
 
-   /(school^campus) {campus, count(school.department)}
+    /(school^campus) {campus, count(school.department)}
 
-In a navigational approach, you first construct the projection
-explicitly (using ``^`` operator).  Then, you select from it. 
-In this way the aggregation is indicated separately as part of the
-column selector rather than being confused with the row definition.
+In the HTSQL query, we start with an explicit projection (the ``^``
+operator), then we select correlated columns.  This way, the aggregation
+is indicated separately as part of the column selector rather than being
+conflated with the row definition.
 
+SQL Lacks Means of Decomposition
+--------------------------------
 
-For SQL, Modest Complexity is Painful
--------------------------------------
-
-  "For each school with a degree program, return 
-  the school's name, and the average number of 
-  high-credit (>3) courses its departments have."
+    *"For each department, return the department name and the number of
+    offered 100's, 200's, 300's and 400's courses."*
 
 .. sourcecode:: sql
 
-   SELECT s.name, o.avg_over_3 FROM ad.school AS s
-   JOIN ad.program AS p ON (p.school_code = s.code)
-   LEFT JOIN (
-       SELECT d.school_code, AVG(COALESCE(i.over_3,0)) AS avg_over_3
-       FROM ad.department d
-       LEFT JOIN (
-           SELECT c.department_code, COUNT(c) AS over_3
+    SELECT d.name,
+           COUNT(CASE WHEN c.no BETWEEN 100 AND 199 THEN TRUE END),
+           COUNT(CASE WHEN c.no BETWEEN 200 AND 299 THEN TRUE END),
+           COUNT(CASE WHEN c.no BETWEEN 300 AND 399 THEN TRUE END),
+           COUNT(CASE WHEN c.no BETWEEN 400 AND 499 THEN TRUE END)
+    FROM ad.department AS d
+    LEFT JOIN ad.course AS c
+           ON (c.department_code = d.code)
+    GROUP BY d.name;
+
+This query is tedious to write and error prone to maintain since SQL
+provides no way to factor the repetitive expression ``COUNT(...)``.
+
+.. htsql::
+   :cut: 4
+   :hide:
+
+    /department.define(course_num($level) := count(course?no>=$level*100
+                                                         &no<($level+1)*100))
+      {name, course_num(1),
+             course_num(2),
+             course_num(3),
+             course_num(4)}
+
+The HTSQL translation avoids this problem by defining a calculated
+attribute ``course_num($level)`` on the ``department`` table.
+
+In SQL, Modest Complexity is Painful
+------------------------------------
+
+    *"For each school with a degree program, return the school's name,
+    and the average number of high-credit (>3) courses its departments
+    have."*
+
+.. sourcecode:: sql
+
+    SELECT s.name, o.avg_over_3
+    FROM ad.school AS s
+    JOIN ad.program AS p ON (p.school_code = s.code)
+    LEFT JOIN (
+        SELECT d.school_code, AVG(COALESCE(i.over_3,0)) AS avg_over_3
+        FROM ad.department d
+        LEFT JOIN (
+            SELECT c.department_code, COUNT(c) AS over_3
             FROM ad.course AS c WHERE c.credits > 3
             GROUP BY c.department_code
-       ) AS i ON (i.department_code = d.code)
-       GROUP BY d.school_code
-   ) AS o ON (o.school_code = s.code)
-   GROUP BY s.name, o.avg_over_3;
+        ) AS i ON (i.department_code = d.code)
+        GROUP BY d.school_code
+    ) AS o ON (o.school_code = s.code)
+    GROUP BY s.name, o.avg_over_3;
 
-Not only is this query hard to read, it took several passes to get
-correct -- without the ``COALESCE`` you get results that look correct,
+Not only this query is hard to read, it took several passes to get
+correct --- without the ``COALESCE`` you get results that look correct,
 but arn't.
 
-.. sourcecode:: htsql
+.. htsql::
+   :cut: 4
+   :hide:
 
      /(school?exists(program))
-       {name, avg(department.count(course?credits>3))} 
+       {name, avg(department.count(course?credits>3))}
+
+This HTSQL query takes less characters than the business inquiry itself.
 
 
 How do I use HTSQL?
 
 HTSQL can be used with any number of higher-level tools.
 
-HTSQL is a Reporting Toolkit
-----------------------------
+HTSQL is a Flexible Reporting Toolkit
+-------------------------------------
 
 .. vsplit::
 
 
       <body>
       <h3>Select a School</h3>
-      <select id="school" 
+      <select id="school"
         data-htsql="/school{code, name}"></select>
       <div style="width: 500px; height: 350px;"
         data-htsql="/program{title, count(student)}
                 data-ref="department"></strong>
       </p>
       <h3>Courses</h3>
-      <table id="course" 
+      <table id="course"
         data-htsql="/course?department_code=$department"
         data-ref="department"></table>
       </body>
 drill down (``school``, ``department`` and ``course``) for a university
 schema.  The live demo for this dashboard is at http://htraf.htsql.org/.
 
-
 HTSQL is a Communication Tool
 -----------------------------
 
-HTSQL reduces the number of meetings in your organization::
+.. vsplit::
 
-   From: "Anne (data analyst)" <ann@example.com>
-   To: "Dave (product manager)" <dave@example.com>
-   Cc: jess@example.com, tim@example.com, jose@example.com
-   Subject: do we need to meet?
+   .. container::
 
-   Does the HTSQL below return, for each school, the average
-   number of courses offered in its departments? If so, then
-   Tim and I don't need to meet with you tomorrow.
+      .. htsql::
+         :cut: 4
 
-   http://demo.htsql.org/school{name,avg(department.count(course))}
+         /school{name, count(department)}
 
-   - A
+   .. container::
+
+      .. image:: img/shared_language.png
+         :alt: shared language diagram
 
 HTSQL is a common language usable by software developers, data analysts,
 database administrators, and even business users.
 for data analysts.  With HTSQL, the easy stuff is truly easy; and,
 the complex stuff is easy too.
 
-Database Introspection
+Scalar Expressions
+------------------
+
+Literal values:
+
+.. htsql:: /{3.14159, 'Hello World!'}
+
+Algebraic expressions:
+
+.. htsql:: /(3+4)*6
+
+Predicate expressions:
+
+.. htsql:: /(7<13)&(1=0|1!=0)
+
+Navigation
+----------
+
+Navigational operations produce data flows.
+
+Selecting a table produces a flow of all records from the table:
+
+.. htsql:: /school
+   :cut: 4
+
+In the scope of ``school`` table, ``department`` is a link to
+associated records from ``department`` table:
+
+.. htsql:: /school.department
+   :cut: 4
+
+This query works as follows:
+
+* ``school`` generates a flow of all records from ``school`` table;
+* for each ``school`` record, ``department`` generates a subflow of
+  associated ``department`` records;
+* the output is produced by merging all ``department`` subflows.
+
+Filtering
+---------
+
+Sieve operator produces records satisfying the specified condition:
+
+.. htsql:: /school?campus='south'
+
+Sorting operator reorders records in the flow:
+
+.. htsql:: /school.sort(campus)
+   :cut: 4
+
+Truncating operator takes a slice of the flow:
+
+.. htsql:: /school.limit(2)
+
+Selection & Definition
 ----------------------
 
-On startup, HTSQL examines tables, primary keys, and foreign keys
-to construct a navigational graph of your database.  For example:
+Selection specifies output columns:
 
-.. diagram:: dia/administrative-directory-small-schema.tex
-   :align: center
+.. htsql:: /school{name, campus}
+   :cut: 4
 
-This university schema is used in the examples below.  The data model
-has two top-level tables, ``school`` and ``department``, where
-``department`` has an optional link to ``school``.  Subordinate tables,
-having mandatory foreign key references, are ``course`` and ``program``.
+Title decorator defines the title of an output column:
 
-Choosing a Table
-----------------
+.. htsql:: /school{name, count(department) :as '# of Dept'}
+   :cut: 4
 
-HTSQL queries typically start with a table.
+Calculated attributes factor out repeating expressions:
 
-.. vsplit::
+.. htsql::
 
-   .. sourcecode:: htsql
+   /school.define(num_dept := count(department))
+          {code, num_dept}?num_dept>3
 
-      /department
+References carry over values across nested scopes:
 
-   .. sourcecode:: sql
+.. htsql::
+   :cut: 4
 
-    SELECT "department"."code",
-           "department"."name",
-           "department"."school_code"
-    FROM "ad"."department" AS "department"
-    ORDER BY 1 ASC
+   /define($avg_credits := avg(course.credits))
+    .course{title, credits}?credits>$avg_credits
 
-`This query`__ returns all departments.
+Aggregation
+-----------
 
-__ http://demo.htsql.org/department
+Aggregates convert plural expressions to singular values.
 
-Selecting Columns
------------------
+Scalar aggregates:
 
-Output columns are selected with curly brackets ``{}``; the ``:as``
-decorator sets the title.
+.. htsql:: /count(department)
 
-.. vsplit::
+Nested aggregates:
 
-   .. sourcecode:: htsql
+.. htsql:: /avg(school.count(department))
 
-      /department{school.name, name}
+Different aggregation operations:
 
-   .. sourcecode:: sql
+.. htsql::
+   :cut: 4
 
-      SELECT "school"."name",
-             "department"."name"
-      FROM "ad"."department" AS "department"
-      LEFT OUTER JOIN "ad"."school" AS "school"
-      ON ("department"."school_code" = "school"."code")
-      ORDER BY "department"."code" ASC
+   /department{name, count(course),
+                     max(course.credits),
+                     sum(course.credits),
+                     avg(course.credits)}?exists(course)
 
-`This query`__ returns, for each department, the name of the
-associated school and the name of the department.
+Projection
+----------
 
-__ http://demo.htsql.org
-        /department{school.name, name}
+Projection operator defines a flow of all distinct values of the given
+expression:
 
-Filtering Rows
---------------
+.. htsql:: /school^campus
 
-HTSQL lets you filter results with arbitrary predicates.
+In the scope of the projection, ``school`` refers to all records from
+``school`` table having the same value of ``campus`` attribute:
 
-.. vsplit::
+.. htsql:: /(school^campus){campus, count(school)}
 
-   .. sourcecode:: htsql
+Linking
+-------
 
-      /course?credits>3
-             &department.school.code='eng'
+Even though HTSQL provides automatic links inferred from foreign key
+constraints, arbitrary linking is also allowed:
 
-   .. sourcecode:: sql
+.. htsql:: /school.({code} -> department{school_code})
+   :cut: 4
 
-      SELECT "course"."department_code",
-             "course"."no",
-             "course"."title",
-             "course"."credits",
-             "course"."description"
-      FROM "ad"."course" AS "course"
-      INNER JOIN "ad"."department" AS "department"
-      ON ("course"."department_code" = "department"."code")
-      LEFT OUTER JOIN "ad"."school" AS "school"
-      ON ("department"."school_code" = "school"."code")
-      WHERE ("course"."credits" > 3)
-        AND ("school"."code" = 'eng')
-      ORDER BY 1 ASC, 2 ASC
+This query uses a linking operator to replicate an automatic link:
 
-`This query`__ returns courses from the school of
-engineering having more than 3 credits.
+.. htsql:: /school.department
+   :cut: 4
+   :hide:
 
-__ http://demo.htsql.org
-        /course?department.school='eng'&credits>3
+Forking operator links a table to itself by the given expression:
 
-Paging and Sorting
-------------------
+.. htsql::
 
-Table operations such as sorting and paging could be freely combined.
+   /school{name, campus}
+          ?count(department)>avg(fork(campus).count(department))
 
-.. vsplit::
+This query returns schools with the number of departments above average
+among all schools in the same campus.  Using a linking operator, this
+query could be written as:
 
-   .. sourcecode:: htsql
+.. htsql::
+   :hide:
 
-      /course.sort(credits).limit(10,20)
-
-   .. sourcecode:: sql
-
-      SELECT "course"."department_code",
-             "course"."no",
-             "course"."title",
-             "course"."credits",
-             "course"."description"
-      FROM "ad"."course" AS "course"
-      ORDER BY 4 ASC NULLS FIRST, 1 ASC, 2 ASC
-      LIMIT 10 OFFSET 20
-
-`This query`__ returns courses 21 to 30 in the course
-catalog as sorted by number of credits.
-
-__ http://demo.htsql.org
-        /course.sort(credits).limit(10,20)
-
-Aggregating Data
-----------------
-
-In HTSQL, aggregates aren't a reason to run to the DBA.
-
-.. vsplit::
-
-   .. sourcecode:: htsql
-
-      /school{name,
-              avg(department.count(course))}
-             ?exists(program.degree='ms')
-
-   .. sourcecode:: sql
-
-      SELECT "school"."name",
-             "department"."avg"
-      FROM "ad"."school" AS "school"
-      LEFT OUTER JOIN (
-        SELECT AVG(CAST(COALESCE("course"."count", 0)
-                        AS NUMERIC)) AS "avg",
-               "department"."school_code"
-        FROM "ad"."department" AS "department"
-        LEFT OUTER JOIN (
-          SELECT COUNT(TRUE) AS "count",
-                 "course"."department_code"
-          FROM "ad"."course" AS "course"
-          GROUP BY 2
-        ) AS "course"
-        ON ("department"."code" = "course"."department_code")
-        GROUP BY 2
-      ) AS "department"
-      ON ("school"."code" = "department"."school_code")
-      WHERE EXISTS(
-        SELECT TRUE
-        FROM "ad"."program" AS "program"
-        WHERE ("school"."code" = "program"."school_code")
-          AND ("program"."degree" = 'ms')
-      )
-      ORDER BY "school"."code" ASC
-
-`This query`__ returns, for each school having a
-MS program, the average number of courses offered
-across its departments.
-
-__ http://demo.htsql.org
-        /school{name,avg(department.count(course))}
-                ?exists(program.degree='ms')
+   /school{name, campus}
+          ?count(department)>avg((campus -> school).count(department))
 
 
 What's up Next?
 ===============
 
-Over the next few months we'll be adding more features (some
-of them are already implemented in our internal 1.X branch).
-
-Projections
------------
-
-HTSQL will support complex grouping operations.
-
-.. vsplit::
-
-   .. sourcecode:: htsql
-
-      /(program^degree){degree,
-                        count(program)}
-
-   .. sourcecode:: sql
-
-      SELECT degree, COUNT(TRUE)
-      FROM ad.program
-      WHERE degree IS NOT NULL
-      GROUP BY 1
-      ORDER BY 1;
-
-`This query`__ returns the number of programs per degree.
-
-__ http://demo.htsql.org
-        /(program^degree){degree,count(program)}
+While HTSQL already demonstrates unprecedented query power, we are going
+to add even more amazing features.
 
 Hierarchical Output
 -------------------
 
-HTSQL is not to be limited to tabular output.
+HTSQL should not be limited to tabular output.
 
-.. vsplit::
+.. sourcecode:: htsql
 
-   .. sourcecode:: htsql
+   /school{name,
+           /program{title},
+           /department{name}}
 
-      /school{name,
-              /program{title},
-              /department{name}}
+This query is to generate a tree-shaped output: for each school, it
+produces the school name, a list of titles of associated programs, and
+a list of names of associated departments.
 
-   .. sourcecode:: sql
+Analytical Processing
+---------------------
 
-      SELECT name, code
-      FROM ad.school
-      ORDER BY code;
+HTSQL should support OLAP cube operations.
 
-      SELECT s.code, p.title
-      FROM ad.school AS s
-      INNER JOIN ad.program AS p
-      ON (s.code = p.school)
-      ORDER BY s.code, p.code;
+.. sourcecode:: htsql
 
-      SELECT s.code, d.name
-      FROM ad.school AS s
-      INNER JOIN ad.department
-      AS d ON (s.code = d.school)
-      ORDER BY s.code,d.code;
+   /rollup(school^campus){campus, count(school.department)}
 
-This query will return all schools with associated programs and
-departments.
+This query is to produce the number of departments per school's campus
+followed by a total value for all campuses.
 
+Recursive Queries
+-----------------
 
-More Backends
--------------
+HTSQL should be able to construct hierarchies from parent-child
+relationships.
 
-The current release of HTSQL supports PostgreSQL and SQLite.
-Subsequent releases will add support for MySQL, Oracle and
-Microsoft SQL Server.
+.. sourcecode:: htsql
 
-The challenge here is providing consistent function definitions
-and semantics that work across various SQL database systems.
+   /program{title, /close(part_of){title}}
 
+This query is to return programs together with a list of all
+dependent subprograms.
+