Commits

Clark C. Evans  committed 42b82a9

adding portions of 5/20 talk to showcase -- Why not SQL?

  • Participants
  • Parent commits 7ca40c6

Comments (0)

Files changed (1)

File doc/showcase.rst

   HTSQL Showcase
 ******************
 
-
 What is HTSQL?
 ==============
 
       :alt: output of /school query
       :target: http://demo.htsql.org/school
 
-HTSQL is a query language for the web.  Queries are URLs_ that can be
-directly typed into a browser.
+HTSQL is a query language for the web.  Queries are URLs that can be
+directly typed into a browser.  
 
-.. _REST: http://en.wikipedia.org/wiki/Representational_State_Transfer
-.. _HTTP: http://www.w3.org/Protocols/rfc2616/rfc2616.html
-.. _Accept: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
-.. _URLs: http://www.ietf.org/rfc/rfc3986.txt
 
 HTSQL is a Relational Database Gateway
 --------------------------------------
 
    .. sourcecode:: sql
 
-      SELECT "school"."name",
-             COALESCE("program"."count", 0),
-             COALESCE("department"."count", 0)
+      SELECT "school"."name", COALESCE("program"."count", 0), COALESCE("department"."count", 0)
       FROM "ad"."school" AS "school"
-      LEFT OUTER JOIN (
-        SELECT COUNT(TRUE) AS "count",
-               "program"."school_code"
-        FROM "ad"."program" AS "program"
-        GROUP BY 2
-      ) AS "program"
-      ON ("school"."code" = "program"."school_code")
-      LEFT OUTER JOIN (
-        SELECT COUNT(TRUE) AS "count",
-               "department"."school_code"
-        FROM "ad"."department" AS "department"
-        GROUP BY 2
-      ) AS "department"
-      ON ("school"."code" = "department"."school_code")
+      LEFT OUTER JOIN (SELECT COUNT(TRUE) AS "count", "program"."school_code" FROM "ad"."program" AS "program" GROUP BY 2) AS "program" ON ("school"."code" = "program"."school_code")
+      LEFT OUTER JOIN (SELECT COUNT(TRUE) AS "count", "department"."school_code" FROM "ad"."department" AS "department" GROUP BY 2) AS "department" ON ("school"."code" = "department"."school_code")
       ORDER BY "school"."code" ASC
 
-HTSQL is a compact, high-level query language.  Often times,
-short HTSQL queries are equivalent to much more complex SQL.
+HTSQL is a compact, high-level navigational query language designed
+for data analysts and web developers.
+
+
+Why not SQL?
+============
+
+Relational algebra has proven to be inadequate for encoding business
+inquiries -- elementary set operations simply do not correspond to
+meaningful data transformations.  The SQL language itself is tedious,
+verbose, and provides poor means of abstraction.  Yet, the relational
+database has proven to be an excellent tool for data modeling, storage
+and retrival.
+
+HTSQL reimagines what it means to query a database.  The combination of
+a *navigational model* with *data flows* enables expressions that
+naturally reflect business inquiries.  The HTSQL translator uses SQL as
+a target assembly language, which allows us to fix the query model and
+language, while keeping current investment in relational systems.
+
+To demonstrate this point, we walk through a set of business inquires
+expressed over a fictitious university schema.
+
+.. diagram:: dia/administrative-directory-small-schema.tex
+   :align: center
+
+This data model has two top-level tables, ``school`` and ``department``,
+where ``department`` has an optional link to ``school``.  Subordinate
+tables, ``course`` and ``program``, have mandatory links to their parents.
+
+SQL conflates Rows & Columns
+----------------------------
+
+  "Please list departments; for each department,
+  show the corresponding school's campus." 
+
+.. sourcecode:: sql
+
+     SELECT d.code, d.name, s.campus
+     FROM ad.department AS d
+     LEFT JOIN ad.school AS s
+            ON (d.school_code = s.code);
+
+The business inquiry asks for a specific set of rows, and then
+correlated columns.  The SQL encoding returns a subset of a cross
+product making it difficult to ensure what each row represents. 
+The ``FROM`` clause doesn't just pick rows, it also plays and auxiliary
+role in choosing columns.
+
+.. sourcecode:: htsql
+
+    /department{code, name, school.campus}
+
+The navigational translation separates the row definition from the
+column selection.  The linking is implicit, and correct.  The encoded
+query can be read aloud as a verbal inquiry.
+
+
+SQL Conflates Filters & Links
+-----------------------------
+
+  "For each department, return the department's
+  name and number of courses having more than
+  3 credit hours."
+
+.. sourcecode:: sql
+
+     SELECT d.name, COUNT(SELECT TRUE FROM ad.course AS c
+                          WHERE c.department_code = d.code
+                            AND c.credits > 3  )
+     FROM ad.department AS d;
+
+For the SQL encoding of this inquiry we use a subquery to avoid row and
+column conflation.  However, ``WHERE`` clause in the subquery conflates
+logic filter with the glue linking department and course.
+
+.. sourcecode:: sql
+
+     SELECT d.name, count(c)
+     FROM ad.department AS d
+     LEFT JOIN ad.course AS c
+            ON (c.department_code = d.code
+                AND c.credits > 3)
+     GROUP BY d.name;
+
+In a common optimization of this query, we replace the correlated
+subquery with a ``GROUP BY`` projection.  This gives us both row/column
+and link/filter conflation, further obfuscating the business inquiry.
+
+.. sourcecode:: htsql
+
+     /department{name, count(course?credits>3)}
+
+The navigational translation keeps the business logic separate from the
+link and the row definition separate from output columns.  The encoded
+query corresponds to the original inquiry.
+
+
+Conflating Projection with Aggregation
+--------------------------------------
+
+  "How many departments by campus?"
+
+.. sourcecode:: sql
+
+   SELECT s.campus, COUNT(d)
+   FROM ad.school AS s 
+   LEFT JOIN ad.department AS d
+     ON (s.code = d.school_code)
+   WHERE s.campus IS NOT NULL
+   GROUP by s.campus;
+
+In the schema there isn't a ``campus`` table, you have to take
+*distinct* values from the school table.  In this SQL query its not
+clear if the ``GROUP BY`` is used only to produce an aggregate, you have
+to examine primary key columns to know for sure.
+
+.. sourcecode:: htsql
+
+   /(school^campus) {campus, count(school.department)}
+
+In a navigational approach, you first construct the projection
+explicitly (using ``^`` operator).  Then, you select from it. 
+In this way the aggregation is indicated separately as part of the
+column selector rather than being confused with the row definition.
+
+
+For SQL, Modest Complexity is Painful
+-------------------------------------
+
+  "For each school with a degree program, return 
+  the school's name, and the average number of 
+  high-credit (>3) courses its departments have."
+
+.. sourcecode:: sql
+
+   SELECT s.name, o.avg_over_3 FROM ad.school AS s
+   JOIN ad.program AS p ON (p.school_code = s.code)
+   LEFT JOIN (
+       SELECT d.school_code, AVG(COALESCE(i.over_3,0)) AS avg_over_3
+       FROM ad.department d
+       LEFT JOIN (
+           SELECT c.department_code, COUNT(c) AS over_3
+            FROM ad.course AS c WHERE c.credits > 3
+            GROUP BY c.department_code
+       ) AS i ON (i.department_code = d.code)
+       GROUP BY d.school_code
+   ) AS o ON (o.school_code = s.code)
+   GROUP BY s.name, o.avg_over_3;
+
+Not only is this query hard to read, it took several passes to get
+correct -- without the ``COALESCE`` you get results that look correct,
+but arn't.
+
+.. sourcecode:: htsql
+
+     /(school?exists(program))
+       {name, avg(department.count(course?credits>3))} 
 
 
 How do I use HTSQL?
 
 HTSQL can be used with any number of higher-level tools.
 
-HTSQL Makes Dashboarding Easy
------------------------------
+HTSQL is a Reporting Toolkit
+----------------------------
 
 .. vsplit::
 
 drill down (``school``, ``department`` and ``course``) for a university
 schema.  The live demo for this dashboard is at http://htraf.htsql.org/.
 
+
 HTSQL is a Communication Tool
 -----------------------------