Clark C. Evans  committed 42b82a9

adding portions of 5/20 talk to showcase -- Why not SQL?

  • Participants
  • Parent commits 7ca40c6
  • Branches default

Comments (0)

Files changed (1)

File doc/showcase.rst

   HTSQL Showcase
 What is HTSQL?
       :alt: output of /school query
-HTSQL is a query language for the web.  Queries are URLs_ that can be
-directly typed into a browser.
+HTSQL is a query language for the web.  Queries are URLs that can be
+directly typed into a browser.  
-.. _REST:
-.. _HTTP:
-.. _Accept:
-.. _URLs:
 HTSQL is a Relational Database Gateway
    .. sourcecode:: sql
-      SELECT "school"."name",
-             COALESCE("program"."count", 0),
-             COALESCE("department"."count", 0)
+      SELECT "school"."name", COALESCE("program"."count", 0), COALESCE("department"."count", 0)
       FROM "ad"."school" AS "school"
-        SELECT COUNT(TRUE) AS "count",
-               "program"."school_code"
-        FROM "ad"."program" AS "program"
-        GROUP BY 2
-      ) AS "program"
-      ON ("school"."code" = "program"."school_code")
-        SELECT COUNT(TRUE) AS "count",
-               "department"."school_code"
-        FROM "ad"."department" AS "department"
-        GROUP BY 2
-      ) AS "department"
-      ON ("school"."code" = "department"."school_code")
+      LEFT OUTER JOIN (SELECT COUNT(TRUE) AS "count", "program"."school_code" FROM "ad"."program" AS "program" GROUP BY 2) AS "program" ON ("school"."code" = "program"."school_code")
+      LEFT OUTER JOIN (SELECT COUNT(TRUE) AS "count", "department"."school_code" FROM "ad"."department" AS "department" GROUP BY 2) AS "department" ON ("school"."code" = "department"."school_code")
       ORDER BY "school"."code" ASC
-HTSQL is a compact, high-level query language.  Often times,
-short HTSQL queries are equivalent to much more complex SQL.
+HTSQL is a compact, high-level navigational query language designed
+for data analysts and web developers.
+Why not SQL?
+Relational algebra has proven to be inadequate for encoding business
+inquiries -- elementary set operations simply do not correspond to
+meaningful data transformations.  The SQL language itself is tedious,
+verbose, and provides poor means of abstraction.  Yet, the relational
+database has proven to be an excellent tool for data modeling, storage
+and retrival.
+HTSQL reimagines what it means to query a database.  The combination of
+a *navigational model* with *data flows* enables expressions that
+naturally reflect business inquiries.  The HTSQL translator uses SQL as
+a target assembly language, which allows us to fix the query model and
+language, while keeping current investment in relational systems.
+To demonstrate this point, we walk through a set of business inquires
+expressed over a fictitious university schema.
+.. diagram:: dia/administrative-directory-small-schema.tex
+   :align: center
+This data model has two top-level tables, ``school`` and ``department``,
+where ``department`` has an optional link to ``school``.  Subordinate
+tables, ``course`` and ``program``, have mandatory links to their parents.
+SQL conflates Rows & Columns
+  "Please list departments; for each department,
+  show the corresponding school's campus." 
+.. sourcecode:: sql
+     SELECT d.code,, s.campus
+     FROM ad.department AS d
+     LEFT JOIN AS s
+            ON (d.school_code = s.code);
+The business inquiry asks for a specific set of rows, and then
+correlated columns.  The SQL encoding returns a subset of a cross
+product making it difficult to ensure what each row represents. 
+The ``FROM`` clause doesn't just pick rows, it also plays and auxiliary
+role in choosing columns.
+.. sourcecode:: htsql
+    /department{code, name, school.campus}
+The navigational translation separates the row definition from the
+column selection.  The linking is implicit, and correct.  The encoded
+query can be read aloud as a verbal inquiry.
+SQL Conflates Filters & Links
+  "For each department, return the department's
+  name and number of courses having more than
+  3 credit hours."
+.. sourcecode:: sql
+                          WHERE c.department_code = d.code
+                            AND c.credits > 3  )
+     FROM ad.department AS d;
+For the SQL encoding of this inquiry we use a subquery to avoid row and
+column conflation.  However, ``WHERE`` clause in the subquery conflates
+logic filter with the glue linking department and course.
+.. sourcecode:: sql
+     SELECT, count(c)
+     FROM ad.department AS d
+     LEFT JOIN ad.course AS c
+            ON (c.department_code = d.code
+                AND c.credits > 3)
+     GROUP BY;
+In a common optimization of this query, we replace the correlated
+subquery with a ``GROUP BY`` projection.  This gives us both row/column
+and link/filter conflation, further obfuscating the business inquiry.
+.. sourcecode:: htsql
+     /department{name, count(course?credits>3)}
+The navigational translation keeps the business logic separate from the
+link and the row definition separate from output columns.  The encoded
+query corresponds to the original inquiry.
+Conflating Projection with Aggregation
+  "How many departments by campus?"
+.. sourcecode:: sql
+   SELECT s.campus, COUNT(d)
+   FROM AS s 
+   LEFT JOIN ad.department AS d
+     ON (s.code = d.school_code)
+   WHERE s.campus IS NOT NULL
+   GROUP by s.campus;
+In the schema there isn't a ``campus`` table, you have to take
+*distinct* values from the school table.  In this SQL query its not
+clear if the ``GROUP BY`` is used only to produce an aggregate, you have
+to examine primary key columns to know for sure.
+.. sourcecode:: htsql
+   /(school^campus) {campus, count(school.department)}
+In a navigational approach, you first construct the projection
+explicitly (using ``^`` operator).  Then, you select from it. 
+In this way the aggregation is indicated separately as part of the
+column selector rather than being confused with the row definition.
+For SQL, Modest Complexity is Painful
+  "For each school with a degree program, return 
+  the school's name, and the average number of 
+  high-credit (>3) courses its departments have."
+.. sourcecode:: sql
+   SELECT, o.avg_over_3 FROM AS s
+   JOIN ad.program AS p ON (p.school_code = s.code)
+       SELECT d.school_code, AVG(COALESCE(i.over_3,0)) AS avg_over_3
+       FROM ad.department d
+       LEFT JOIN (
+           SELECT c.department_code, COUNT(c) AS over_3
+            FROM ad.course AS c WHERE c.credits > 3
+            GROUP BY c.department_code
+       ) AS i ON (i.department_code = d.code)
+       GROUP BY d.school_code
+   ) AS o ON (o.school_code = s.code)
+   GROUP BY, o.avg_over_3;
+Not only is this query hard to read, it took several passes to get
+correct -- without the ``COALESCE`` you get results that look correct,
+but arn't.
+.. sourcecode:: htsql
+     /(school?exists(program))
+       {name, avg(department.count(course?credits>3))} 
 How do I use HTSQL?
 HTSQL can be used with any number of higher-level tools.
-HTSQL Makes Dashboarding Easy
+HTSQL is a Reporting Toolkit
 .. vsplit::
 drill down (``school``, ``department`` and ``course``) for a university
 schema.  The live demo for this dashboard is at
 HTSQL is a Communication Tool