Catalogue and document the 'warehouse layer'

Issue #252 resolved
Brian Lewis repo owner created an issue

Need to have these queiries well docum,ented and understood, to faciliate their use both by developers and end-users.

Part of this will be to create an ODC (Office Data Connection) file for each data source.

Comments (2)

  1. Brian Lewis reporter

    Flow (Cohort) Analysis in SIEMIS Warehouse

    Cohort analysis datasets xxxCohort provide the raw fields you need to construct a reconstructed cohort model to calculate flow rates ( promotion , repetition, dropout, survival). The xxxFlow datasets already include in the calculated flow indicators.

    There are 5 flavours provided by SIEMIS Warehouse at different levels of aggregation. the point being, if you only want to report at, say, province and national level, you can load this data pre-aggregated, rather than loading a much larger set at school level and aggregating in Tableau (or Pivot Table).

    These are : warehouse.schoolCohort/ SchoolFlow Details at school level. To get additional characteristics of the school, join back to warehouse.DimensionSchoolSurvey using the SchoolDimensionssID => [Survey ID]

    SurveyDimensionssID points to the best available survey for the year in question - if this is not a survey collected in that year, then Estimate = 1. It's important to get the school classifiers from here rather than what is currently on the school table (reflecting the school's current status) For example, schools can and do change type (PS=>CHS) and occasionally authority.

    warehouse.cohort

    Aggregated up to the principal attribuets of a school: province, authority.

    warehouse.districtCohort / warehouse.districtFlow

    aggregated up to district

    warehouse.nationCohort / warehouse.nationFlow

    Aggregated up to national totals.

    Why provide cohort and flow versions? The pre-calculated flow rates in the flow version will be useful for "static" reporting ( e.g. and SSRS report, static web page), but in a cube ( Tableau, pivot table) you'll need to build the flow rates with calculated fields to get the expected interactivity. So, you can get a smaller dataset using the cohort version. If you do this, you should check the totals you get against the flow version.

  2. Log in to comment