Workbook merging and loading outstanding issues

Issue #1003 resolved
Ghislain Hachey created an issue

As highest priority at the moments.

Need to figure out why the numbers on the dashboard differ from those in the workbooks. We would expect (at least after all validation have passed) that when filtering Teaching Staff in for example Pohnpei annual census workbook this would equal to the teacher count by district data shown in the online dashboard.

There is a possibility that some of the discrepancy is explained through not having reloaded the workbook after all the latest data validation were installed. Proof of this is shown below when trying to reload the workbook downloaded from the FedEMIS production audit trail.

This however led to following discovery. Cleaning up all above data validation with mocked up data gave rise to the old error which I thought was solved.

Trying to merge this previous version of the workbook to the latest and greatest also discovered yet a new error.

Comments (8)

  1. Brian Lewis repo owner

    Comment: Staff records from the SchoolStaff page of the workbook are loaded into the table TeacherSurvey.

    Staff are loaded when:

    Employment Status is Active

    ‘School’ is not (State) Department of Education ie PNIDOE, CHKDOE etc

    Staff meeting these conditions are loaded regardless of Staff Type.

    So the count of Staff Type= Teaching Staff should not be expected to correspond to the Teacher records in TeacherSurvey.

    Applying these filters to c718f985…xlsm gives 734 records vs 730 in TeacherSurvey. Need to reconcile at school level to see the next step.

    Note that the sheet Staff Summary has been removed from this book? This is useful when driliing in to this.

  2. Brian Lewis repo owner

    Reconciling those 4 teachers (734 vs 730)

    The workbook has 2 extra teachers in schools PNI325 and PNI 323

    PNI325:

    missing in TeacherSurvey are the two highlighted records:

    These are clearly a duplicate: have they been tidied up somehow later?

    Similarly, the missing 2 from school PNI323 are :

    Dickson Donre, Kaderkson Donre both Dob 6-sep-74

    These do have 2 teacher identity records and are not the same person; the dob for kaderkson is different - kaderkson is a bus driver.

    So the CORRECT number of staff is possibly 733; ie the 730 plus the 2 Donre, + Serlyane Rudolf.

  3. Ghislain Hachey reporter

    Is this the number of “Teachers” or all the staff? Because coming out of warehouse.TeacherCountSchool we expect only teachers at that point. And how can we verify from workbook to dashboard the teachers if not by using Teaching Staff? I would have expected this to work since there is validation on Staff Type vs duties assigned.

  4. Brian Lewis repo owner

    11/12/2020 - intriduced the view pSurveyOps.TeacherSurveyAudit that decodes the XML on each TeacherSurvey record, that is the image of the workbook row from which it was loaded.

    Also common.BuildTeacherSurveyAudit is a helper to assist in keeping this view up-to-date, as columns may be added in the future.

  5. Brian Lewis repo owner

    So using this view we can drill in to reconcile in detail from

    workbook=> TeacherSurvey=>warehouse.TeacherLocation

    sql :

    declare @fileID nvarchar(50) = 'C718F985-6E19-4C14-B455-B61BDE8EDD6C'
    
    
    -- this is a verbatim dump of the xml loaded into each teacher survey record
    -- from the workbook with the given id
    Select * from 
    [pSurveyOps].TeacherSurveyAudit
    WHERE fileID = @fileID
    
    
    -- subtotals on Teaching Staff / Non-teaching staff
    Select Staff_Type, count(*) Num 
    from pSurveyOps.TeacherSurveyAudit
    WHERE fileID = @fileID
    GROUP BY Staff_Type
    
    -- there will be a teacherlocation record for each teachersurvey record
    Select * 
    from warehouse.TeacherLocation TL
        INNER JOIN pSurveyOps.TeacherSurveyAudit A
            ON TL.tID = A.tId and TL.surveyYear = A.svyYear
            AND A.fileID = @fileID
    
    -- group by the classification of the teacher - T A M X 
    Select TAMX, count(*) Num
    from warehouse.TeacherLocation TL
        INNER JOIN pSurveyOps.TeacherSurveyAudit A
            ON TL.tID = A.tId and TL.surveyYear = A.svyYear
            AND A.fileID = @fileID
    GROUP BY TAMX
    
    
    -- most teaching staff will be 'T' or 'M" (mixed)
    -- non teaching staff will be 'A' or 'X' (Other duties - cooks, gardeners etc)
    -- however there may be some overrides to this, to attempt to correct some misclassifications
    -- these are now reduced becuase they are reported during the validation phase of the upload
    -- so they should be manually corrected and confirmed before loading
    -- Break down the TAMX totals to show these exceptions
    Select TAMX 
    , sum(case when Staff_Type = 'Teaching Staff' then 1 end) TeachingStaff
    , sum(case when Staff_Type = 'Teaching Staff' then null else 1 end) NonTeachingStaff
    , count(*) Num
    from warehouse.TeacherLocation TL
        INNER JOIN pSurveyOps.TeacherSurveyAudit A
            ON TL.tID = A.tId and TL.surveyYear = A.svyYear
            AND A.fileID = @fileID
    GROUP BY TAMX
    
    -- enumerate the idenitifed 'corrections', showing the recorded activities for the staff member
    -- this will exp[lain the reason for the correction
    -- the number of row returned here should correspond to the previous query 
    -- Non-Teaching T + Non Teaching M + Teaching A + Teaching X
    Select TAMX
    , A.[Index]
    , A.First_Name
    , A.Last_Name
    , A.Staff_Type
    , A.Job_Title
    , A.ECE, A.Grade_1, A.Grade_2, A.Grade_3, A.Grade_4, A.Grade_5, A.Grade_6, A.Grade_7, A.Grade_8, A.Grade_9, A.Grade_10, A.Grade_11, A.Grade_12
    , A.Admin, A.Other
    , TL.A, TL.X, TL.T
    from warehouse.TeacherLocation TL
        INNER JOIN pSurveyOps.TeacherSurveyAudit A
            ON TL.tID = A.tId and TL.surveyYear = A.svyYear
            AND A.fileID = @fileID
    WHERE ( Staff_Type = 'Non Teaching Staff' AND TAMX in ('T','M'))
    OR (Staff_Type = 'Teaching Staff' AND TAMX in ('A','X'))
    ORDER BY Staff_Type, [Index]
    
    
    -- Show the NET difference between the 'raw' report of Teaching NonTeaching and the adjusted report in warehouse.TeacherLocation
    Select 
    sum(case when TAMX in ('T','M') then NonTeachingStaff end) TeachingStaffUp
    , sum(case when TAMX in ('A','X') then TeachingStaff end) TeachingStaffDown
    , sum(case when TAMX in ('T','M') then NonTeachingStaff end) -
    sum(case when TAMX in ('A','X') then TeachingStaff end) NetUp
    FROM
    (
    Select TAMX 
    , sum(case when Staff_Type = 'Teaching Staff' then 1 end) TeachingStaff
    , sum(case when Staff_Type = 'Teaching Staff' then null else 1 end) NonTeachingStaff
    , count(*) Num
    from warehouse.TeacherLocation TL
        INNER JOIN pSurveyOps.TeacherSurveyAudit A
            ON TL.tID = A.tId and TL.surveyYear = A.svyYear
            AND A.fileID = @fileID
    GROUP BY TAMX
    ) SUB
    

  6. Brian Lewis repo owner

    So from the above, we have , based on ‘raw’ Teaching Staff/ non Teaching Staff 598 vs 132

    After ‘corrections’ that are encoded in warehouse.TeacherLocation, we have 600 vs 130

    note that these corrections should now be captured by validations, and ambiguity resolved manually. I think the loading of this file predates thsoe enhancements.

  7. Brian Lewis repo owner

    So now we need to reconcile warehouse.TeacherLocation to warehouse.SchoolStaffCount.

    This is where we have the PROBLEM becuase of the handling of ‘Support’ (cf #600)

    This is now updated to use the TAMX classification on warehouse.TeacherLocation to determine ‘Support’ on warehouse.SchoolStaffCount.

    TAMX = T or M => Support = null

    TAMX = A or X => Support = TAMX

    In other words, we still have support NULL for teachers, so that all the views derived from SchoolStaffCount still work. But support will not be split A or X rather than literal ‘Support’

    This change is in warehouse.BuildTeacherLocation.

  8. Log in to comment