Missing Disability processing of Annual PDF Survey

Issue #1368 resolved
Ghislain Hachey created an issue

Kiribati reported that the disability data is missing from the data warehouse. Upon further investigating I believe that the upload process of the Annual PDF Survey does not process the Disability table (i.e. missing stored procure [pEnrolmentWrite].[xfdfDisability]). Therefore, the data does not go in the data warehouse. However, the data does get loaded into the system in its raw form (the original PDF XML representation). The recommended course of action would be to:

  • Fix the missing Disability processing feature
  • Reload the PDFs for the schools with submitted Disability data by hand one by one. Previous years had data but I believe they came from estimates from prior years. This leaves you with the decision to either:

    • Keep the data as estimates for 2021 and 2022 (only reload submitted data for 2023)
    • Replace the estimates with real submitted data for all last three years (2021, 2022, 2023). This will perhaps (likely) change your figures for those years.
  • We could also automate pushing all the submitted Disability data from the uploaded PDF survey into the data warehouse but this will require more time and work on the developers end.

Comments (8)

  1. Ghislain Hachey reporter

    Upon further investigating. It seems that

    • The Disability data is in the raw PDF XML stored in [dbo].[SchoolSurveyXml_]
    • The stored procedure used for repeaters (i.e. [dbo].[xfdfGrid]) seems like it should work unchanged also for disability data provided DIsability is correctly configured in bdo.metaPupilTableDefs.
    • The app backend does seem to miss calling this stored procedure with the part of the XML that contains the disability data (needs fixing)

    Finally consider back porting disability data from raw for

    • Maybe not 2021 (those are labelled as not estimate in warehouse)
    • 2022 (labelled as estimate)
    • 2023 (completely missing)

  2. Ghislain Hachey reporter

    It turns out, no fix needed on the backend. Simply add the missing data below in table dbo.lkpPupilTableTupes and it should be executed by the backend normally.

  3. Ghislain Hachey reporter

    Finally, extracting data for past years was done with the following slow script (a set based approach like using a cross join would be better for more records).

    -- Slow performing script to run the stored proc on each past Disability XML partial
    -- (i.e. 244 records ran 3-4 minutes on i9 11th Gen laptop CPU)
    DECLARE @XmlData XML;
    DECLARE @ID INT;
    DECLARE @XmlCursor CURSOR;
    
    -- Declare a cursor to iterate over each row
    SET @XmlCursor = CURSOR FOR
        SELECT SSX.ssID, ssXml.query('declare namespace ns="http://ns.adobe.com/xfdf/"; (/ns:xfdf/ns:fields/ns:field[@name="Dis"])[1]') AS DisabilityXML
        FROM SchoolSurveyXml_ SSX
        INNER JOIN SchoolSurvey SS ON SSX.ssID = SS.ssID
        WHERE SS.svyYear IN (2021, 2022, 2023);
    
    -- Open the cursor
    OPEN @XmlCursor;
    
    -- Fetch the first row
    FETCH NEXT FROM @XmlCursor INTO @ID, @XmlData;
    
    -- Loop through each row
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Execute your stored procedure with the XML data and ID as parameters
        BEGIN TRY
            SELECT 'Processing dbo.xfdfGrid on', @ID, @XmlData;
            EXEC dbo.xfdfGrid @surveyID=@ID,@grid=@XmlData;
        END TRY
        BEGIN CATCH
            -- Handle the error as needed
            PRINT 'Error occurred for ID: ' + CAST(@ID AS VARCHAR(10)) + '. Error message: ' + ERROR_MESSAGE();
        END CATCH
    
        -- Fetch the next row
        FETCH NEXT FROM @XmlCursor INTO @ID, @XmlData;
    END
    
    -- Close and deallocate the cursor
    CLOSE @XmlCursor;
    DEALLOCATE @XmlCursor;
    

    There was one SchoolSurvey XML that threw an error because of bad data (i.e. an X where an int should be for a number of students). This was fixed by changing the bad value to 0.

    -- Sample modification of a bad bit of data in a specific survey in a specific XML element
    -- (in this case there was an 'X' instead of an int for Disability number)
    BEGIN TRANSACTION;
    
    SELECT SSX.ssID, ssXml.query('declare namespace ns="http://ns.adobe.com/xfdf/"; (/ns:xfdf/ns:fields/ns:field[@name="Dis"]/ns:field[@name="D"]/ns:field[@name="09"]/ns:field[@name="02"]/ns:field[@name="F"]/ns:value/text())[1]') AS DisabilityXML
    FROM SchoolSurveyXml_ SSX
    INNER JOIN SchoolSurvey SS ON SSX.ssID = SS.ssID
    WHERE SSX.ssID = @SSID;
    
    UPDATE SchoolSurveyXml_
    SET ssXml.modify('
        declare namespace ns="http://ns.adobe.com/xfdf/";
        replace value of (/ns:xfdf/ns:fields/ns:field[@name="Dis"]/ns:field[@name="D"]/ns:field[@name="09"]/ns:field[@name="02"]/ns:field[@name="F"]/ns:value/text())[1]
        with "0"
    ')
    WHERE ssID = @SSID;
    
    SELECT SSX.ssID, ssXml.query('declare namespace ns="http://ns.adobe.com/xfdf/"; (/ns:xfdf/ns:fields/ns:field[@name="Dis"]/ns:field[@name="D"]/ns:field[@name="09"]/ns:field[@name="02"]/ns:field[@name="F"]/ns:value/text())[1]') AS DisabilityXML
    FROM SchoolSurveyXml_ SSX
    INNER JOIN SchoolSurvey SS ON SSX.ssID = SS.ssID
    WHERE SSX.ssID = @SSID;
    
    ROLLBACK;
    

    Then compare the before and after in Excel pivot tables for the expected new disability data. The results, less estimates in the warehouse and new data where none was available in 2023. Still there is a mixture of schools reporting over the years as observed by the remaining and changing estimates values but this is another thing.

  4. Ghislain Hachey reporter

    Only remaining task is to execute this in production after a meeting with country data meet first.

  5. Brian Lewis repo owner

    All great work Ghislain

    This broke when switching to the upload in the portal (2022): in the desktop survey loader the code was validated against metaPupilTableDefs, not lkpPupilTableTypes

    Only some small suggestions on the fix-up SQL: its easier to code and fast er to run to loop through records using this min(xx) > @variable.... structure as below.

    Also you can move the ‘Dis’ to a sql variable and reference it in the xquery using the sql:variable() function - possibly makes this the basis for a stored proc that takes the type as a variable (if its ever needed again)

    declare @dis xml
    
    declare @svy xml
    declare @FieldTag nvarchar(10) = 'Dis'
    
    --Select top 10 * from SchoolSurveyXml_ WHERE ssID in (Select ssID from SchoolSurvey WHERE svyYEar = 2023)
    declare @ssID int
    
    select @ssId = min(ssID) from SchoolSurveyXml_
    
    while @ssID is not null begin
    
        Select @svy = ssXml from SchoolSurveyXml_ WHERE ssID = @ssID
    
        ;
        WITH XMLNAMESPACES 
        (  
            'http://ns.adobe.com/xfdf/' as x
    
        )  
        SELECT @dis = @svy.query('/x:xfdf/x:fields/x:field[@name=sql:variable("@FieldTag")]')
        SELECT @ssID ssID, @dis surveyNode
        begin transaction
    
            --exec dbo.xfdfGrid @ssID, @dis
    
            --Select * from PupilTables 
            --WHERE ssID = @ssID
    
        rollback
        -- advance to next record
        select @ssId = min(ssID) from SchoolSurveyXml_ WHERE ssID > @ssID
    
    end
    

  6. Ghislain Hachey reporter

    Thanks Brian. I’ve adapted your idea of moving the ‘Dis’ to a sql variable and reference it in the xquery. But changing from cursor to min approach makes the otherwise same script run much longer. Perhaps something I do wrong. Anyway, Ill keep both as reference.

  7. Log in to comment