Missing Disability processing of Annual PDF Survey

Issue #1368 resolved

Ghislain Hachey created an issue 2024-03-12

Kiribati reported that the disability data is missing from the data warehouse. Upon further investigating I believe that the upload process of the Annual PDF Survey does not process the Disability table (i.e. missing stored procure [pEnrolmentWrite].[xfdfDisability]). Therefore, the data does not go in the data warehouse. However, the data does get loaded into the system in its raw form (the original PDF XML representation). The recommended course of action would be to:

Fix the missing Disability processing feature
Reload the PDFs for the schools with submitted Disability data by hand one by one. Previous years had data but I believe they came from estimates from prior years. This leaves you with the decision to either:
- Keep the data as estimates for 2021 and 2022 (only reload submitted data for 2023)
- Replace the estimates with real submitted data for all last three years (2021, 2022, 2023). This will perhaps (likely) change your figures for those years.
We could also automate pushing all the submitted Disability data from the uploaded PDF survey into the data warehouse but this will require more time and work on the developers end.

‌

Comments (8)

Ghislain Hachey reporter
- assigned issue to
  
  Ghislain Hachey
- 2024-03-13T02:33:52+00:00
Ghislain Hachey reporter
Upon further investigating. It seems that
- The Disability data is in the raw PDF XML stored in [dbo].[SchoolSurveyXml_]
- The stored procedure used for repeaters (i.e. [dbo].[xfdfGrid]) seems like it should work unchanged also for disability data provided DIsability is correctly configured in bdo.metaPupilTableDefs.
- The app backend does seem to miss calling this stored procedure with the part of the XML that contains the disability data (needs fixing)
Finally consider back porting disability data from raw for
- Maybe not 2021 (those are labelled as not estimate in warehouse)
- 2022 (labelled as estimate)
- 2023 (completely missing)
‌

‌

‌
- 2024-03-13T02:43:59+00:00
Ghislain Hachey reporter
It turns out, no fix needed on the backend. Simply add the missing data below in table dbo.lkpPupilTableTupes and it should be executed by the backend normally.

‌
- 2024-03-13T03:07:19+00:00

Ghislain Hachey reporter

Finally, extracting data for past years was done with the following slow script (a set based approach like using a cross join would be better for more records).

-- Slow performing script to run the stored proc on each past Disability XML partial
-- (i.e. 244 records ran 3-4 minutes on i9 11th Gen laptop CPU)
DECLARE @XmlData XML;
DECLARE @ID INT;
DECLARE @XmlCursor CURSOR;

-- Declare a cursor to iterate over each row
SET @XmlCursor = CURSOR FOR
    SELECT SSX.ssID, ssXml.query('declare namespace ns="http://ns.adobe.com/xfdf/"; (/ns:xfdf/ns:fields/ns:field[@name="Dis"])[1]') AS DisabilityXML
    FROM SchoolSurveyXml_ SSX
    INNER JOIN SchoolSurvey SS ON SSX.ssID = SS.ssID
    WHERE SS.svyYear IN (2021, 2022, 2023);

-- Open the cursor
OPEN @XmlCursor;

-- Fetch the first row
FETCH NEXT FROM @XmlCursor INTO @ID, @XmlData;

-- Loop through each row
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Execute your stored procedure with the XML data and ID as parameters
    BEGIN TRY
        SELECT 'Processing dbo.xfdfGrid on', @ID, @XmlData;
        EXEC dbo.xfdfGrid @surveyID=@ID,@grid=@XmlData;
    END TRY
    BEGIN CATCH
        -- Handle the error as needed
        PRINT 'Error occurred for ID: ' + CAST(@ID AS VARCHAR(10)) + '. Error message: ' + ERROR_MESSAGE();
    END CATCH

    -- Fetch the next row
    FETCH NEXT FROM @XmlCursor INTO @ID, @XmlData;
END

-- Close and deallocate the cursor
CLOSE @XmlCursor;
DEALLOCATE @XmlCursor;

There was one SchoolSurvey XML that threw an error because of bad data (i.e. an X where an int should be for a number of students). This was fixed by changing the bad value to 0.

-- Sample modification of a bad bit of data in a specific survey in a specific XML element
-- (in this case there was an 'X' instead of an int for Disability number)
BEGIN TRANSACTION;

SELECT SSX.ssID, ssXml.query('declare namespace ns="http://ns.adobe.com/xfdf/"; (/ns:xfdf/ns:fields/ns:field[@name="Dis"]/ns:field[@name="D"]/ns:field[@name="09"]/ns:field[@name="02"]/ns:field[@name="F"]/ns:value/text())[1]') AS DisabilityXML
FROM SchoolSurveyXml_ SSX
INNER JOIN SchoolSurvey SS ON SSX.ssID = SS.ssID
WHERE SSX.ssID = @SSID;

UPDATE SchoolSurveyXml_
SET ssXml.modify('
    declare namespace ns="http://ns.adobe.com/xfdf/";
    replace value of (/ns:xfdf/ns:fields/ns:field[@name="Dis"]/ns:field[@name="D"]/ns:field[@name="09"]/ns:field[@name="02"]/ns:field[@name="F"]/ns:value/text())[1]
    with "0"
')
WHERE ssID = @SSID;

SELECT SSX.ssID, ssXml.query('declare namespace ns="http://ns.adobe.com/xfdf/"; (/ns:xfdf/ns:fields/ns:field[@name="Dis"]/ns:field[@name="D"]/ns:field[@name="09"]/ns:field[@name="02"]/ns:field[@name="F"]/ns:value/text())[1]') AS DisabilityXML
FROM SchoolSurveyXml_ SSX
INNER JOIN SchoolSurvey SS ON SSX.ssID = SS.ssID
WHERE SSX.ssID = @SSID;

ROLLBACK;

Then compare the before and after in Excel pivot tables for the expected new disability data. The results, less estimates in the warehouse and new data where none was available in 2023. Still there is a mixture of schools reporting over the years as observed by the remaining and changing estimates values but this is another thing.

2024-03-13T09:02:06+00:00

Ghislain Hachey reporter
Only remaining task is to execute this in production after a meeting with country data meet first.
- 2024-03-13T09:02:58+00:00

Brian Lewis repo owner

All great work Ghislain

This broke when switching to the upload in the portal (2022): in the desktop survey loader the code was validated against metaPupilTableDefs, not lkpPupilTableTypes

Only some small suggestions on the fix-up SQL: its easier to code and fast er to run to loop through records using this min(xx) > @variable.... structure as below.

Also you can move the ‘Dis’ to a sql variable and reference it in the xquery using the sql:variable() function - possibly makes this the basis for a stored proc that takes the type as a variable (if its ever needed again)

‌

declare @dis xml

declare @svy xml
declare @FieldTag nvarchar(10) = 'Dis'

--Select top 10 * from SchoolSurveyXml_ WHERE ssID in (Select ssID from SchoolSurvey WHERE svyYEar = 2023)
declare @ssID int

select @ssId = min(ssID) from SchoolSurveyXml_

while @ssID is not null begin

    Select @svy = ssXml from SchoolSurveyXml_ WHERE ssID = @ssID

    ;
    WITH XMLNAMESPACES 
    (  
        'http://ns.adobe.com/xfdf/' as x

    )  
    SELECT @dis = @svy.query('/x:xfdf/x:fields/x:field[@name=sql:variable("@FieldTag")]')
    SELECT @ssID ssID, @dis surveyNode
    begin transaction

        --exec dbo.xfdfGrid @ssID, @dis

        --Select * from PupilTables 
        --WHERE ssID = @ssID

    rollback
    -- advance to next record
    select @ssId = min(ssID) from SchoolSurveyXml_ WHERE ssID > @ssID

end

‌

2024-03-13T10:56:40+00:00

Ghislain Hachey reporter
Thanks Brian. I’ve adapted your idea of moving the ‘Dis’ to a sql variable and reference it in the xquery. But changing from cursor to min approach makes the otherwise same script run much longer. Perhaps something I do wrong. Anyway, Ill keep both as reference.
- 2024-03-27T06:30:36+00:00
Ghislain Hachey reporter
- changed status to resolved
Completed in production
- 2024-03-29T03:34:36+00:00
Log in to comment

Assignee: Ghislain Hachey

Type: bug

Priority: blocker

Status: resolved

Votes: 0

Watchers: 1