-
assigned issue to
Missing Disability processing of Annual PDF Survey
Kiribati reported that the disability data is missing from the data warehouse. Upon further investigating I believe that the upload process of the Annual PDF Survey does not process the Disability table (i.e. missing stored procure [pEnrolmentWrite].[xfdfDisability]). Therefore, the data does not go in the data warehouse. However, the data does get loaded into the system in its raw form (the original PDF XML representation). The recommended course of action would be to:
- Fix the missing Disability processing feature
-
Reload the PDFs for the schools with submitted Disability data by hand one by one. Previous years had data but I believe they came from estimates from prior years. This leaves you with the decision to either:
- Keep the data as estimates for 2021 and 2022 (only reload submitted data for 2023)
- Replace the estimates with real submitted data for all last three years (2021, 2022, 2023). This will perhaps (likely) change your figures for those years.
-
We could also automate pushing all the submitted Disability data from the uploaded PDF survey into the data warehouse but this will require more time and work on the developers end.
Comments (8)
-
reporter -
reporter Upon further investigating. It seems that
- The Disability data is in the raw PDF XML stored in [dbo].[SchoolSurveyXml_]
- The stored procedure used for repeaters (i.e. [dbo].[xfdfGrid]) seems like it should work unchanged also for disability data provided DIsability is correctly configured in bdo.metaPupilTableDefs.
- The app backend does seem to miss calling this stored procedure with the part of the XML that contains the disability data (needs fixing)
Finally consider back porting disability data from raw for
- Maybe not 2021 (those are labelled as not estimate in warehouse)
- 2022 (labelled as estimate)
- 2023 (completely missing)
-
reporter It turns out, no fix needed on the backend. Simply add the missing data below in table dbo.lkpPupilTableTupes and it should be executed by the backend normally.
-
reporter Finally, extracting data for past years was done with the following slow script (a set based approach like using a cross join would be better for more records).
-- Slow performing script to run the stored proc on each past Disability XML partial -- (i.e. 244 records ran 3-4 minutes on i9 11th Gen laptop CPU) DECLARE @XmlData XML; DECLARE @ID INT; DECLARE @XmlCursor CURSOR; -- Declare a cursor to iterate over each row SET @XmlCursor = CURSOR FOR SELECT SSX.ssID, ssXml.query('declare namespace ns="http://ns.adobe.com/xfdf/"; (/ns:xfdf/ns:fields/ns:field[@name="Dis"])[1]') AS DisabilityXML FROM SchoolSurveyXml_ SSX INNER JOIN SchoolSurvey SS ON SSX.ssID = SS.ssID WHERE SS.svyYear IN (2021, 2022, 2023); -- Open the cursor OPEN @XmlCursor; -- Fetch the first row FETCH NEXT FROM @XmlCursor INTO @ID, @XmlData; -- Loop through each row WHILE @@FETCH_STATUS = 0 BEGIN -- Execute your stored procedure with the XML data and ID as parameters BEGIN TRY SELECT 'Processing dbo.xfdfGrid on', @ID, @XmlData; EXEC dbo.xfdfGrid @surveyID=@ID,@grid=@XmlData; END TRY BEGIN CATCH -- Handle the error as needed PRINT 'Error occurred for ID: ' + CAST(@ID AS VARCHAR(10)) + '. Error message: ' + ERROR_MESSAGE(); END CATCH -- Fetch the next row FETCH NEXT FROM @XmlCursor INTO @ID, @XmlData; END -- Close and deallocate the cursor CLOSE @XmlCursor; DEALLOCATE @XmlCursor;
There was one SchoolSurvey XML that threw an error because of bad data (i.e. an X where an int should be for a number of students). This was fixed by changing the bad value to 0.
-- Sample modification of a bad bit of data in a specific survey in a specific XML element -- (in this case there was an 'X' instead of an int for Disability number) BEGIN TRANSACTION; SELECT SSX.ssID, ssXml.query('declare namespace ns="http://ns.adobe.com/xfdf/"; (/ns:xfdf/ns:fields/ns:field[@name="Dis"]/ns:field[@name="D"]/ns:field[@name="09"]/ns:field[@name="02"]/ns:field[@name="F"]/ns:value/text())[1]') AS DisabilityXML FROM SchoolSurveyXml_ SSX INNER JOIN SchoolSurvey SS ON SSX.ssID = SS.ssID WHERE SSX.ssID = @SSID; UPDATE SchoolSurveyXml_ SET ssXml.modify(' declare namespace ns="http://ns.adobe.com/xfdf/"; replace value of (/ns:xfdf/ns:fields/ns:field[@name="Dis"]/ns:field[@name="D"]/ns:field[@name="09"]/ns:field[@name="02"]/ns:field[@name="F"]/ns:value/text())[1] with "0" ') WHERE ssID = @SSID; SELECT SSX.ssID, ssXml.query('declare namespace ns="http://ns.adobe.com/xfdf/"; (/ns:xfdf/ns:fields/ns:field[@name="Dis"]/ns:field[@name="D"]/ns:field[@name="09"]/ns:field[@name="02"]/ns:field[@name="F"]/ns:value/text())[1]') AS DisabilityXML FROM SchoolSurveyXml_ SSX INNER JOIN SchoolSurvey SS ON SSX.ssID = SS.ssID WHERE SSX.ssID = @SSID; ROLLBACK;
Then compare the before and after in Excel pivot tables for the expected new disability data. The results, less estimates in the warehouse and new data where none was available in 2023. Still there is a mixture of schools reporting over the years as observed by the remaining and changing estimates values but this is another thing.
-
reporter Only remaining task is to execute this in production after a meeting with country data meet first.
-
repo owner All great work Ghislain
This broke when switching to the upload in the portal (2022): in the desktop survey loader the code was validated against metaPupilTableDefs, not lkpPupilTableTypes
Only some small suggestions on the fix-up SQL: its easier to code and fast er to run to loop through records using this min(xx) > @variable.... structure as below.
Also you can move the ‘Dis’ to a sql variable and reference it in the xquery using the sql:variable() function - possibly makes this the basis for a stored proc that takes the type as a variable (if its ever needed again)
declare @dis xml declare @svy xml declare @FieldTag nvarchar(10) = 'Dis' --Select top 10 * from SchoolSurveyXml_ WHERE ssID in (Select ssID from SchoolSurvey WHERE svyYEar = 2023) declare @ssID int select @ssId = min(ssID) from SchoolSurveyXml_ while @ssID is not null begin Select @svy = ssXml from SchoolSurveyXml_ WHERE ssID = @ssID ; WITH XMLNAMESPACES ( 'http://ns.adobe.com/xfdf/' as x ) SELECT @dis = @svy.query('/x:xfdf/x:fields/x:field[@name=sql:variable("@FieldTag")]') SELECT @ssID ssID, @dis surveyNode begin transaction --exec dbo.xfdfGrid @ssID, @dis --Select * from PupilTables --WHERE ssID = @ssID rollback -- advance to next record select @ssId = min(ssID) from SchoolSurveyXml_ WHERE ssID > @ssID end
-
reporter Thanks Brian. I’ve adapted your idea of moving the ‘Dis’ to a sql variable and reference it in the xquery. But changing from cursor to min approach makes the otherwise same script run much longer. Perhaps something I do wrong. Anyway, Ill keep both as reference.
-
reporter - changed status to resolved
Completed in production
- Log in to comment