Finalize Exams Module [2021:HPR]

Issue #1086 resolved
Ghislain Hachey created an issue

This issue is to make sure that the exams module is finalized and ready for use.

Some related issues:

Need to discuss the differences between how SOE does analysis on benchmarks and standards and agree on the best way forward.

Comments (6)

  1. Brian Lewis repo owner

    The key part of the new capability is the ability to load the same Excel workbook that is fed into Soe. We then need to be able to match Soe's calculation logic, and … can we even produce the Excel output books by school?

  2. Brian Lewis repo owner

    Notes on Soe:

    This article describes the analysis of exams data performed in Pacific EMIS, and relates this to the concepts and techniques used in the Soe examination system.

    Standard Test of Achievement

    A Standard Test of Achievement is an examination that measures student competencies according to a hierarchy of:

    • Standard
    • Benchmark
    • Indicator

    That is, each Standard is comprised of a number of Benchmarks, and each Benchmark is comprised of a number of Indicators.
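
    As a rough illustration, the hierarchy can be pictured as a simple nesting. A minimal sketch in Python (the codes below are invented, not actual Soe identifiers):

    hierarchy = {
        "Standard 1": {
            "Benchmark 1.1": ["Indicator 1.1.1", "Indicator 1.1.2"],
            "Benchmark 1.2": ["Indicator 1.2.1"],
        },
        "Standard 2": {
            "Benchmark 2.1": ["Indicator 2.1.1"],
        },
    }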

    Items

    In the Soe model, each Indicator is represented by a set of Items. An item is a single question on the examination paper. Each item contributes to exactly 1 Indicator.
    A Candidate's response to an item is either wholly correct (i.e. score = 1) or wholly incorrect (score = 0).

    Indicator Achievement Level: the ItemCount method.

    Each candidate is assigned an Achievement Level for each Indicator. The Achievement Level is (usually) a value in the range (1,2,3,4). Each value is associated with a Level Description, e.g. Beginning, Developing, Proficient, Advanced.

    A candidate's Achievement Level for an Indicator is calculated by taking the ratio of correct item responses to the total number of items, then banding and scaling this ratio to give the Achievement Level value 1-4. We'll call this calculation the 'ItemCount' method.
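
    A minimal sketch of the ItemCount banding in Python, assuming equal-width bands over the correct-item ratio (Soe's actual band edges are compared later in this thread):

    def item_count_level(correct: int, total: int) -> int:
        """Band the ratio correct/total into an Achievement Level 1-4.

        Equal-width bands are assumed here; Soe's actual edges can
        differ when the item count is not a multiple of 4 (see below).
        """
        ratio = correct / total
        # [0, 0.25) -> 1, [0.25, 0.5) -> 2, [0.5, 0.75) -> 3, [0.75, 1] -> 4
        return min(int(ratio * 4) + 1, 4)

    # e.g. 5 of 8 items correct -> ratio 0.625 -> Level 3
    assert item_count_level(5, 8) == 3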

    The ItemCount method is used in Soe to calculate the Indicator Level across all indicators and candidates. In Soe, the input data is an Excel workbook containing a list of the item responses for each candidate. Soe tests whether the response is correct (1) or incorrect (0). The ItemCount method is applied to these scored responses. The resulting Candidate/Indicator Achievement Levels are the base data from which all analysis in Soe is built.

    In Pacific EMIS, the input data is the same Excel workbook consumed by Soe. Pacific EMIS organises this data into tables in the SQL Server database. Specifically:

    • ExamCandidates contains a row for each candidate in the exam.
    • ExamCandidateItems contains a row for each item and candidate. This holds the raw response entered by the candidate (e.g. 'A', 'B', 'C', 'D'...) and the resulting score for that item (1 or 0).
    • ExamCandidateResults holds a record for each Candidate and Indicator, showing the Achievement Level. The Achievement Level is calculated using the ItemCount method.

    This provides Pacific EMIS with exactly the raw data used by Soe, so that all the analysis available in Soe is available in Pacific EMIS.
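
    As a sketch of how the Candidate/Indicator levels in ExamCandidateResults could be derived from the item rows, assuming pandas and illustrative column names (not the actual schema):

    import pandas as pd

    # One row per candidate and item, mirroring ExamCandidateItems.
    items = pd.DataFrame({
        "candidate": ["C1", "C1", "C1", "C2", "C2", "C2"],
        "indicator": ["I1", "I1", "I2", "I1", "I1", "I2"],
        "score":     [1, 0, 1, 1, 1, 0],
    })

    # ItemCount method: ratio of correct items per candidate/indicator,
    # banded into an Achievement Level 1-4 (equal-width bands assumed).
    ratios = items.groupby(["candidate", "indicator"])["score"].mean()
    levels = (ratios * 4).astype(int).clip(upper=3) + 1
    print(levels)  # C1/I1 -> 3, C1/I2 -> 4, C2/I1 -> 4, C2/I2 -> 1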

    School/Indicator Analysis: the LevelCount method.

    One main output produced by Soe is the analysis of results by school. At the school level, for a given indicator, we are interested in the number of students who achieve each Achievement Level for that indicator. This results in a set of 4 numbers: No of Level 1 results, No of Level 2 results, No of Level 3 results, No of Level 4 results. We'll refer to this array of numbers as (L1,L2,L3,L4). In Soe, these numbers are shown in tables, and drawn as percentage column charts. Typically, Levels 3 and 4 are shown positive (above 0), while Levels 1 and 2 are shown negative (below 0).
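
    A minimal sketch of this LevelCount tabulation with pandas (the school names and levels are made up):

    import pandas as pd

    # Candidate Achievement Levels for one indicator, by school.
    results = pd.DataFrame({
        "school": ["S1", "S1", "S1", "S1", "S2", "S2"],
        "level":  [1, 2, 3, 4, 3, 4],
    })

    # LevelCount: number of candidates at each Level, per school.
    counts = pd.crosstab(results["school"], results["level"])

    # Percentages, with Levels 1 and 2 negated so they plot below zero
    # as in Soe's column charts.
    pct = counts.div(counts.sum(axis=1), axis=0) * 100
    pct[[1, 2]] *= -1
    print(pct)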

    School/Benchmark and School/Standard Analysis

    In Soe, school results by Benchmark and Standard are also calculated by the LevelCount method, applied to all Indicators in the Benchmark. Adding the 4 results (L1, L2, L3, L4) gives a total equal to (the number of candidates * the number of indicators in the benchmark). For that reason, Soe is careful to identify that these scores do NOT represent student counts: in Soe the table displaying these numbers is headed 'Level Counts (NOT Students) for All Students for Each Rubric Level of Benchmarks', while the chart axis on which they are plotted is labelled 'Percent of Indicators for each Performance Level'.

    The same LevelCount method is used to calculate the (L1,L2,L3,L4) values by Standard, so the same observations regarding student counts vs Level counts apply.
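
    A quick worked check of the benchmark total described above, with made-up numbers:

    # Hypothetical benchmark with 5 indicators, sat by 30 candidates.
    n_candidates, n_indicators = 30, 5
    # Illustrative benchmark LevelCount array:
    L1, L2, L3, L4 = 40, 35, 45, 30
    # The four counts sum to candidates * indicators (150), not to a
    # student count.
    assert L1 + L2 + L3 + L4 == n_candidates * n_indicators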

    Other Benchmark / Standard Analysis

    Pacific EMIS supports these LevelCount calculations for School Level Benchmark and Standard. In addition, Pacific EMIS can support such LevelCount aggregations based on any characteristic of the school (public/private, state) or candidate (ethnicity, gender, SpED). Many options are interactively exposed through the Exams Dashboard.

    Candidate Benchmark/Standard Achievement Level

    In the earlier examination model, Indicator and Items are not available in the input (XML) data. Instead, an Achievement Level is captured at the Benchmark level.
    Can we construct a Candidate/Benchmark Achievement Level from the Soe structure, and if so, how?

    The obvious approach to this issue is to use the ItemCount method exactly as we do to calculate the Candidate/Indicator Level. Specifically, we take the sum across all Items that contribute to all Indicators in the Benchmark, and convert the ratio of Correct Items / Total Items back to an Achievement Level 1-4.
    Pacific EMIS implements this calculation for both Benchmark and Standard. These are stored in the ExamCandidateResults table.
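
    A minimal sketch of this benchmark-level ItemCount calculation, again assuming equal-width bands:

    def benchmark_level(item_scores: list[int]) -> int:
        """Pool every item under the benchmark's indicators and band
        the overall correct ratio into an Achievement Level 1-4."""
        ratio = sum(item_scores) / len(item_scores)
        return min(int(ratio * 4) + 1, 4)

    # e.g. 7 correct of 10 items across the benchmark -> 0.7 -> Level 3
    assert benchmark_level([1] * 7 + [0] * 3) == 3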

    Candidate Benchmark/Standard: the Weighted method.

    Soe does not make this calculation, and in fact does not attempt to report on candidate/benchmark levels in this way. However, Soe does provide some calculations that suggest another way to approach this issue.

    We'll call this approach the 'Weighted' method.

    Using the Weighted method, the Candidate Benchmark result is not a single level, but an (L1,L2,L3,L4) array of 4 numbers, with the total equalling 1.
    To calculate it, begin with the Level Count across all Indicators in the Benchmark to get an (L1,L2,L3,L4) array, then divide each value by the total number of indicators in the benchmark. The total L1 + L2 + L3 + L4 is now 1.
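
    A minimal sketch of the Weighted calculation for one candidate (the indicator levels are made up):

    from collections import Counter

    def weighted_result(indicator_levels: list[int]) -> list[float]:
        """Fraction of the benchmark's indicators at each Level 1-4
        for one candidate; the four values sum to 1."""
        counts = Counter(indicator_levels)
        n = len(indicator_levels)
        return [counts.get(level, 0) / n for level in (1, 2, 3, 4)]

    # e.g. a candidate scoring Levels 3, 4, 4 on a 3-indicator benchmark:
    print(weighted_result([3, 4, 4]))  # [0.0, 0.0, 0.333..., 0.666...]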

    The same technique can be applied to Standard. Soe shows these weighted calculations for each Benchmark and Standard on the AggregateScores page of the output workbook.

    Pacific EMIS stores the Weighted Benchmark and Standard totals in the ExamCandidateResultsW table.

    Aggregate Benchmarks and Standards using LevelCount and Weighted values.

    Above we have seen that Pacific EMIS calculates and stores 2 versions of Candidate Benchmark:

    • calculated with the ItemCount method, and stored in ExamCandidateResults
    • calculated with the Weighted method, and stored in ExamCandidateResultsW

    With these two precalculated versions of the Candidate Benchmark available, we can calculate aggregate benchmark scores:

    1. using the single Benchmark value per candidate held in ExamCandidateResults, produce the (L1,L2,L3,L4) values by accumulating the benchmark value into these buckets.
    2. using the weighted (L1,L2,L3,L4) Benchmark value per candidate held in ExamCandidateResultsW, produce the (L1,L2,L3,L4) values by simple accumulation of the percentage values.

    However, we note that the second method, summing of Weighted scores, gives effectively the same output as LevelCount aggregation. The difference is that when we produce the (L1,L2,L3,L4) array for a benchmark across a set of candidates using LevelCount, the total L1 + L2 + L3 + L4 = (number of indicators * number of candidates). The Candidate Weighted values are already divided by the number of indicators, so that the sum of these weighted values = 1 for each candidate. When these weighted values are summed across the set of candidates, the total L1 + L2 + L3 + L4 = the number of candidates.
    Whether the (L1,L2,L3,L4) array is calculated using the LevelCount method or the Weighted method, once the (L1,L2,L3,L4) values are reduced to percentages they are the same.
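
    A small sketch checking that claim on made-up data:

    from collections import Counter

    LEVELS = (1, 2, 3, 4)
    # Candidate/Indicator levels: 2 candidates, 3 indicators each.
    cand_a, cand_b = [1, 3, 4], [3, 3, 4]

    # LevelCount: raw counts over all 6 candidate/indicator results.
    lc = Counter(cand_a + cand_b)
    lc_pct = [lc.get(l, 0) / 6 for l in LEVELS]

    # Weighted: per-candidate fractions (each summing to 1), then summed.
    def weighted(levels):
        c = Counter(levels)
        return [c.get(l, 0) / len(levels) for l in LEVELS]

    w_sum = [a + b for a, b in zip(weighted(cand_a), weighted(cand_b))]
    w_pct = [v / 2 for v in w_sum]  # divide by the 2 candidates

    # Identical once reduced to percentages (up to rounding).
    assert all(abs(a - b) < 1e-12 for a, b in zip(lc_pct, w_pct))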

    [Image: pivot table of LevelCounts by Benchmark (exam A03)]

    [Image: the matching Soe output]

  3. Ghislain Hachey reporter

    All this is also my understanding, and just to confirm, I arrive at the same result as you: the "weighted method" (i.e. summing the array of percentages for each achievement level...) is equal to SOE's counting of the benchmarks' (or standards', or the whole test's) respective indicators' achievement levels. The ItemCount method, though, arrives at similar but different results and means a different thing. ItemCount will arrive at the same result as the other two when there is a single indicator in a benchmark (but almost never when comparing standards and the whole test).

    Just to give an idea of the scale of the number of indicators that do not have a multiple of 4 items: if I run the check on all the exams data in RMI, these are the only ones that came up. So not huge.

    It seems @Brian Lewis and I arrive at the same result for these edge cases, while the legacy SOE workbook seems to classify those in slightly different achievement levels. Below shows the scale of these differences by printing out the various possible cases.

    All this said, I don't think we need to bother with these differences, as they will rarely show up, and when they do, the differences will be very minor.
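
    The listings below appear to be pandas pd.cut output (with retbins=True). A minimal sketch that would reproduce the equal-width case; the 'SOE equivalent' edges are copied from the listing, not derived here:

    import pandas as pd

    levels = ['Beginning', 'Developing', 'Proficient', 'Advanced']

    def equal_width(n_items):
        """Band the possible correct counts 0..n_items into 4 levels."""
        scores = pd.Series(range(n_items + 1))
        return pd.cut(scores, bins=4, labels=levels, retbins=True)

    print(equal_width(3))  # matches the 'Equal width bins' case for 3 items

    # SOE-equivalent edges differ when the item count is not a multiple
    # of 4; e.g. for 5 items (edges taken from the listing below):
    soe_edges = [-0.005, 1.25, 3.5, 4.75, 5.0]
    print(pd.cut(pd.Series(range(6)), bins=soe_edges, labels=levels))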

    Shows for each total number of items correct what level would be achieved
    =========================================================================
    
    Bins with 1 Item
    -----------------
    
    'Equal width bins'
    
    (0    Beginning
     1     Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.001,  0.25 ,  0.5  ,  0.75 ,  1.   ]))
    
    'SOE equivalent bins'
    
    (0    Developing
     1      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.002, -0.001,  0.5  ,  0.75 ,  1.   ]))
    
    
    Bins with 2 Items
    -----------------
    
    'Equal width bins'
    
    (0     Beginning
     1    Developing
     2      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.002,  0.5  ,  1.   ,  1.5  ,  2.   ]))
    
    'SOE equivalent bins'
    
    (0    Developing
     1    Proficient
     2      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-2.00e-03, -1.00e-03,  9.99e-01,  1.50e+00,  2.00e+00]))
    
    
    Bins with 3 Items
    -----------------
    
    'Equal width bins'
    
    (0     Beginning
     1    Developing
     2    Proficient
     3      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.003,  0.75 ,  1.5  ,  2.25 ,  3.   ]))
    
    'SOE equivalent bins'
    
    (0     Beginning
     1    Developing
     2    Proficient
     3      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.003,  0.75 ,  1.5  ,  2.25 ,  3.   ]))
    
    
    Bins with 4 Items
    -----------------
    
    'Equal width bins'
    
    (0     Beginning
     1     Beginning
     2    Developing
     3    Proficient
     4      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.004,  1.   ,  2.   ,  3.   ,  4.   ]))
    
    
    Bins with 5 Items
    -----------------
    
    'Equal width bins'
    
    (0     Beginning
     1     Beginning
     2    Developing
     3    Proficient
     4      Advanced
     5      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.005,  1.25 ,  2.5  ,  3.75 ,  5.   ]))
    
    'SOE equivalent bins'
    
    (0     Beginning
     1     Beginning
     2    Developing
     3    Developing
     4    Proficient
     5      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.005,  1.25 ,  3.5  ,  4.75 ,  5.   ]))
    
    
    Bins with 8 Items
    -----------------
    
    'Equal width bins'
    
    (0     Beginning
     1     Beginning
     2     Beginning
     3    Developing
     4    Developing
     5    Proficient
     6    Proficient
     7      Advanced
     8      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.008,  2.   ,  4.   ,  6.   ,  8.   ]))
    
    
    Bins with 9 Items
    -----------------
    
    'Equal width bins'
    
    (0     Beginning
     1     Beginning
     2     Beginning
     3    Developing
     4    Developing
     5    Proficient
     6    Proficient
     7      Advanced
     8      Advanced
     9      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.009,  2.25 ,  4.5  ,  6.75 ,  9.   ]))
    
    'SOE equivalent bins'
    
    (0     Beginning
     1     Beginning
     2     Beginning
     3     Beginning
     4     Beginning
     5    Developing
     6    Developing
     7    Proficient
     8      Advanced
     9      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-9.00e-03,  4.25e+00,  6.50e+00,  7.75e+00,  9.00e+00]))
    
    
    Bins with 11 Items
    -----------------
    
    'Equal width bins'
    
    (0      Beginning
     1      Beginning
     2      Beginning
     3     Developing
     4     Developing
     5     Developing
     6     Proficient
     7     Proficient
     8     Proficient
     9       Advanced
     10      Advanced
     11      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-1.10e-02,  2.75e+00,  5.50e+00,  8.25e+00,  1.10e+01]))
    
    'SOE equivalent bins'
    
    (0      Beginning
     1      Beginning
     2      Beginning
     3      Beginning
     4      Beginning
     5      Beginning
     6     Developing
     7     Developing
     8     Proficient
     9     Proficient
     10      Advanced
     11      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-1.10e-02,  5.75e+00,  7.50e+00,  9.25e+00,  1.10e+01]))
    
    
    Bins with 12 Items
    -----------------
    
    'Equal width bins'
    
    (0      Beginning
     1      Beginning
     2      Beginning
     3      Beginning
     4     Developing
     5     Developing
     6     Developing
     7     Proficient
     8     Proficient
     9     Proficient
     10      Advanced
     11      Advanced
     12      Advanced
     dtype: category
     Categories (4, object): ['Beginning' < 'Developing' < 'Proficient' < 'Advanced'],
     array([-0.012,  3.   ,  6.   ,  9.   , 12.   ]))
    
