Using the data from e-assessment systems
There’s a lot of work involved in getting an e-assessment service up and running for live examinations. Aside from selecting and deploying the technology, there are challenges in managing administrative arrangements and organisational change, ensuring all candidates have fair access to assessment, and converting assessment questions to on-screen forms. While this is going on, it’s to be expected that existing processes are retained wherever possible – after all, moving to e-assessment presents enough challenges for maintaining assessment standards without changing things that don’t need to change.
Having spent the earlier part of the last ten years helping organisations to deploy e-assessment, and the latter part undertaking performance analysis of assessment systems, I’m sympathetic to the “don’t change everything at once” approach! However, once e-assessment is running smoothly, I do think we have a duty to use the new and detailed information our systems generate to improve the quality of our assessments. I’m still surprised, for example, to find from time to time that candidate performance data is not used to update basic item statistics (typically facility and discrimination), or that these statistics are not used alongside content-based selection rules when creating tests. Similarly, relatively few organisations plan a regular review of their assessment data (say, every couple of years) to check how candidates with different demographics are performing, or whether variant assessments (for example, extra-time or non-audio versions for candidates with special access requirements) are being used consistently and are having the desired effect of improving the fairness of assessment.
None of this is to say that I think problems ‘hidden in the data’ are widespread. Where we have been asked to support analysis, we’ve commonly seen systems in which “custom-and-practice” approaches have delivered decent quality, but there is often room for improvement. My point is that monitoring the data, making improvements where necessary and generally making good use of the information available is something the public expects us to do.
I think Ofqual expects us to do it too. Recent work on reliability in qualifications led to recommendations that Ofqual require awarding organisations to publish information about standard-setting practices and related matters such as the reliability of assessments, and the revised regulatory framework increases the pressure on awarding bodies to generate and store evidence that their assessments are working as intended. Ofqual is now extending its work on reliability to encompass the wider and more nebulous requirement of qualification validity.
For some reason, psychometricians are not as widespread in UK assessment as in the USA and elsewhere, despite the UK’s strong history in psychological measurement (personality tests, etc.), from which many of these statistical techniques evolved. Many awarding organisations don’t have easy access to such expertise, and even where they do, it’s sometimes hard to decide how far to take the analysis. Digging into data looking for hidden problems is never going to be high on management’s agenda!
As closing points, I would like to say a couple of things on this by way of urging assessment providers to think a little differently:
- Although there are some really esoteric statistical approaches to assessment performance management, the basics are really not too difficult, and in many cases e-assessment systems produce the relevant data as part of their management reporting – i.e. they do the hard maths for you. Learning to interpret this information, and using it to create better assessments, is not especially time-consuming and can provide confidence about the quality of assessment where otherwise there may be unspoken doubts.
- Undertaking this type of work often enables other improvements – for example, statistical consideration of pass-mark/standard-setting procedures can lead to very useful discussions about the implications of mis-classification (passing people who should fail, and vice versa).
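The mis-classification point in the second bullet can be made concrete with a simple Monte-Carlo sketch: assume a true-score distribution, a standard error of measurement, and a pass mark, then count how often the observed score falls on the wrong side of the boundary. All the numbers below (mean 55, SD 12, error SD 4, pass mark 50) are illustrative assumptions, not figures from any real assessment.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

PASS_MARK = 50
TRUE_MEAN, TRUE_SD = 55.0, 12.0  # assumed true-score distribution
ERROR_SD = 4.0                   # assumed standard error of measurement
N = 100_000                      # simulated candidates

false_pass = false_fail = 0
for _ in range(N):
    true_score = random.gauss(TRUE_MEAN, TRUE_SD)
    observed = true_score + random.gauss(0.0, ERROR_SD)
    # Passed on the observed score, but the true score was below the mark:
    if observed >= PASS_MARK and true_score < PASS_MARK:
        false_pass += 1
    # Failed on the observed score, but the true score was above the mark:
    elif observed < PASS_MARK and true_score >= PASS_MARK:
        false_fail += 1

print(f"False passes: {false_pass / N:.1%}")
print(f"False fails:  {false_fail / N:.1%}")
```

Even this toy model supports the useful discussion the bullet describes: it shows how the mis-classification rates depend on where the pass mark sits relative to the candidate distribution and on the measurement error of the test.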
As I said at the outset, where good-quality management information about our assessments exists, we have a duty to use it, and such analysis often provides us with strong evidence about our candidate community, their teachers, and the purposes to which our qualifications are put.
Author: John Winkley, AlphaPlus Consultancy Ltd
 http://www.ofqual.gov.uk/standards/reliability/. In fairness, I should confess I was involved in some of this work!
Reliability here means the extent to which assessments are fair and comparable from one instance to the next – i.e. would the same candidate get the same result if they took the test more than once? In the case of examinations, this involves issues such as inter-marker consistency and the difficulty and variability between test instances where multiple tests exist (for example, in on-demand testing).
Broadly, the fitness of a qualification for a particular purpose, e.g. selection for progression to subsequent education, or assertion of sufficient skill for a job. Validity is concerned with issues such as ‘does the assessment test enough of the curriculum, and in a balanced way?’, ‘is the assessment an authentic and credible test of what is being assessed?’ and ‘do the pass mark or grade boundaries discriminate between those above and below in a meaningful and fair way?’ (validity encompasses reliability).