the e-Assessment Association

Assessing Numerical Reasoning On-Screen Using Hint Items

In Wales, all children aged 7 to 14 (years 2 – 9) take national, adaptive, personalised assessments on screen in procedural numeracy, reading and numerical reasoning. AlphaPlus is leading a multi-partner project team on behalf of Welsh Government to develop these assessments.

The final assessment, numerical reasoning, has just gone live. This case study describes how the numerical reasoning assessments work, how we have used “hints”, and what implications this has had on adaptivity.

Background to assessments

In Wales, all children aged 7 to 14 (years 2 – 9) take national assessments in procedural numeracy, reading and numerical reasoning. These assessments are designed to identify areas where children and teachers need to focus their efforts in order to make progress.

Welsh Government made the decision to move these previously linear, paper-based assessments online, taking advantage of the benefits of adaptive e-assessment. AlphaPlus is leading a multi-partner project team on behalf of Welsh Government to develop these assessments.

The first two assessments, procedural numeracy and reading (in both English and Welsh) have already gone live in schools, with over 950,000 assessments having been taken to date.

What is numerical reasoning?

There are two separate numeracy assessments as part of the Welsh adaptive personalised assessments – procedural numeracy and numerical reasoning. Procedural numeracy focuses on numerical facts and procedures – the numerical ‘tools’ that are needed to apply numeracy within a range of contexts. Learners are given various simple (in format, not in difficulty) maths questions (addition, multiplication, etc.) with one mark answers.

Numerical reasoning on the other hand can be thought of as “problem solving using maths” – learners use information to work out an answer. There are typically multiple possible methods to reach a correct answer and the learner is not given any direction on which approach to take.

Format of the assessment

The numerical reasoning assessment has three sections:

The first section is made up of one mark, ‘single step’ maths questions with a reasoning bent. Again the method may vary and is not given. This section is most similar in terms of format to the procedural numeracy assessment. Depending on the algorithm, learners typically see around 7 to 12 questions.

The second section involves two to four “hint” questions. These are multi mark questions (two to four marks) in which learners receive hints if they are struggling to reach the correct answer.

The third section comprises 5 or 6 stimulus based questions. These are numerical reasoning questions which are preceded by an on-screen stimulus. The stimulus comprises pictures and text with a linked audio file. (Use of audio files prevents the numeracy assessment from becoming a proxy reading test.) Learners can choose to listen to the stimulus again before moving on to the questions. Some of the stimulus based questions use hints.

How the hints work

Unlike the procedural numeracy assessment, the numerical reasoning assessment involves multi mark maths questions. This presented us with an issue.

In a paper assessment, if a learner gets the answer to a multi mark question correct, they receive full marks. Similarly, in the onscreen assessment, if a learner gets the answer correct on their first go, they will receive full marks.

However, in a multi mark maths question in a paper assessment, learners are asked to “show their working” on the assessment paper. If a learner gets the final answer wrong, the teacher will use their working to give method marks, so even if a learner makes a minor error which is carried forward, they can still get some marks.

This is not possible in an onscreen assessment. Although learners are given paper to help them work out answers, there is no way of entering this “working out” into the assessment system.

Hints help solve this issue. If a learner enters the incorrect answer, they receive a prompt to check their working. If they enter the correct answer after this prompt, they will still receive full marks.

If the learner still enters a wrong answer, they will receive a hint to help them reach the correct answer. Each hint removes a mark from the total number the learner can receive for that question (see diagram below). The hints are designed to progressively lower the difficulty of the item, giving the learner more opportunity to ‘break in’ to the question. There are up to three hints per question, and all learners receive the same hints for each question – they are not dependent on what answer the learner has entered. How learners use the hints in answering the question is included in assessment reports. This can give teachers a useful indication of whether learners are able to make use of additional information when they are unsure, which is an important problem solving skill.
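The marking flow described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the live system's marking code: the function name `score_hint_item`, its parameters, and the rule that a learner who is still wrong after the final hint scores zero are all assumptions made for the example.

```python
def score_hint_item(max_marks, num_hints, attempts):
    """Return (marks_awarded, hints_shown) for one hint question.

    attempts is the ordered list of the learner's answers, True if correct.
    Assumed flow: attempt 0 is the first go, attempt 1 follows the
    "check your working" prompt, and each later attempt follows one hint.
    """
    hints_shown = 0
    for index, correct in enumerate(attempts):
        if correct:
            # Each hint already shown removes one mark from the total.
            return max(max_marks - hints_shown, 0), hints_shown
        if index >= 1:
            # Wrong again after the prompt: show the next hint, if any remain.
            if hints_shown == num_hints:
                # Assumption: no hints left, so the question ends on zero.
                return 0, hints_shown
            hints_shown += 1
    # The attempts ran out before a correct answer was entered.
    return 0, hints_shown
```

For a three mark question with two hints, `score_hint_item(3, 2, [False, True])` gives full marks with no hints shown, while `score_hint_item(3, 2, [False, False, True])` gives two marks after one hint.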


The use of hints in these questions has an impact on how the adaptivity of the assessment works.

For an adaptive assessment to function, the difficulty level of every item in the bank must be known. These difficulty values are estimated using item response theory (IRT).

To summarise, when learners answer an item, an algorithm calculates an interim ability estimate, and uses this to select an appropriate item to deliver next. For the one mark questions in the procedural numeracy assessment, this is a simple process. A learner who answers a question correctly will receive a question that is slightly more challenging; a learner who answers a question incorrectly will receive a slightly easier question.
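As a rough illustration of this loop for one mark items, the sketch below re-estimates ability after each response and picks the unused item whose difficulty is closest to that estimate. The article does not say which IRT model or estimation method the assessments use; the Rasch (one-parameter) model, the grid-based estimate with a standard normal prior, and the names `p_correct`, `estimate_ability` and `next_item` are assumptions chosen to keep the example short.

```python
import math

def p_correct(theta, b):
    """Rasch model: chance that ability theta answers difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses):
    """Interim ability from a list of (difficulty, correct) pairs.

    Assumption: posterior mean over a grid from -4 to 4 with a
    standard normal prior (not necessarily the live algorithm).
    """
    grid = [x / 10.0 for x in range(-40, 41)]
    weighted_sum = total = 0.0
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # prior, up to a constant
        for b, correct in responses:
            p = p_correct(theta, b)
            weight *= p if correct else (1.0 - p)
        weighted_sum += theta * weight
        total += weight
    return weighted_sum / total

def next_item(theta, bank, used):
    """Pick the unused item whose difficulty is closest to theta.

    bank maps item names to difficulties; returns None if none remain.
    """
    candidates = [(name, b) for name, b in bank.items() if name not in used]
    if not candidates:
        return None
    return min(candidates, key=lambda pair: abs(pair[1] - theta))[0]
```

With this sketch, a correct answer pushes the estimate up, so the next item selected is harder, and an incorrect answer pushes it down, matching the behaviour described above.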

However, the adaptivity works slightly differently for the multi-mark hint questions. Essentially, each hint is treated by the algorithm as a separate item with its own IRT value, all grouped together into one question.
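One way to picture this is to expand a learner's path through a hint question into a series of right/wrong responses, one per hint stage, each with its own difficulty. The sketch below is one possible reading of the description above, not the project's actual data model: the `HintQuestion` class, the `expand_responses` function, the stage naming and the example difficulty values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HintQuestion:
    name: str
    # Hypothetical IRT difficulties: index 0 is the question with no hint,
    # index k is the question after hint k (hints lower the difficulty).
    stage_difficulties: list

def expand_responses(question, hints_used, solved):
    """Turn one hint question outcome into per-stage responses.

    Returns (stage_name, difficulty, correct) tuples for every stage the
    learner reached. Stages before the last one reached count as incorrect;
    the last stage is correct only if the learner solved the question there.
    The "check your working" prompt is assumed to stay within stage 0.
    """
    responses = []
    reached = question.stage_difficulties[: hints_used + 1]
    for stage, difficulty in enumerate(reached):
        correct = solved and stage == hints_used
        responses.append((f"{question.name}#stage{stage}", difficulty, correct))
    return responses

# Example: a three mark question solved after one hint.
example = HintQuestion("q1", [1.2, 0.4, -0.5])
outcome = expand_responses(example, hints_used=1, solved=True)
```

Here `outcome` holds an incorrect response at the hardest stage and a correct response at the easier stage after hint 1; the stage after hint 2 was never reached, so it contributes nothing.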


The hints are completely new to the school context. Some individual university courses have done something similar, but not at this scale or level.

Feedback from trialling has been mostly positive. There were some initial concerns from teachers that the numerical reasoning items might be too difficult for learners, but results from the trials show that learners have been rising to the challenge. Our work on the personalised adaptive assessments has shown the strongest learners in each year are able to do much harder questions than the weakest in the year above.

We have also had a lot of positive feedback on the use of audio in the third section of the assessment. The reading aloud of the stimulus narrative ensures that learners with lower reading abilities can still access it, allowing them to demonstrate their skills.
