Criteria used to assess exam quality

Each review scores an exam against sixteen criteria across five areas. The criteria are drawn from standard educational-measurement literature (Downing & Haladyna, Biggs' constructive alignment, Bloom's revised taxonomy) and from practical exam-committee checklists used across Dutch universities.

Constructive alignment

Questions map to the stated learning objectives.
Every item can be traced back to an intended learning outcome.
The exam covers the syllabus proportionally.
Weighting reflects the emphasis given in teaching.
Multiple cognitive levels are assessed (recall, apply, analyse, evaluate).
Avoids testing only at the lowest Bloom levels.

Item quality

Each question is unambiguous and uses clear language.
A competent student can understand what is being asked on first reading.
For closed items, distractors are plausible and exactly one answer is correct.
No 'all/none of the above' fillers, no giveaways.
Items are independent — answering one does not cue another.
Later items do not reveal answers to earlier ones.
The exam has an appropriate spread of easy, medium and hard items.
Avoids being trivial or systematically too hard.

Fairness & accessibility

Language complexity is appropriate and does not become a barrier.
Tests the construct, not reading ability (unless that is the construct).
The exam is free of cultural, gender, or other group bias.
Examples and contexts are inclusive.
Layout and format are accessible (readable font size, logical ordering).
Works for students using extra time, screen readers, or dyslexia fonts.

Practicality

The total time is sufficient for a well-prepared student to finish carefully.
Not a speed test unless that is intentional.
Instructions (marks per question, aids allowed, format) are complete and unambiguous.
A student knows exactly what to do without asking.
Marks per item are proportional to effort and importance.
Big-mark questions are clearly identifiable.

Marking & reliability

A model answer or marking rubric is provided.
Open questions have explicit criteria for full, partial, and zero marks.
The exam is likely to produce consistent scores across markers.
Two markers following the rubric would reach the same grade.
Tasks resemble real, discipline-relevant problems where appropriate.
Supports transfer of learning beyond the classroom.

Scoring

Each criterion is rated on a 5-point Likert scale (1 = Strongly disagree, 5 = Strongly agree), with Not applicable available as an additional choice. Per-criterion comments are the most actionable part of the report and are strongly encouraged for any rating of 1 or 2.

The overall score shown on each report is the unweighted mean of the rated (non-N/A) criteria, mapped to a quality band:

Excellent: mean ≥ 4.5
Good: 3.5 ≤ mean < 4.5
Adequate: 2.5 ≤ mean < 3.5
Needs improvement: 1.5 ≤ mean < 2.5
Poor: mean < 1.5
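
The scoring rule above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the names `overall_score`, `quality_band` and `BANDS` are hypothetical, N/A ratings are represented as `None`, and the sketch assumes the unrounded mean is compared against the thresholds and that an exam with every criterion marked N/A gets no overall score. The source does not specify either of those last two points.

```python
from typing import Optional, Sequence

# Lower bounds of each band as listed above, checked from highest to lowest.
BANDS = [
    (4.5, "Excellent"),
    (3.5, "Good"),
    (2.5, "Adequate"),
    (1.5, "Needs improvement"),
]


def overall_score(ratings: Sequence[Optional[int]]) -> Optional[float]:
    """Return the unweighted mean of the rated criteria.

    Each rating is an integer from 1 to 5; None stands for "Not applicable"
    and is left out of the mean. Returns None if nothing was rated
    (an assumption; the source does not cover this case).
    """
    rated = [r for r in ratings if r is not None]
    for r in rated:
        if r not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating must be 1-5 or None, got {r!r}")
    if not rated:
        return None
    return sum(rated) / len(rated)


def quality_band(score: float) -> str:
    """Map a mean score to its quality band."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "Poor"


if __name__ == "__main__":
    # Six criteria, one of them marked Not applicable.
    ratings = [5, 4, 4, None, 3, 2]
    score = overall_score(ratings)
    print(score, quality_band(score))
```

In the example, the five rated criteria sum to 18, so the mean is 3.6 and the exam falls in the Good band. The N/A entry does not pull the mean down, because it is excluded from both the sum and the count.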