Criteria used to assess exam quality
Each review scores an exam against sixteen criteria across five areas, drawn from standard educational-measurement literature (Downing & Haladyna, Biggs' constructive alignment, Bloom's revised taxonomy) and from practical exam-committee checklists used across Dutch universities.
Constructive alignment
- Questions map to the stated learning objectives.
  Every item can be traced back to an intended learning outcome.
- The exam covers the syllabus proportionally.
  Weighting reflects the emphasis given in teaching.
- Multiple cognitive levels are assessed (recall, apply, analyse, evaluate).
  Avoids testing only at the lowest Bloom levels.
Item quality
- Each question is unambiguous and uses clear language.
  A competent student can understand what is being asked on first reading.
- For closed items, distractors are plausible and exactly one answer is correct.
  No 'all/none of the above' fillers, no giveaways.
- Items are independent: answering one does not cue another.
  Later items do not reveal answers to earlier ones.
- The exam has an appropriate spread of easy, medium and hard items.
  Avoids being trivial or systematically too hard.
Fairness & accessibility
- Language complexity is appropriate and does not become a barrier.
  Tests the construct, not reading ability (unless that is the construct).
- The exam is free of cultural, gender, or other group bias.
  Examples and contexts are inclusive.
- Layout and format are accessible (readable font size, logical ordering).
  Works for students using extra time, screen readers, or dyslexia fonts.
Practicality
- The total time is sufficient for a well-prepared student to finish carefully.
  Not a speed test unless that is intentional.
- Instructions (marks per question, aids allowed, format) are complete and unambiguous.
  A student knows exactly what to do without asking.
- Marks per item are proportional to effort and importance.
  Big-mark questions are clearly identifiable.
Marking & reliability
- A model answer or marking rubric is provided.
  Open questions have explicit criteria for full, partial, and zero marks.
- The exam is likely to produce consistent scores across markers.
  Two markers following the rubric would reach the same grade.
- Tasks resemble real, discipline-relevant problems where appropriate.
  Supports transfer of learning beyond the classroom.
Scoring
Each criterion is rated on a 5-point Likert scale (1 = Strongly disagree, 5 = Strongly agree), with an additional Not applicable (N/A) option. Per-criterion comments are the most actionable part of the report and are strongly encouraged for any rating of 1 or 2.
The overall score shown on each report is the unweighted mean of the rated (non-N/A) items, mapped to a quality band:
- Excellent: mean ≥ 4.5
- Good: mean ≥ 3.5
- Adequate: mean ≥ 2.5
- Needs improvement: mean ≥ 1.5
- Poor: mean < 1.5
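The scoring rule above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function names `overall_score` and `quality_band` are hypothetical, N/A ratings are represented as `None`, the band is assumed to be chosen from the unrounded mean, and a review in which every item is N/A is assumed to have no overall score.

```python
from typing import Optional, Sequence

# Band thresholds from the list above, checked from highest to lowest.
BANDS = [
    (4.5, "Excellent"),
    (3.5, "Good"),
    (2.5, "Adequate"),
    (1.5, "Needs improvement"),
]


def overall_score(ratings: Sequence[Optional[int]]) -> Optional[float]:
    """Unweighted mean of the rated items; None stands for Not applicable."""
    rated = [r for r in ratings if r is not None]
    if any(r not in (1, 2, 3, 4, 5) for r in rated):
        raise ValueError("ratings must be 1-5, or None for N/A")
    if not rated:
        # Assumption: with nothing rated there is no meaningful mean.
        return None
    return sum(rated) / len(rated)


def quality_band(score: float) -> str:
    """Map a mean score to its quality band (assumes no rounding first)."""
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    return "Poor"


# Example: six criteria, one marked N/A, so the mean is taken over five.
ratings = [5, 4, 4, None, 3, 5]
score = overall_score(ratings)
print(score, quality_band(score))
```

Because N/A items are dropped before averaging, they neither raise nor lower the overall score; the mean in the example is 21 / 5, which falls in the Good band.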