Most assessment reform writing argues for abolishing tests. I’ve made similar arguments. But abolition has never been a realistic path for most universities, especially in countries where testing culture runs deep, and dismissing the test entirely usually means dismissing the only assessment millions of students will see.
Villarroel, Boud, Bloxham, Bruna, and Bruna’s (2020) paper on redesigning written tests using authentic assessment principles takes the harder, more useful position. They argue for keeping the test and rebuilding it from the inside. Six years on, with AI now breathing down every assessment designer’s neck, that position has aged into something approaching urgent.

The Three-Phase Framework They Propose for Authentic Assessment
Villarroel et al. (2020) build on their earlier 2018 systematic review, which identified three core dimensions of authentic assessment: realism, cognitive challenge, and authentic evaluative judgement. The 2020 paper applies all three across the full life of a test.
In the planning phase, content gets anchored to the graduate profile and learning outcomes, realism gets injected into the items, and the cognitive demand shifts from memory to transfer. The administering phase brings open books, collaborative answers on complex tasks, and conditions that resemble the actual workplace. Follow-up turns into something different too, with students co-constructing marking criteria, peer-reviewing each other’s anonymised tests, and self-assessing their answers in dialogue with the teacher.
This connects naturally with Ashford-Rowe, Herrington, and Brown’s (2014) framework on the critical elements of authentic assessment, which I’ve covered before. Villarroel et al. (2020) take the same conceptual foundation and apply it specifically to the format that dominates higher education globally: the timed, written test.

The Cost the Authors Acknowledge
What I find most useful about the paper is that the authors don’t oversell the redesign. They acknowledge the real costs: the redesigned questions take longer to answer and are harder to grade, and students raised on memorisation tests will struggle in the short term.
There’s another concern I’d add. None of the redesigned strategies in their three-phase framework land without teachers who have time to learn them. Co-constructing marking criteria with students, designing items where the context actually matters, running peer review on anonymised tests: every one of these moves takes preparation, calibration, and institutional support. Bearman et al. (2024) made a similar point in their work on evaluative judgement for the AI era, which I’ve covered before. The framework is good. The classroom reality determines whether it lands.

The AI Context the Paper Doesn’t Cover
The paper was published in 2020, with much of the writing done in 2018 and 2019. AI doesn’t appear in it. Yet the framework speaks to the AI moment in ways the authors didn’t anticipate when they wrote it. The closed-book exam they critique on cognitive grounds is the same exam many institutions are now reaching for as an anti-AI defence.
That defence solves the wrong problem. A closed-book exam that tests only memorisation does keep AI out, but what it measures is recall, not capability. AI didn’t make that problem worse. AI just made it visible. Dawson et al. (2024) argue in a related paper I’ve covered that validity, not cheating, should drive assessment design, and Villarroel et al.’s redesign moves in exactly that direction.
The paper is conceptual, not empirical. The authors are upfront about that. They illustrate the principles with sample items but don’t run a controlled comparison of redesigned tests versus traditional ones. That’s a real limit, and the field still needs that work.
Even so, this is a practical piece. It treats teachers as people who have to operate in real institutions, with real testing cultures, on real timelines. It doesn’t ask them to abolish anything. It asks them to redesign what they already do.

References
- Ashford-Rowe, K., Herrington, J., & Brown, C. (2014). Establishing the critical elements that determine authentic assessment. Assessment & Evaluation in Higher Education, 39(2), 205–222. https://doi.org/10.1080/02602938.2013.819566
- Bearman, M., Nieminen, J. H., & Ajjawi, R. (2023). Designing assessment in a digital world: An organising framework. Assessment & Evaluation in Higher Education, 48(3), 291–304. https://doi.org/10.1080/02602938.2022.2069674
- Dawson, P., Bearman, M., Dollinger, M., & Boud, D. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education, 49(7), 1005–1016. https://doi.org/10.1080/02602938.2024.2386662
- Villarroel, V., Boud, D., Bloxham, S., Bruna, D., & Bruna, C. (2020). Using principles of authentic assessment to redesign written examinations and tests. Innovations in Education and Teaching International, 57(1), 38–49. https://doi.org/10.1080/14703297.2018.1564882
