Authentic Assessment in the AI Age

Most educators reaching for “AI-proof assessment” in 2026 are looking for new ideas. They’re turning to surveillance, return-to-paper exams, and proctoring tools. I’d argue the answer was already on the table twelve years ago. Ashford-Rowe, Herrington, and Brown’s (2014) paper on the critical elements of authentic assessment is the kind of pre-AI work that has aged better than almost any other framework I’ve engaged with in my reading on assessment design. The authors didn’t write it for the AI era. The AI era arrived and made the framework newly urgent.

Ashford-Rowe et al. (2014) set out to do something practical. The literature on authentic assessment was rich but loose, and the term meant different things to different educators. The authors wanted a framework that instructional designers could actually apply, with clear elements anyone could ask of any assessment task. They went through the literature, talked to thirteen practitioners, consulted three experts, redesigned an Australian Army training module using the framework, and then evaluated it with the students who took the redesigned course.

What came out is a checklist of eight critical questions.

  • Does the assessment challenge the student?
  • Does it require a performance or product?
  • Does it require transfer of learning?
  • Does it require metacognition?
  • Could a real client recognise the outcome as authentic?
  • Are the environment and tooling true to the actual workplace?
  • Does it require discussion and feedback?
  • Does it require collaboration?

The Framework Reads Like an AI Resistance Plan

That checklist is the part of the paper I find most useful. Read it with AI in mind, and almost every element makes the assessment harder for a chatbot to do for the student. A real performance demanded under real workplace conditions, with peer feedback, collaboration, and metacognitive reflection, isn’t easily faked. The student has to actually be there.

Ashford-Rowe et al. (2014) argue that “it is the responsibility of designers to determine the extent to which the assessment activity requires the production of a completed outcome or product” (p. 208). That responsibility shifts back to the educator, where it belongs. The decision about what counts as an acceptable demonstration of learning is a pedagogical decision, not a technical one. AI doesn’t change that. AI just makes the lazy versions of that decision more visible.

The metacognition element is the part that connects most directly to current AI debates. The authors emphasise that “the significance of metacognition to learning process is such that it stimulates deep learning” (p. 209). Strip metacognition out of an assessment and what remains is exactly the cognitive work AI tools are best at substituting for. Tai et al. (2023) make a similar point in their later work on assessment for inclusion, which I’ve covered before, drawing directly on this 2014 framework as one of their three recommended design approaches.

The Connection to Validity and Cheating

The authors also tie authenticity to assessment value in a way that lines up with more recent work I’ve engaged with. Dawson et al. (2024) argue that validity, not cheating, should drive assessment design. The authentic-assessment framework is a concrete way of doing that. If the assessment is anchored to a real task in a real environment with a real audience, the AI-use question dissolves into a bigger one: did the student demonstrate the capability? Bearman, Nieminen, and Ajjawi (2023) make a related argument in their paper on designing assessment in a digital world, and the foundation for both lines of work is here, in 2014.

Where the Framework Shows Its Age

The empirical test is small, with only six interviewed students from one military training module, and the authors are clear about that. They recommend further validation across disciplines. The fidelity element, in particular, was identified by them as needing better contextualisation.

Twelve years later, the AI dimension creates new questions the framework doesn’t directly address. What does fidelity mean when the actual workplace now includes AI tools? Collaboration is even trickier, since one of the “collaborators” might now be a chatbot. The framework is open enough to absorb those questions, but it doesn’t answer them on its own.

I’d add another concern. The framework places enormous weight on the designer. Eight critical questions, each requiring real knowledge of the workplace context and real time to redesign. That kind of design work doesn’t happen on a teacher’s prep period. Without sustained institutional support and professional development, even a beautifully articulated framework collapses on contact with everyday classroom reality.

Anchoring assessment to real work demands a real outcome. It also demands building in reflection and letting the design carry the integrity load. That’s the argument I’ve been making about AI for years on the blog. Ashford-Rowe et al. (2014) were making it before any of us had heard of ChatGPT. They concluded that “authenticity, once deconstructed to determine its critical component elements, can present an effective model for task design and assessment” (p. 220). The AI era has only proved them right.

Twelve years of edtech change, a generative-AI boom, and an assessment crisis later, the eight questions still work. The framework is here for educators willing to apply it.

References

  • Ashford-Rowe, K., Herrington, J., & Brown, C. (2014). Establishing the critical elements that determine authentic assessment. Assessment & Evaluation in Higher Education, 39(2), 205–222. https://doi.org/10.1080/02602938.2013.819566
  • Bearman, M., Nieminen, J. H., & Ajjawi, R. (2023). Designing assessment in a digital world: An organising framework. Assessment & Evaluation in Higher Education, 48(3), 291–304. https://doi.org/10.1080/02602938.2022.2069674
  • Dawson, P., Bearman, M., Dollinger, M., & Boud, D. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education, 49(7), 1005–1016. https://doi.org/10.1080/02602938.2024.2386662
  • Tai, J., Ajjawi, R., Bearman, M., Boud, D., Dawson, P., & Jorre de St Jorre, T. (2023). Assessment for inclusion: Rethinking contemporary strategies in assessment design. Higher Education Research & Development, 42(2), 483–497. https://doi.org/10.1080/07294360.2022.2057451
