I’ve been arguing for a while now that educators need to embrace AI widely, aggressively, and unapologetically. And I’ll keep saying it. But I also think we need to be honest about the hardest question AI raises in education: what do we do about assessment?
Perkins and Roe (2025) tackle this head-on in their chapter “The End of Assessment as We Know It.” Their argument is provocative but grounded: most of our current methods for validating learning are already vulnerable to generative AI, and retreating to exam halls won’t fix the problem. As they put it, “most of our current methods of validating learning will soon be GenAI-vulnerable” (p. 77).
I agree. And I’d push it further. The vulnerability they’re describing isn’t new. GenAI just made it impossible to ignore.
The Future of AI and Assessment: Why Detection Is a Dead End
Let’s be honest about AI detection. It doesn’t work reliably, it creates adversarial dynamics between teachers and students, and it distracts from the more important conversation about what we’re actually trying to assess. Perkins, Roe, and Furze (2024) have argued in their AI Assessment Scale work that detection is “impossible and inefficient.” Nothing has changed to prove them wrong.
Some institutions are responding by running back to proctored exams and viva voce formats. I get the impulse. When the ground shifts under your feet, you reach for something solid. But Perkins and Roe caution against treating the exam hall as a safe haven. Wearable AI devices and emerging neural interface technologies are already complicating controlled environments. A physical room doesn’t guarantee independence of thought. It never really did.

What I think is actually happening: GenAI is exposing a fragility that was always there. We’ve been equating output with understanding for decades. A well-written essay got a good grade because we assumed the writing reflected the thinking. Shaky assumption. Students have always found workarounds, from essay mills to collaborative cheating. GenAI just made the workaround seamless, instant, and free.
So the question isn’t how we stop students from using AI. It’s this: what do we actually want to know about what our students can do? And can we design assessments that answer that question honestly?
The Equity Problem We Can’t Afford to Ignore
Perkins and Roe make a move in this chapter that I think is critical. They refuse the simple Global North/South binary and focus on digital infrastructure: bandwidth, hardware, training capacity, and AI literacy. The divide isn’t geographic. It’s structural.
One statistic is key here: “Ninety-nine per cent of the world’s languages lack the data required to train state-of-the-art GenAI models” (p. 78). Let that sink in. We’re redesigning assessment for an AI-integrated world, and 99% of the world’s languages lack the data to be properly served by the systems we’re building around.
The idea of AI as a universal equalizer? Perkins and Roe call it “more myth than reality” (p. 78). I’ve written about this equity dimension in my post on Critical AI Literacy, where Roe, Furze, and Perkins (2025) argued that any framework for AI in education must address power, bias, and access. The assessment conversation needs that same lens.
And here’s the paradox. Well-resourced institutions are already experimenting with ungrading, collaborative problem-solving, and AI-integrated assessments. They’re moving forward. Meanwhile, under-resourced institutions may respond defensively, doubling down on closed-book exams and rote memorization. GenAI could propel some systems toward progressive pedagogy and freeze others in nineteenth-century assessment patterns. The gap widens.
What We Should Be Assessing in an AI World
I’ve been saying in workshops and in my writing that AI changes what assessment should look like. If recall, analytical writing, and structured exposition can be partially automated, then we need to ask: what human capacities are we actually trying to develop?
Perkins and Roe point toward ethical reasoning, relational expertise, and professional judgment. I’d add critical thinking, the ability to question AI outputs, and metacognitive awareness (the ability to reflect on your own learning process). Shaw and Nave (2026) showed in their cognitive surrender research that students often accept AI-generated text without critically processing it. If we’re not assessing students’ ability to push back on AI, to interrogate it, to recognize when it’s confidently wrong, we’re assessing the wrong things.
Perkins and Roe put it well: “a truly inclusive future demands that we critically reconsider not only our assessment methods but also the types of knowledge that we prioritize and validate” (2025, p. 79).
Yes. And I think it applies at every level, from K-12 to graduate school. The assessments we design tell students what we value. If we value critical engagement with AI, creative problem-solving, and independent thought, our assessments need to reflect that.
A student who can use AI strategically to research a complex problem, identify where the AI got it wrong, synthesize multiple sources, and arrive at an original argument has demonstrated something genuinely valuable. Our assessment formats should recognize and reward that kind of work.
The AI Assessment Scale developed by Perkins, Roe, and Furze (2024) is one practical tool for getting there. Five levels of AI integration, each aligned with specific learning goals. Level 1 requires no AI in supervised conditions. Level 5 invites co-creation and experimentation. The point is intentionality. You choose the level based on what you’re assessing, and you tell students clearly.
Moving Forward With Urgency
Perkins and Roe close with a call I fully support: “Institutions and policy-makers must take deliberate steps to redistribute digital resources and develop multilingual GenAI models to ensure that future assessment practices are inclusive and equitable” (p. 79).
I’d add something. We also need courage. The future of AI and assessment requires educators willing to step outside familiar models, try new formats, and accept that some experiments will fail. The worst response is paralysis, clinging to practices that were already imperfect and pretending AI didn’t happen.
Assessment has always been imperfect. GenAI makes the imperfections visible. Visibility is where better design starts.
References
Perkins, M., & Roe, J. (2025). The end of assessment as we know it: GenAI, inequality and the future of knowing. In AI and the future of education: Disruptions, dilemmas and directions (pp. 76–80). https://durham-repository.worktribe.com/output/4472558/the-end-of-assessment-as-we-know-it-genai-inequality-and-the-future-of-knowing
Perkins, M., Roe, J., & Furze, L. (2024). The AI Assessment Scale revisited: A framework for educational assessment. arXiv preprint. https://arxiv.org/abs/2412.09029
Roe, J., Furze, L., & Perkins, M. (2025). Digital plastic: A metaphorical framework for Critical AI Literacy in the multiliteracies era. Pedagogies: An International Journal. Advance online publication. https://doi.org/10.1080/1554480X.2025.2557491
