The biggest threat AI poses to learning isn’t cheating. It’s the slow, almost invisible drift in how students decide what’s any good. Bearman, Tai, Dawson, Boud, and Ajjawi (2024) make that case in their paper on developing evaluative judgement for the generative AI era, and I think the paper gets the question right in a way most of the assessment-and-AI literature has missed. They argue that “AI has widened the gap between our capability to produce work, and our capability to evaluate the quality of that work” (p. 903). That sentence is the whole paper in one line.
What Evaluative Judgement Actually Is
Bearman et al. (2024) define evaluative judgement as the ability to judge the quality of work, your own and others’. The capability has two parts. First, you need an internal sense of what counts as quality in your discipline. Second, you need to apply that sense to specific pieces of work, rather than holding it only as rules in the abstract.
Without both parts, AI outputs slip past students unchallenged. The bot writes a confident paragraph. The student reads it. The student doesn’t know what’s missing because the student doesn’t yet know what good work in this domain looks like.
The framing aligns with what Bearman, Nieminen, and Ajjawi (2023) argued earlier in their paper on designing assessment in a digital world, which I’ve covered before. Pedagogy has to teach the judging, not just the producing. AI handles the producing now.

The Three-Fold Framework Bearman et al. Propose
The core contribution is a framework with three points of intersection between AI and evaluative judgement. The first is the obvious one: students need to learn to judge AI outputs. The second is that students need to judge their own AI processes, including how they prompt, when they iterate, and when they should disengage entirely on ethical grounds. The third surprised me: AI itself can be used to assess students’ evaluative judgements. A student tries to judge a piece of work, then asks AI to judge the same piece, then compares the two judgements.
What makes that third move work is something Bearman et al. (2024) flag as a feature, not a bug. Students can disagree with AI in ways they can’t disagree with a teacher. The authority dynamic is gone. The student can argue with a chatbot, ignore its judgement, or test it against credible sources, all without the social cost of contradicting a professor. That gives evaluative judgement a kind of low-stakes practice space the classroom rarely offers.

What I Find Most Useful
The most practical part of the paper is Table 1, which redesigns five familiar assessment strategies for the AI era: self-assessment, peer assessment, feedback, rubrics, and exemplars. Each one gets reworked across all three intersections. Peer review, for example, becomes a setting where peers act as “humans in the loop” for AI judgements about each other’s work. That extends the long line of work this team has done on feedback design, including Dawson et al.’s (2024) argument that validity, not cheating, should drive assessment design.
The argument also lands harder when read alongside Fan et al.’s (2025) work on metacognitive laziness, which I’ve covered before. Students using AI feedback in that study didn’t show gains in knowledge or transfer, even though their immediate work improved.
Bearman et al. (2024) are pointing at the same hole from a different angle. The cognitive work AI offloads from the student is exactly the work that produces evaluative judgement. Teach students to write with AI without teaching them to judge what AI gives them, and you’ve built the lazy version of pedagogy.
Where I’d Extend the Argument
The paper is conceptual. The authors are upfront about that. They synthesise their own previous work and the broader literature, but they don’t test the framework with students. That’s a real limit. Some of the proposals, especially using AI to assess students’ evaluative judgements, need empirical work before educators take them into a real classroom. Two years on, that empirical work is still mostly missing.
I’d add another concern. Bearman et al. (2024) emphasise that “in an age of AI not paying attention to evaluative judgement carries the risk that learners may start to adopt the understanding of quality that AI has inferred from its data, programmers or owners” (p. 903).
I think the paper underplays this risk. If students learn to judge quality by triangulating against AI outputs, their implicit notion of quality drifts toward whatever the model’s training corpus treats as the default. That’s a real shift in what counts as good work in a discipline.
There’s also the practical question. None of the redesigned strategies in Table 1 land without teachers who have time to learn them. Programme-level redesign, peer-review communities, and iterative AI calibration all demand institutional support that most universities aren’t providing.
What the paper gets right is the central reframe. The AI question for educators isn’t “did the student use AI?” The question is “can the student tell whether the output is any good?” That’s a pedagogical question, and pedagogy is what teachers control. Bearman et al. (2024) give us a framework for teaching it.
We don’t need to outlaw AI in the classroom. We need to outlaw the assumption that students will figure out quality on their own.

References
- Bearman, M., Nieminen, J. H., & Ajjawi, R. (2023). Designing assessment in a digital world: An organising framework. Assessment & Evaluation in Higher Education, 48(3), 291–304. https://doi.org/10.1080/02602938.2022.2069674
- Bearman, M., Tai, J., Dawson, P., Boud, D., & Ajjawi, R. (2024). Developing evaluative judgement for a time of generative artificial intelligence. Assessment & Evaluation in Higher Education, 49(6), 893–905. https://doi.org/10.1080/02602938.2024.2335321
- Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530. https://doi.org/10.1111/bjet.13544
