The conversation about AI and assessment has been stuck in the wrong place for too long. Most of the debate circles around detection, integrity, and whether students are “cheating” with ChatGPT. I’ve written about this repeatedly on this blog, and my position hasn’t changed: detection is a dead end, and the real work is redesigning assessment so that AI becomes part of the learning process, not a threat to it.
A new scoping review by Fajardo-Ramos, Chiappe, and Mella-Norambuena (2025) shifts the lens in a direction I think is long overdue. They ask a question most assessment studies ignore: what does all of this mean for the people who actually design and run assessments? What competencies do teacher educators need when AI is woven into assessment at every level?
The scope here is ambitious. Fajardo-Ramos et al. screened 76 peer-reviewed studies published between 2015 and 2025, all focused on digital assessment tools in Ibero-American higher education. The review draws from English, Spanish, and Portuguese sources, and it specifically targets teacher training and teacher education programs. That regional focus matters.

The international literature on AI in assessment tends to assume a certain level of institutional readiness, funding, and infrastructure that doesn’t always match the realities of Latin American and Iberian universities. Fajardo-Ramos et al. name this gap directly: the international conversation frames AI as a lever for formative assessment, personalization, and feedback literacy, but the Ibero-American corpus reveals what they call an “operations-first trajectory,” where institutions prioritize proctoring, grading efficiency, and integrity controls at scale. Formative redesign advances slowly and unevenly.
One of the review’s strongest contributions is a function-by-purpose taxonomy that maps assessment technologies across six functions: delivery and integrity, criteria and rubrics, automated feedback and tutoring, analytics and early warning, portfolios and peer/self-assessment, and authentic or multimodal evidence. For each function, the authors trace the pre-AI baseline, the AI-enhanced mechanism, and the specific implications for teacher training. This is useful because it resists the lazy framing of AI as a single monolithic tool. AI in a rubric co-creation system and AI in a proctoring engine are not the same thing, and they demand very different competencies from educators.
The Competency Clusters That Teacher Programs Need
Fajardo-Ramos et al. identify five competency clusters that teacher education programs should be building. These are: (1) feedback literacy with AI, including criterion-anchored prompting, sampling and audit cycles, and revision-based workflows; (2) rubric and item validation with traceability; (3) data interpretation and stewardship, covering consent, minimization, and fairness checks; (4) integrity and transparency in AI-involved assessment; and (5) orchestration of platforms and moderation or double-marking when AI assists scoring.

I find this list credible and specific enough to be actionable, which is rare in the literature. Most frameworks stop at vague calls for “AI literacy” or “digital competence” and leave the details to the reader. I’ve covered several AI literacy frameworks on this blog, including Chee, Ahn, and Lee’s (2025) competency progression and UNESCO’s (2024) student AI competency framework, and the pattern I keep seeing is a gap between the conceptual architecture and the actual classroom practices it’s supposed to inform.
Fajardo-Ramos et al. close some of that gap by anchoring each cluster to a specific assessment function. Feedback literacy, for example, isn’t a generic skill here. It means knowing how to design prompts that produce criterion-referenced AI feedback, how to audit samples for consistency and bias, and how to build revision workflows where students engage with AI-generated comments critically, not passively.
That last point connects directly to what Hawkins, Taylor-Griffiths, and Lodge (2025) found about feedback literacy and AI-enhanced essay writing. Students don’t automatically learn from AI feedback. The learning happens when they’re required to evaluate and respond to it, and that requires deliberate pedagogical design.
The Use-Trust Gap
The qualitative findings in this review surface something I find both compelling and familiar. Fajardo-Ramos et al. describe a “use-trust gap”: educators recognize the motivational and pacing benefits of rapid AI feedback, but they hesitate to delegate judgment in areas that demand nuanced interpretation, things like argumentation, interdisciplinary reasoning, and ethical analysis. Students, for their part, question accuracy and relevance when the feedback feels generic or ungrounded in the actual task criteria. Trust grows only when human moderation and feedback triage are built into the process.
This tracks with what the broader research has been showing. Fan et al. (2025) documented that AI improved essay quality without improving the cognitive processes behind the writing. Students skipped metacognitive steps when ChatGPT was available. The product got better, but the thinking didn’t. The trust problem Fajardo-Ramos et al. describe is, at its core, the same issue from the educator’s side: if AI can generate plausible feedback at speed, what’s left for the human to do, and how do you convince stakeholders that the human role still matters?
The authors’ answer is professionalizing human-in-the-loop assessment. They argue that the real task is reallocating pedagogical work, from manual production to design, oversight, and documentation of AI-involved processes. I agree with that framing. And I’d add that this reallocation has to be visible. If the new labor of rubric refinement, audit sampling, and feedback triage stays invisible or uncompensated, institutions will keep treating AI as a cost-cutting measure and wonder why trust never builds.
Five Tensions That Won’t Resolve Cleanly
Fajardo-Ramos et al. map five recurring tensions in the literature that I think any educator working with AI-assisted assessment needs to understand. Opacity versus trust: when AI feedback criteria and error modes aren’t made explicit, educators can’t vouch for its quality. Scale versus relationship: automation can displace the dialogic moments that make feedback formative. Compliance versus learning: surveillance tools can narrow what counts as competence. Efficiency versus rigor: time saved in marking gets consumed by rubric refinement and auditing. Innovation versus coherence: local pilots flourish but produce fragmented “islands of practice” when policy can’t keep up.
I’ve written about the wicked problem of AI and assessment through Corbin, Bearman, Boud, and Dawson’s (2025) framework, and these five tensions are exactly the kind of thing that makes the problem wicked. There’s no clean technical fix. Each tension involves legitimate values in genuine conflict, and the resolution is always contextual, local, and provisional. Fajardo-Ramos et al. don’t pretend otherwise, which I respect. They acknowledge that the conceptual promise is pedagogical improvement, but the situated reality is “controlled modernization,” with learning gains contingent on pockets of strong design and governance.
What I’d Add to This Picture
The review is solid, and it’s one of the more useful mapping exercises I’ve read this year on AI and assessment. But I want to name one limitation. The taxonomy and competency clusters are built from a corpus that skews toward quantitative tool evaluations. The five tensions, by contrast, are drawn from the qualitative synthesis, and they’re the most insightful part of the paper.
I’d want to see future work that spends more time in the tensions and studies how actual teacher education programs navigate them in real time, not just how the literature describes them. We have enough taxonomies. What we need now are detailed accounts of what happens when an instructor tries to audit AI-generated feedback across 200 student submissions, or when a program coordinator tries to harmonize AI assistance policies across twelve different courses.
That’s the operational reality this paper points toward but doesn’t fully populate. And I think it’s the next frontier for research on AI and assessment in teacher education.
References
- Chee, H., Ahn, S., & Lee, J. (2025). A competency framework for AI literacy: Variations by different learner groups and an implied learning pathway. British Journal of Educational Technology, 56, 2146–2182. https://doi.org/10.1111/bjet.13556
- Corbin, T., Bearman, M., Boud, D., & Dawson, P. (2025). The wicked problem of AI and assessment. Assessment & Evaluation in Higher Education, 1–17. https://doi.org/10.1080/02602938.2025.2553340
- Fajardo-Ramos, D. C., Chiappe, A., & Mella-Norambuena, J. (2025). Human-in-the-loop assessment with AI: Implications for teacher education in Ibero-American universities. Frontiers in Education, 10, 1710992. https://doi.org/10.3389/feduc.2025.1710992
- Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530. https://doi.org/10.1111/bjet.13544
- Hawkins, B., Taylor-Griffiths, D., & Lodge, J. M. (2025). Summarise, elaborate, try again: Exploring the effect of feedback literacy on AI-enhanced essay writing. Assessment & Evaluation in Higher Education. https://doi.org/10.1080/02602938.2025.2492070
- UNESCO. (2024). AI competency framework for students. United Nations Educational, Scientific and Cultural Organization. https://doi.org/10.54675/JKJB9835
