AI and Assessment in Higher Education: What 32 Studies Reveal

The conversation about AI and assessment in higher education has moved fast. In 2023, most institutions were still deciding whether to ban ChatGPT or pretend it didn’t exist. We’re in 2026 now, and the real question has become how to redesign assessment so it still measures what matters when every student has a powerful AI assistant in their pocket.

Xia et al. (2024) published a scoping review right in the middle of that transition, analyzing 32 empirical studies from ERIC, Web of Science, and Scopus to map how generative AI is reshaping assessment at three levels: students, teachers, and institutions.

The paper is useful, and it’s also a snapshot of a very specific moment.

Xia et al. completed their search on September 30, 2023, which means every study they reviewed was produced during the first year after ChatGPT’s launch. The field was in reactive mode. Policies were being thrown together in a rush. Teachers were panicking about essays. The research reflected that urgency, with all its blind spots. I want to engage with what the review found on its own terms, and I want to be upfront about where newer work has already moved past what this evidence base could support.

AI and Assessment: When Cheating Dominated the Conversation

Fifty-four percent of the articles Xia et al. reviewed flagged academic integrity as the dominant student-level concern. ChatGPT scored at First-Class level in UK university assessments, passed engineering exams, and performed well across economics, medicine, and geography. One study declared short-form essays “an obsolete assessment tool” (Yeadon et al., 2023, cited in Xia et al., 2024). The alarm was real, and I understand where it came from.

From where we are in 2026, though, the integrity fixation looks like a symptom of a field that hadn’t yet figured out what it was really worried about. I’ve written about Dawson, Bearman, Dollinger, and Boud (2024), who argue that assessment design should center on validity, with cheating treated as a secondary concern.

That argument wasn’t available to most of the researchers Xia et al. reviewed. The 2023 literature was stuck on “can students cheat with ChatGPT?” The sharper question, the one that took another year to fully surface, was whether we’re even measuring what we claim to be measuring.

The review does identify three opportunities GenAI created for students: perceived unbiased feedback, immediate and diverse feedback, and tools for self-assessment. Students could generate their own rubrics, practice questions, and revision prompts. That’s genuinely promising material for building self-regulated learning. But Xia et al. also note that these self-assessment activities didn’t involve critical thinking. The feedback loop ran shallow. Students got responses from AI. They didn’t learn how to evaluate those responses or fold them into their own reasoning.

That pattern connects to what Fan et al. (2025) later called metacognitive laziness, where students offload thinking to AI without reflecting on their own cognitive processes. The early warning signs were already visible in the studies Xia et al. reviewed, even if the 2023 literature didn’t have the vocabulary for it yet.

The Teacher Readiness Gap in AI Assessment

The most revealing finding in the teacher-level analysis is about what teachers believe. Xia et al. report a clear split: some instructors had already started treating ChatGPT as a “one-stop shop” for sourcing knowledge and generating content (Cross et al., 2023, cited in Xia et al., 2024). Others recognized that ChatGPT’s feedback was often too long, off-topic, or misaligned with the actual criteria of their courses. That gap between assumption and reality is where most of the practical problems live.

The review recommends moving assessment toward mixed methods, with AI handling language checking and self-assessment and teachers focusing on ideas, reasoning, and higher-order thinking.

Xia et al. describe this as a shift “from human-centered or machine-centered to mixed methods, transforming the traditional mode of assessment” (p. 10). I think the direction is right. AI can do certain kinds of feedback well and quickly. Teachers bring judgment, context, and disciplinary expertise that no model replicates. The challenge is building systems that make that collaboration real and not aspirational.

Seventeen of the 31 articles on teacher impact raised concerns about students losing essential skills through overreliance on AI. Creativity, independent thinking, teamwork, leadership, empathy: the review reports that teachers saw all of these at risk. That concern has only gotten sharper since 2023. The cognitive science work from Gerlich (2025) and Kosmyna et al. (2025), which I’ve covered on this blog, now gives it empirical grounding the earlier studies couldn’t provide.

A third of the reviewed articles also stressed the need for stronger teacher assessment literacy. Teachers need to know how to design tasks that AI can’t simply complete, how to tell the difference between student-generated and AI-generated work, and how to offer feedback that complements what AI already provides. Xia et al. correctly identify this as a professional development problem, and they note that most institutions haven’t seriously invested in solving it.

What I find missing from the teacher-level findings is any serious treatment of assessment validity as a design principle. The 2023 studies talk about making assessment “cheat-proof” and about needing “more diverse” methods. Those are reasonable starting points. But redesigning assessment formats without asking whether the evidence collected actually supports the inferences being made just recreates old problems in new packaging. Corbin, Bearman, Boud, and Dawson (2025) call this the “wicked problem” of AI and assessment, and the literature Xia et al. reviewed hadn’t gotten there yet.

Institutional Policy and the Interdisciplinary Question

At the institutional level, the review found five areas of impact. The most interesting is the argument for interdisciplinary programs. Xia et al. reason that GenAI gives students access to knowledge and skills from fields they wouldn’t normally study, which opens up project-based and problem-based learning in new ways.

If students can use AI to generate visuals, write code, or pull content from unfamiliar disciplines, assessment should cross those boundaries too. The authors recommend that institutions create dedicated funding for interdisciplinary programs and build assessment practices around them.

That’s an ambitious recommendation, and I think it’s directionally correct. The rest of the institutional findings track with what you’d expect: redesign assessment policies, invest in AI and digital literacy training, rethink learning objectives. Perkins and Roe (2025) have gone further, arguing that we may be witnessing the end of assessment as we’ve known it. Xia et al.’s review, rooted in 2023 evidence, points in that direction but can’t see as far ahead.

The authors themselves acknowledge that many of the reviewed studies discussed GenAI in general terms without specifying which tools or versions were involved. That vagueness makes it hard to draw precise conclusions about what actually worked, and it’s a limitation worth naming when reading the review’s broader claims.

What This Review Tells Us Now

This paper does what a good scoping review should: it maps the terrain at a particular moment and names the patterns emerging from it. The moment it maps is worth understanding, even now. The early literature’s fixation on integrity, the split in teacher beliefs, the shallow self-assessment loops, the institutional scramble for new policies: these were real problems in 2023, and most of them remain unsolved.

What’s different now is the sophistication of the questions being asked. We’ve moved from “should we ban it?” to “how do we design assessment that holds up when AI is in the room?” Xia et al. point toward that second question without fully arriving at it, which makes sense given the evidence they had to work with.

I’d encourage anyone reading this review to treat it as a baseline, not a playbook. Institutions still framing AI as a cheating problem are falling behind the research. The ones asking validity questions, building teacher capacity, and redesigning assessment around what students actually need to learn are doing work that will last. The tools will keep changing. The pedagogical questions won’t.

References

  • Corbin, T., Bearman, M., Boud, D., & Dawson, P. (2025). The wicked problem of AI and assessment. Assessment & Evaluation in Higher Education, 1–17. https://doi.org/10.1080/02602938.2025.2553340
  • Dawson, P., Bearman, M., Dollinger, M., & Boud, D. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education, 49(7), 1005–1016. https://doi.org/10.1080/02602938.2024.2386662
  • Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530. https://doi.org/10.1111/bjet.13544
  • Gerlich, M. (2025). AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1), Article 6. https://doi.org/10.3390/soc15010006
  • Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X.-H., Beresnitzky, A. V., Braunstein, I., & Maes, P. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing tasks. MIT Media Lab. https://www.media.mit.edu/publications/your-brain-on-chatgpt/
  • Perkins, M., & Roe, J. (2025). The end of assessment as we know it: GenAI, inequality and the future of knowing. In AI and the future of education: Disruptions, dilemmas and directions (pp. 76–80). https://durham-repository.worktribe.com/output/4472558/the-end-of-assessment-as-we-know-it-genai-inequality-and-the-future-of-knowing
  • Xia, Q., Weng, X., Ouyang, F., Lin, T. J., & Chiu, T. K. F. (2024). A scoping review on how generative artificial intelligence transforms assessment in higher education. International Journal of Educational Technology in Higher Education, 21(40), 1–22. https://doi.org/10.1186/s41239-024-00468-z
