ChatGPT and Higher Education Assessment

I remember reading this paper shortly after it came out, just months after ChatGPT had launched and the education world was scrambling to respond. Rudolph, Tan, and Tan (2023) published one of the first serious academic treatments of what ChatGPT meant for higher education, and their title captured the mood perfectly: “ChatGPT: Bullshit Spewer or the End of Traditional Assessments in Higher Education?” Three years later, I think their analysis holds up remarkably well, and some of their recommendations have only grown more urgent.

The paper, published in the Journal of Applied Learning & Teaching, does something I appreciated at the time and still appreciate now. It refuses to panic. The authors situate ChatGPT within the longer history of educational technology hype cycles. Film, radio, television, MOOCs, and social media all promised to overturn universities, yet classrooms, for the most part, persisted. The question was never whether ChatGPT would destroy education. It was whether the tool would expose weaknesses that were already there.

ChatGPT Doesn’t Create Fragility, It Reveals It

The most important argument in the paper is also the simplest. If an AI tool can complete an assessment task well enough that a student could pass without understanding the material, the problem isn’t the AI. It’s the task. Rudolph et al. tested ChatGPT on various academic assignments and found it produced fluent, competent responses to conceptual questions and short-answer tasks. It struggled with depth, critical analysis, and citation accuracy. But for a large portion of standard coursework, the output was good enough to pass.

Rudolph et al. also tested plagiarism detection tools against AI-generated text and found them unreliable. As they note: “Plagiarism checkers such as the one embedded in the professional version of Grammarly are unlikely to flag text generated by ChatGPT and similar programs, as it is, after all, original text” (p. 354). The irony is sharp. The text is original in the technical sense. It just wasn’t produced through any process of human learning.

I’ve written about this same dynamic from several angles in my previous posts. Corbin, Dawson, and Liu (2025) later formalized it as the “enforcement illusion,” the gap between discursive rules about AI use and any ability to actually verify compliance. Eaton (2023) argued through her postplagiarism framework that detection is fundamentally a dead end, and that integrity systems need to be grounded in responsibility and ethical reasoning. Rudolph et al. arrived at the same conclusion in early 2023: detection won’t save us, and the anxiety about ChatGPT reveals fragilities in assessment design that predate the tool entirely.

Trust Over Surveillance

The paper’s pedagogical stance is clear, and I agree with it completely. Rudolph et al. write:

Generally, we advise against a policing approach (that focuses on discovering academic misconduct, such as detecting the use of ChatGPT and other AI tools). We favour an approach that builds trusting relationships with our students in a student-centric pedagogy and assessments for and as learning rather than solely assessments of learning. (p. 354)

I think the field has largely moved in this direction, at least in the research literature. Corbin, Bearman, Boud, and Dawson (2025) framed AI and assessment as a wicked problem, one that can’t be solved with a single policy but requires ongoing navigation, compromise, and iteration. Perkins and Roe (2025) argued that retreating to supervised exams won’t protect assessment validity in the long run.

Sperber et al.’s (2025) PAIRR model showed that combining AI feedback with peer feedback and student reflection produces assessments where the process of thinking becomes visible and evaluable. All of these developments align with what Rudolph et al. were calling for in 2023: move away from policing final products and toward assessing the learning process itself.

What I appreciate about re-reading this paper now is how clearly it anticipated the conversation we’re still having. The authors didn’t have access to the empirical studies that would come later, but they correctly identified the core tension. Assessment designed around final products is structurally vulnerable to AI. Assessment designed around process, reflection, and authentic demonstration of understanding holds up much better.

What Large Language Models Actually Do

Rudolph et al. also took time to explain what ChatGPT actually is, technically, and I think this part of the paper remains underappreciated. They describe large language models clearly:

“Such language models are not designed to store or retrieve facts. They are ‘just good at predicting the next word(s) in the sequence’ (Cooper, 2021)” (p. 344)

In 2023, many educators didn’t fully grasp this. Some assumed ChatGPT “knew” things, while others thought it was merely copying from the internet. The reality is more nuanced and more important for pedagogy. These systems generate statistically plausible text without understanding, intent, or meaning, and they can produce fluent prose that looks like critical analysis without any actual analysis having occurred.

And that technical reality shapes how we should use these tools in our teaching. If students treat ChatGPT as an authority, they risk what Shaw and Nave (2026) later called cognitive surrender, deferring to AI outputs without critical evaluation. If teachers understand the tool’s actual capabilities and design tasks accordingly, they can use it as a productive component of learning. Bastani et al. (2025) illustrated this in their math study: unrestricted ChatGPT access harmed learning, but a guardrailed version that constrained the tool’s responses largely mitigated the damage. The design determined the outcome.

Why I Think This Paper Still Matters

Rudolph et al. published this paper at a moment of genuine confusion. Many institutions were rushing to ban or ignore ChatGPT. Very few were asking the deeper questions about what AI revealed about assessment quality, about the relationship between trust and learning, and about the difference between producing text and developing understanding.

Three years later, the field has moved. We have empirical studies, frameworks, and practical models. But the fundamental argument Rudolph et al. made hasn’t been superseded. It’s been confirmed. ChatGPT didn’t break assessment. It showed us where assessment was already broken. And the path forward runs through redesign, trust, and pedagogical intentionality, not through surveillance and detection.

If you’re still figuring out your approach to AI in assessment, this paper is worth reading for the clarity of its framing alone. The tools have evolved since 2023. The questions haven’t.

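Reference

Rudolph, J., Tan, S., & Tan, S. (2023). ChatGPT: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning & Teaching, 6(1), 342–363.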