Higher education has spent the last few years in a state of low-grade panic about cheating. The arrival of ChatGPT made it worse, but the anxiety was there long before, fueled by a multibillion-dollar assessment security industry and a conversation that keeps circling the same question: are students cheating? Dawson, Bearman, Dollinger, and Boud (2024) argue, convincingly, that it’s the wrong question. The right one is: are our assessments valid?
Their paper, published in Assessment & Evaluation in Higher Education, is a conceptual analysis that proposes subsuming cheating entirely within the broader concept of assessment validity. If that sounds radical, it’s partly because the field has treated cheating as a category of its own for so long that it’s hard to imagine thinking about it any other way. But Dawson et al. build a layered case for why the cheating frame is both unstable and counterproductive, and why validity offers a better path forward.
They start by asking a question that sounds simple but isn’t: what is cheating? List-based definitions, the kind most institutional policies rely on, specify prohibited behaviours. But these lists keep expanding, they can never fully keep up with new technologies, and they produce alarming numbers.
Dawson et al. note that when even the most benign items on a checklist get ticked, some studies end up claiming that over 90% of students cheat (Barnhardt, 2016, cited in Dawson et al., 2024). Abstract definitions try to avoid listing specific acts, but they come with their own problems. Some treat education as a race where cheating means gaining unfair advantage. Others frame it as a game where cheating is simply breaking the rules. Some definitions are outright circular: cheating is what happens when you break the rules against cheating.
The paper also makes a point that the field needs to reckon with: cheating is socially constructed. Dawson et al. point out that what counts as cheating changes across cultures, disciplines, and time periods. They also flag the role of the assessment security industry, the companies that sell proctoring software, text-matching tools, and AI detection products, in shaping what cheating means.
These vendors have a financial interest in the appearance of high cheating rates. The academy, Dawson et al. argue, may be ceding some of its responsibility for defining cheating to for-profit companies without fully recognizing it. I’ve covered Luo’s (2024) critical review of originality policies at top universities, and the same dynamic appears there: institutions letting compliance frameworks and detection tools define what “original work” means, without interrogating the concept itself.

The usual justifications for why cheating is wrong get a thorough examination. Dawson et al. walk through the most common arguments and find each one wanting. The “unfairness” argument assumes education is a competition, which doesn’t hold up well in a standards-based system. If every student who meets the outcomes can pass, one student’s unearned grade shouldn’t change another’s.
The “it hurts learning” argument has an obvious problem, as Dawson et al. note: we don’t punish students for having part-time jobs or hobbies, even though those also take time away from studying. The values-based argument from academic integrity is stronger, but Dawson et al. make an important point: the absence of cheating is not evidence of integrity. A student who avoids cheating because they’re afraid of getting caught hasn’t demonstrated any values at all.
Dawson et al. also surface the concept of the “fundamental attribution error” from Kohn (2007, cited in Dawson et al., 2024). We default to blaming individual students for cheating, when the behaviour is often a product of the system they’re operating in. Busywork, credentialism, grade obsession, and assessment designs that practically invite workarounds all contribute. And the paper raises the uncomfortable question of who is really in the wrong: the student, the educational system, assessment designers who set inappropriate tasks, or a society that places so much weight on credentials?
The cost-effectiveness argument is damning. Dawson et al. report that they could find no peer-reviewed evidence that remote proctoring effectively detects cheating. Academic integrity modules have been used for decades with no demonstrated reduction in cheating rates. Honor codes produce only very small reductions (McCabe, Treviño, and Butterfield, 2002, cited in Dawson et al., 2024). And the anti-cheating industry is a multibillion-dollar sector funded largely by universities. That’s a lot of money for approaches with weak evidence behind them.
The costs go beyond money. Dawson et al. describe how anti-cheating measures stifle collaboration, discourage feedback-seeking, and turn the student-instructor relationship adversarial. They reference the concept of “cop shit,” defined by Moro (2020, cited in Dawson et al., 2024) as any pedagogical technique that presumes students and teachers are adversaries.
Remote proctoring has been critiqued as racist and ableist. Text-matching software monetizes student data. And most students don’t engage in the most egregious forms of cheating, but the surveillance measures get applied to everyone. The harm is distributed broadly. The benefit is unproven.
The reframing Dawson et al. propose is clean: “A validity perspective makes the claim: a student’s assessment submission is valid if it represents their actual capability” (p. 1010). In this view, the act of cheating becomes a failure to demonstrate capability, not a moral failing. If a student’s work can’t be verified as representing what they actually know and can do, the appropriate response is to award no credit for the relevant parts of the task.
Dawson et al. are explicit about this: “awarding the student no credit for the relevant parts of the task is simply assessment, not punishment” (p. 1011). I’ve covered related arguments in work on the wicked problem of AI and assessment, where Corbin, Bearman, Boud, and Dawson made the case that AI doesn’t just create new challenges for assessment; it exposes the fragility of systems that were already struggling.
On AI specifically, Dawson et al. are direct: “The use of artificial intelligence in assessment is not ‘cheating’, it is a condition to be attended to alongside other validity matters” (p. 1012). Assessments that depend on students not using AI, but can’t actually prevent them from doing so, aren’t useful for high-stakes purposes. The problem belongs to the assessment, not the student.
I agree with this fully. McDonald, Johri, Ali, and Hingle Collier (2025) analyzed GenAI guidance from 116 US R1 universities and found that most institutions are still issuing contradictory guidance, simultaneously encouraging GenAI use and treating it as a misconduct risk, with no research-backed evidence for their recommendations. Moorhouse, Yeo, and Wan (2023) documented similar patterns across the world’s top 50 universities. The conversation keeps defaulting to policing when it should be focusing on assessment design.
Dawson et al. also introduce consequential validity into the discussion: the impact, beneficial or harmful, of the assessment itself and of the decisions that follow from it. Pre-exam cramming is part of the consequential validity of high-stakes testing. Assignment-sharing sites are a consequence of single-right-answer tasks. Student behaviour, in this reading, is a function of assessment design. And trade-offs are unavoidable. The construct validity benefits of strict rules against collaboration “might not be worth the consequential validity effects of discouraging peer learning and feedback seeking” (p. 1013).
Validity, Dawson et al. argue, is a claim, not an absolute. No single assessment can address every threat. But multiple types of assessment, at multiple time points, with multiple assessors can build what they call a stronger evidentiary chain. This aligns with programmatic assessment approaches and with the “Swiss Cheese” model, where no single layer is perfect, but stacked layers cover each other’s gaps. I think this is where the paper is most useful for practitioners. It gives them a framework that doesn’t demand perfection from any single assessment. It demands a system that, taken as a whole, can tell you something trustworthy about what graduates actually know.
References
- Barnhardt, B. (2016). The “epidemic” of cheating depends on its definition: A critique of inferring the moral quality of “cheating in any form.” Ethics & Behavior, 26(4), 330–343. https://doi.org/10.1080/10508422.2015.1026595
- Dawson, P., Bearman, M., Dollinger, M., & Boud, D. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education, 49(7), 1005–1016. https://doi.org/10.1080/02602938.2024.2386662
- Kohn, A. (2007). Who’s cheating whom? Phi Delta Kappan, 89(2), 89–97.
- Luo, J. (2024). A critical review of GenAI policies in higher education assessment: A call to reconsider the “originality” of students’ work. Assessment & Evaluation in Higher Education, 49(5), 651–664. https://doi.org/10.1080/02602938.2024.2309963
- McCabe, D. L., Treviño, L. K., & Butterfield, K. D. (2002). Honor codes and other contextual influences on academic integrity: A replication and extension to modified honor code settings. Research in Higher Education, 43(3), 357–378. https://doi.org/10.1023/A:1014893102151
- Moorhouse, B. L., Yeo, M. A., & Wan, Y. (2023). Generative AI tools and assessment: Guidelines of the world’s top-ranking universities. Computers and Education Open, 5, 100151. https://doi.org/10.1016/j.caeo.2023.100151
- Moro, J. (2020). Against cop shit. https://jeffreymoro.com/blog/2020-02-13-against-cop-shit/
