There’s something valuable about going back to the earliest research on ChatGPT in education. The findings aren’t current anymore, but they tell us what the field was seeing, and failing to see, when the tool first showed up. Baidoo-Anu and Owusu Ansah (2023) published one of those first-wave papers, an exploratory review that tried to map the potential benefits and drawbacks of ChatGPT for teaching and learning. The data collection ran from November 2022 to March 2023, which means the authors were writing in real time as the educational world was still trying to figure out what had just landed in their classrooms.
I’m covering this paper in 2026 because some of what they flagged early has been confirmed by years of research since. And some of it reads like a rough draft of arguments we’ve now built entire frameworks around.
The methodology is the first thing worth talking about. Baidoo-Anu and Owusu Ansah didn’t just review the literature. They asked ChatGPT to explain itself. They prompted it with questions like “What are the benefits of ChatGPT in advancing teaching and learning?” and then used its responses as a starting point, supplementing them with peer-reviewed sources. They were transparent about this approach, and I’ll give them credit for that. Few researchers at the time were even thinking about what it meant to use an AI tool as both the subject of the research and a contributor to it.
That said, the approach introduces problems the authors don’t fully reckon with. ChatGPT’s self-descriptions read like marketing copy dressed up as information. The model has no actual understanding of its educational value. It generates plausible-sounding claims about personalized tutoring and adaptive learning because that’s the kind of language its training data contains. The authors seem aware of this tension but don’t interrogate it deeply enough.

Benefits of ChatGPT in Education
Baidoo-Anu and Owusu Ansah identify six potential benefits: personalized tutoring, automated essay grading, language translation, interactive learning, adaptive learning, and formative assessment. Each one is paired with a supporting study.
The list is reasonable for 2023. The problem is that most of the cited studies were testing earlier generative models, not ChatGPT specifically. The authors don’t always flag that gap, and the result is a benefits section that reads more like a catalog of possibilities than a grounded assessment of what ChatGPT was actually doing in classrooms at the time. We’ve since moved past possibility-mapping. Studies like Bastani et al. (2025), which tracked nearly 1,000 high school math students using ChatGPT with and without guardrails, showed that outcomes depend heavily on how the tool is designed and deployed. Possibility without pedagogy is just speculation.
Where the Paper Was Ahead of Its Time
Two observations from this early paper have aged well.
First, the hallucination finding. Baidoo-Anu and Owusu Ansah discovered that ChatGPT fabricated a reference when asked to generate supporting sources. It produced a full citation for “Ribeiro and Vala, 2020” complete with a non-functional URL. The authors went looking for the paper, couldn’t confirm it existed, and reported the fabrication directly.
In early 2023, this was a novel observation. Rudolph, Tan, and Tan (2023) were raising a similar concern around the same time, asking in their title whether ChatGPT was a “bullshit spewer” that generates confident-sounding nonsense. The hallucination problem would go on to become one of the central issues in AI literacy research, and Baidoo-Anu and Owusu Ansah caught it in real time.
Second, the detection argument. The authors noted that AI text detectors were already proving unreliable against sophisticated models. They also made a point that many researchers at the time were still ignoring: students have access to the same detection tools and can revise their text until it passes.
That’s an argument I’ve seen confirmed repeatedly since. Detection as an enforcement strategy is a dead end. I’ve covered this across multiple posts, including Corbin, Bearman, Boud, and Dawson’s (2025) framing of AI and assessment as a wicked problem that can’t be solved with technical fixes alone.
The drawbacks section is revealing for reasons the authors may not have intended. They asked ChatGPT to list its own limitations, and it produced eight: lack of human interaction, limited understanding, bias in training data, lack of creativity, data dependency, contextual understanding gaps, limited personalization, and privacy concerns.
These are all real issues. But they’re also exactly the kind of measured, reasonable-sounding self-assessment that a language model would produce. The list is too neat. It avoids the harder problems, like the way AI-generated feedback lacks the relational dimension that makes feedback formative, or the cognitive costs of offloading thinking to a machine.
Research since 2023 has gone much deeper. Fan et al. (2025) documented metacognitive laziness: students skipping the evaluation and monitoring steps that make revision productive. Shaw and Nave (2026) introduced cognitive surrender as a concept for how AI reshapes reasoning patterns over time. These are structural problems with how humans interact with AI, and they go well beyond “limited understanding” as a category.
Reading This Paper in 2026
Baidoo-Anu and Owusu Ansah were writing during a period when the field was still deciding whether to panic, celebrate, or pretend nothing had changed. Their paper tries to do all three at once. The benefits section leans optimistic, the drawbacks section sounds the alarm, and the conclusion pivots to calling for collaboration and forward thinking. That structure was common in early 2023 because nobody had enough data to do anything else.
What I value about the paper is the assessment argument. Baidoo-Anu and Owusu Ansah wrote that educators “may need to rethink how students are assessed” and that they “may have to change how assessment is currently done to more innovative assessments” (p. 59). That was a measured prediction in 2023. By 2026, it’s no longer a prediction. It’s a fact. Assessment has become the central battleground for AI in education, and institutions that didn’t start rethinking early are now scrambling to catch up.
The paper also raises questions that still don’t have clean answers: How do we integrate AI into teacher education programs? Will these tools narrow or widen the digital divide? These aren’t dated questions. They’re unresolved ones.
What the paper couldn’t have anticipated is how fast the technology would move. The ChatGPT they were testing was GPT-3.5 with a knowledge cutoff of 2021. We’ve since seen multimodal models, AI agents embedded in operating systems, and tools that can write, code, analyze data, and hold extended conversations with memory. The gap between the tool they studied and the tools students are using today is enormous. That doesn’t make the paper irrelevant. It makes it a time capsule, and time capsules are useful when they remind us how far the conversation has come and how much further it still needs to go.
References
- Baidoo-Anu, D., & Owusu Ansah, L. (2023). Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. Journal of AI, 7(1), 52–62. https://doi.org/10.61969/jai.1337500
- Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., & Mariman, R. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences, 122(26), e2422633122. https://doi.org/10.1073/pnas.2422633122
- Corbin, T., Bearman, M., Boud, D., & Dawson, P. (2025). The wicked problem of AI and assessment. Assessment & Evaluation in Higher Education, 1–17. https://doi.org/10.1080/02602938.2025.2553340
- Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530. https://doi.org/10.1111/bjet.13544 // https://medkharbach.com/metacognitive-laziness-and-ai/
- Rudolph, J., Tan, S., & Tan, S. (2023). ChatGPT: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning & Teaching, 6(1), 342–354. https://doi.org/10.37074/jalt.2023.6.1.9
- Shaw, S. D., & Nave, G. (2026). Thinking fast, slow, and artificial: How AI is reshaping human reasoning and the rise of cognitive surrender. Working paper, The Wharton School, University of Pennsylvania. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646 // https://medkharbach.com/cognitive-surrender-how-ai-is-quietly-reshaping-the-way-we-think/
