I find it useful to occasionally look back at where the conversation about AI in education began. Before ChatGPT, before the panic about academic integrity, before every conference had an AI track, researchers were already studying how chatbots work in educational settings. Okonkwo and Ade-Ibijola (2021) published a systematic review of 53 peer-reviewed studies from 2015 to May 2021, and reading it now, in early 2026, offers a fascinating window into both how much has changed and how much hasn’t.
Their paper, “Chatbots Applications in Education: A Systematic Review,” published in Computers and Education: Artificial Intelligence, maps the field across five application areas, identifies consistent benefits and recurring challenges, and outlines where future research should focus. What strikes me most is how many of the problems they flagged in 2021 remain unresolved today, and how many of the benefits they described have only intensified with the arrival of generative AI.
Teaching and Learning Dominated Even Then
The review found that roughly two-thirds of chatbot studies focused on teaching and learning. Chatbots delivered course content, answered student questions, guided practice, and provided immediate feedback. Some systems adapted to individual learner responses and simulated tutoring interactions. Others supported specific skills like programming or writing. The authors’ own summary puts teaching and learning first among the five application areas: “The majority of Chatbot system applications in Education are focused on teaching and learning, administration, assessment, advisory, and research and development” (p. 8).
It feels familiar. Even now, with tools like ChatGPT, Claude, and Gemini widely available, the primary conversation about AI in education still centers on instruction. How do students use it to learn? How do teachers use it to teach? The scale of use has grown dramatically, but the fundamental questions haven’t shifted as much as we might think.
What has changed is the sophistication of the interaction. The chatbots in this review were largely rule-based or retrieval-based systems with limited conversational ability. Today’s large language models can hold extended conversations, generate ‘original’ content, and respond to nuanced prompts. The gap between those early systems and what students have access to now is enormous, and it has implications for every finding in this review.
The Benefits Were Real, and They’ve Only Grown
Okonkwo and Ade-Ibijola found consistent benefits across the literature. Chatbots centralized course materials, provided instant responses, handled multiple users at once, and improved engagement, particularly on mobile devices. The authors note:
the usage of Chatbots allows for the gathering of various forms of information and storage in a unit (Information unit) for rapid and easy access by authorised users. Furthermore, Chatbots encourage personalised learning, provide instant support to users, and allow multiple users to access the same information at the same time. (p. 8)
Every one of those benefits is amplified in the current generation of AI tools. A student in 2021 could ask a chatbot a pre-programmed question and get a scripted answer. A student in 2026 can ask ChatGPT to explain a concept three different ways, generate practice problems, and give feedback on a draft, all in one conversation. The accessibility advantage that early chatbots offered has expanded into something qualitatively different.
But I think this is also where we need to be careful. Accessibility and engagement are good things. They’re also insufficient on their own. Guo et al. (2025) found in their year-long classroom study that the pedagogical design around ChatGPT determined whether students actually learned from it. Students valued AI as a supplement but rejected it as a replacement for human instruction. The tool’s availability didn’t guarantee learning. The task design did. And the lesson applies directly to the chatbot literature Okonkwo and Ade-Ibijola reviewed: availability is a starting point, not an outcome.
The Challenges They Identified Haven’t Gone Away
The most valuable part of this review, from my perspective, is its treatment of challenges. The authors identified ethical concerns (privacy, transparency, trust, agent persona), evaluation gaps (small samples, limited testing), user attitudes that shape adoption, and technical limitations in natural language processing. They write:
It is clear from the review that some factors, such as ethical, evaluation, user attitude, supervision, and maintenance issues, may have an impact on the adoption and use of Chatbots in education. This implies that these factors may skew users’ perceptions, limiting the applications of Chatbots systems in educational settings. To improve the penetration of Chatbot technology in education, researchers and stakeholders must define adequate solutions that can mitigate the negative effects of these challenges. (p. 8)
Nearly five years later, every one of those challenges persists, and several have intensified. Privacy concerns are more urgent now that students share personal writing, emotional reflections, and academic work with commercial AI systems. Transparency matters more when the AI can produce convincing but inaccurate text. Trust is more complicated when tools feel authoritative but lack understanding.
Shaw and Nave (2026) identified cognitive surrender, the tendency for students to defer to AI outputs without critical evaluation, as a growing risk. Cognitive surrender wasn’t visible in the pre-ChatGPT literature, but the seeds were there in the trust and user attitude challenges this review documented.
Evaluation remains a weak spot too. Gerlich (2025) found that heavy AI reliance was associated with weaker critical thinking, largely through cognitive offloading. Fan et al. (2025) found that AI improved essay quality but not knowledge transfer, because students skipped the metacognitive work. These findings echo what Okonkwo and Ade-Ibijola flagged about uneven evaluation practices. We’re still not measuring the right things consistently enough.

What This Tells Us About Building AI Pedagogy
I keep returning to the same conviction: the technology matters less than what we do with it. This review, covering chatbots that are primitive compared to current tools, identified the same core tension that defines the AI-in-education conversation today. The tools offer genuine benefits. They also carry genuine risks. And the difference between a positive outcome and a negative one comes down to design, intentionality, and pedagogical judgment.
Celik (2023) showed that technical knowledge alone doesn’t predict successful AI integration. Pedagogical judgment and ethical reasoning are stronger predictors. Mishra et al. (2023) argued that the TPACK framework needs updating because generative AI is qualitatively different from previous educational technologies. Okonkwo and Ade-Ibijola’s review, written before generative AI arrived, already pointed in this direction. The challenges they documented weren’t technical failures. They were design failures, implementation failures, and governance failures.
If you’re a teacher trying to figure out how to use AI in your classroom, this review is a useful reminder that the conversation didn’t start with ChatGPT. Researchers have been studying these dynamics for years, and the patterns are consistent. The tools get better. The questions stay the same. Are students learning? Are we protecting their privacy? Are we designing tasks that develop thinking, or just making information delivery faster?
Those questions deserve serious attention, and they’ll remain relevant long after today’s tools are replaced by whatever comes next.
References
- Celik, I. (2023). Towards Intelligent-TPACK: An empirical study on teachers’ professional knowledge to ethically integrate artificial intelligence (AI)-based tools into education. Computers in Human Behavior, 138, 107468. https://doi.org/10.1016/j.chb.2022.107468
- Guo, F., Li, T., & Cunningham, C. J. L. (2025). One year in the classroom with ChatGPT: Empirical insights and transformative impacts. Frontiers in Education, 10, 1574477. https://doi.org/10.3389/feduc.2025.1574477
- Mishra, P., Warr, M., & Islam, R. (2023). TPACK in the age of ChatGPT and generative AI. Journal of Digital Learning in Teacher Education, 39(4), 235–251. https://doi.org/10.1080/21532974.2023.2247480
- Okonkwo, C. W., & Ade-Ibijola, A. (2021). Chatbots applications in education: A systematic review. Computers and Education: Artificial Intelligence, 2, 100033. https://doi.org/10.1016/j.caeai.2021.100033
- Shaw, S. D., & Nave, G. (2026). Thinking fast, slow, and artificial: How AI is reshaping human reasoning and the rise of cognitive surrender. Working paper, The Wharton School, University of Pennsylvania. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646
