Are AI Chatbots Ready to Teach? What 74 Studies Say About Pedagogical Roles in Education

I recently wrote about Okonkwo and Ade-Ibijola’s (2021) systematic review, which mapped where chatbots show up in education and what benefits and challenges they bring. That review covered the field broadly. Wollny et al. (2021) ask a sharper question: can chatbots actually take on meaningful pedagogical responsibility? Their answer, based on 74 empirical studies, is candid: “No, we are not there yet!” (p. 13).

Their paper, “Are We There Yet? A Systematic Literature Review on Chatbots in Education,” published in Frontiers in Artificial Intelligence, analyzes the field across five dimensions: implementation objectives, pedagogical roles, mentoring capacity, adaptation, and application domains. Reading it now, in 2026, after ChatGPT, Claude, and Gemini have reshaped expectations entirely, I find the review a useful baseline for measuring how far the field has actually come and where gaps persist.

Most Chatbots Still Focus on Content Delivery

Wollny et al. found four major implementation objectives across the literature. Skill improvement leads, appearing in 32% of studies, with chatbots helping students practice languages, programming, or subject-specific abilities. Efficiency comes next, with systems automating routine tasks or delivering quick administrative answers. Motivation and availability round out the list, covering chatbots designed to boost engagement or extend access beyond classroom hours.

The pedagogical roles tell a similar story. The authors identify three main categories. The learning role dominates, with chatbots functioning as content tutors or practice partners. The assisting role follows, with chatbots handling logistical or informational tasks. The mentoring role, which involves supporting self-regulation, reflection, or personal growth, appears far less frequently.

I think the distribution itself is revealing. In 2021, most educational chatbots were essentially delivery mechanisms. They could present content, quiz students, and answer factual questions. Very few attempted anything deeper. And even now, with far more sophisticated tools available, I’d argue the same pattern holds. Most AI use in education still clusters around content delivery and efficiency. The harder work of supporting metacognition, self-regulation, and genuine intellectual development receives much less attention.

Fan et al. (2025) found a closely related pattern in their study of metacognitive laziness. AI assistance improved essay quality but not knowledge gain or transfer, and the authors linked this to students bypassing the metacognitive processes of planning, monitoring, and evaluating their own thinking. The chatbot literature that Wollny et al. reviewed was already pointing in this direction: systems that focus on content delivery work reasonably well, while systems that attempt to support deeper learning processes struggle.

The Mentoring Gap

Mentoring receives special treatment in the review, and for good reason. Wollny et al. identify three mentoring methods in the literature: scaffolding (guiding learners step by step), recommending (suggesting tools, partners, or resources), and informing (providing insight into learners’ progress). All three are present in the studies, but none comes close to the complexity of human mentoring.

As the authors put it: “Comparing the current mentoring of chatbots reported in the literature with the daily mentoring role of teachers, we can summarize that the chatbots are not at the same level” (p. 12).

I find this both unsurprising and important. Mentoring requires reading emotional cues, understanding individual context, building trust across sustained relationships, and knowing when to push and when to pull back. Even current generative AI tools, which are vastly more conversational than the chatbots in this review, can’t reliably do any of those things.

Kosmyna et al. (2025) reported that ChatGPT use during essay writing reduced neural engagement and weakened memory formation. Students who thought independently before using AI produced better work than those who started with AI from the beginning. A tool that weakens cognitive engagement is a poor candidate for mentoring, no matter how fluent it sounds.

Celik (2023) makes a related point through the Intelligent-TPACK framework: technical knowledge alone doesn’t predict effective AI integration; pedagogical judgment matters more. The teachers who use AI well are the ones who understand where AI adds value and where human presence is irreplaceable. Mentoring is one of those irreplaceable spaces. AI might support parts of the mentoring process, like recommending resources or tracking progress, but the relational core of mentoring remains human.

Adaptation Was Almost Nonexistent

The adaptation findings surprised me. Only six studies in the entire review included meaningful adaptation, and most of those were limited to quiz settings where difficulty or feedback adjusted based on student responses. Very few systems incorporated richer learner models that accounted for goals, prior knowledge, learning preferences, or emotional state.

In 2026, adaptive AI tutoring has advanced considerably. Bastani et al.’s (2025) PNAS study found that a guardrailed GPT Tutor, which shaped its responses to avoid giving away answers, largely mitigated the learning harm that unrestricted ChatGPT access caused once students had to work without AI. The adaptation was built into the tool’s response structure, not just its difficulty level.

Wollny et al.’s review captures a field that hadn’t yet figured out how to build pedagogically meaningful adaptation into chatbot systems. We’re closer now, but the principle the review highlights remains valid: adaptation without pedagogical grounding is just variable difficulty, not personalized learning.

What We Should Learn from the Pre-ChatGPT Baseline

I think these early reviews, both Okonkwo and Ade-Ibijola’s (2021) and Wollny et al.’s, offer something valuable for today’s conversation. They show us that the patterns we see now aren’t new. Content delivery dominated then. It still dominates. Mentoring was weak then. It’s still weak. Adaptation was underdeveloped then. It’s better now but still inconsistent.

The tools have changed dramatically. The underlying pedagogical challenges haven’t. And that tells me something important: the bottleneck was never the technology. It was always the design. The teachers and researchers who approach AI with pedagogical intentionality, who think carefully about what they want students to learn and then design AI interactions around that goal, are the ones who get results. The ones who adopt tools without that intentionality reproduce the same patterns these 2021 reviews documented: efficient content delivery with shallow learning impact.

If you’re building your AI pedagogy right now, these findings should encourage you. You don’t need the fanciest tool. You need clarity about what you want your students to think, do, and become, and the willingness to design AI use around those goals. The technology will keep evolving. The pedagogical questions will stay the same.

References

Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., & Mariman, R. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences, 122(26), e2422633122. https://doi.org/10.1073/pnas.2422633122

Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530. https://doi.org/10.1111/bjet.13544

Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X.-H., Beresnitzky, A. V., Braunstein, I., & Maes, P. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing tasks. MIT Media Lab. https://www.media.mit.edu/publications/your-brain-on-chatgpt/

Okonkwo, C. W., & Ade-Ibijola, A. (2021). Chatbots applications in education: A systematic review. Computers and Education: Artificial Intelligence, 2, 100033. https://doi.org/10.1016/j.caeai.2021.100033

Wollny, S., Schneider, J., Di Mitri, D., Weidlich, J., Rittberger, M., & Drachsler, H. (2021). Are we there yet? A systematic literature review on chatbots in education. Frontiers in Artificial Intelligence, 4, Article 654924. https://doi.org/10.3389/frai.2021.654924
