The pedagogy-first argument I keep making on this blog just got a serious empirical anchor. Zhang, Zhang, and Lu’s (2026) meta-analysis in Educational Research Review pulls together 96 effect sizes from 56 studies and asks one question that the chatbot literature has been ducking for years: how much does the pedagogical approach actually shape whether chatbots help students learn? The answer is a lot. And the way they answer it changes how I’d talk to teachers about chatbot integration going forward.
The Hierarchy of Effects Is Striking
The overall effect of chatbot-assisted learning on student performance came out at g = 0.65, which is solidly medium. That’s the headline number, and it lines up with previous syntheses. The more interesting part is what happens when you break that overall effect down by pedagogical approach. Zhang et al. (2026) report inquiry-based learning at g = 0.87, situated learning at 0.81, problem-based learning at 0.72, project-based learning at 0.59, collaborative learning at 0.57, and game-based learning at 0.45.
This is a real spread. Inquiry-based learning produces almost twice the effect of game-based learning. The chatbots are roughly comparable across the studies. So are the student populations and subject domains. The variable that moves is pedagogy.
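To sanity-check the "almost twice" claim, here is a minimal sketch that ranks the reported subgroup effects and expresses each one as a ratio to the game-based figure. The g values are the ones quoted above. The names `EFFECTS`, `OVERALL_G`, `ratio_to`, and `gap_from_overall` are my own illustration and nothing from the paper. The ratios are descriptive only, because they compare point estimates and ignore the uncertainty around each one.

```python
# Subgroup effect sizes (g) as quoted in this post from Zhang et al. (2026).
EFFECTS = {
    "inquiry-based": 0.87,
    "situated": 0.81,
    "problem-based": 0.72,
    "project-based": 0.59,
    "collaborative": 0.57,
    "game-based": 0.45,
}
OVERALL_G = 0.65  # pooled effect quoted in this post


def ratio_to(baseline, effects=EFFECTS):
    """Express every subgroup effect as a multiple of the baseline subgroup."""
    base = effects[baseline]
    return {name: round(g / base, 2) for name, g in effects.items()}


def gap_from_overall(effects=EFFECTS, overall=OVERALL_G):
    """Signed difference between each subgroup effect and the pooled effect."""
    return {name: round(g - overall, 2) for name, g in effects.items()}


if __name__ == "__main__":
    ratios = ratio_to("game-based")
    gaps = gap_from_overall()
    # Print subgroups from largest to smallest effect.
    for name, g in sorted(EFFECTS.items(), key=lambda item: item[1], reverse=True):
        print(f"{name:15s} g={g:.2f}  x game-based={ratios[name]:.2f}  vs overall={gaps[name]:+.2f}")
```

On these numbers, inquiry-based learning comes out at 1.93 times the game-based effect, and three of the six approaches sit above the pooled g of 0.65 while three sit below it.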

Why Pedagogy Drives the Effect Size
The central argument is structural. Zhang et al. (2026) write that “the magnitude of effects appears to reflect the degree of alignment between chatbot affordances and each pedagogy’s core pedagogical demands” (p. 18).
Inquiry-based learning runs on question formulation, evidence-based explanation, and iterative reasoning. Those are exactly the moves a dialogic chatbot can scaffold well, especially a generative-AI (GAI) model like ChatGPT or Claude. Project-based learning, by contrast, often produces multimodal artifacts: videos, prototypes, design objects. A text-based chatbot can’t fully evaluate those, which constrains the effect.
Game-based learning ends up at the bottom of the hierarchy for a related reason. Chatbot dialogue competes with gameplay for attentional resources, and the field has no settled instructional model for embedding chatbot scaffolding inside game mechanics.
Mishra, Warr, and Islam’s (2023) work on TPACK in the age of ChatGPT, which I’ve covered before, predicted exactly this kind of misalignment. When the pedagogical content knowledge and the technological affordance don’t fit, the technology drifts toward decoration.
The Within-Pedagogy Patterns Tell a Second Story
Beyond the cross-pedagogy comparisons, the within-pedagogy patterns are worth attention. Within problem-based learning, upper-secondary students showed stronger gains than tertiary students. Zhang et al. (2026) read this as a compensatory effect: younger learners still developing self-regulation get a bigger lift from the cognitive scaffolding chatbots provide, while tertiary students already carry those strategies into the task. The same chatbot does heavier lifting in a 16-year-old’s biology PBL unit than it does in a graduate student’s clinical reasoning course.
Within inquiry-based learning, post-GAI chatbots outperformed pre-GAI rule-based systems, and chatbots playing dual roles as Conversational Partner plus Resource Provider yielded stronger effects than those combining Resource Provider plus Feedback Provider.
The dialogic combination wins. That tracks with what Sperber et al. (2025) found in their PAIRR study on peer and AI feedback in writing classrooms, which I’ve covered before. A conversational partner layered on resource access beats evaluative feedback layered on the same resources.
Where I’d Take the Argument Further
The paper has limits worth naming. The evidence base is concentrated in 2023–2025 studies, when ChatGPT was new and instructors were still figuring out integration. Several subgroup categories had fewer than five studies, which the authors flag clearly. The synthesis is also restricted to English-language publications, and it treats learning performance as a single construct without separating knowledge acquisition, skill development, and higher-order thinking. Those are real constraints on what we can confidently generalize.
I’d also extend the conceptual move the paper makes. Zhang et al. (2026) work hard to clean up what counts as a pedagogical approach, calling out earlier reviews that treated learning activities like role-play or exercises as pedagogies. They’re right to do this, and the field has needed it for a while. The same critique applies to how AI tools are being branded. “AI tutoring,” “AI companion,” and “AI feedback” aren’t pedagogies. They’re features that need a pedagogy around them to function.
The Real Argument Is About Design
The strongest line in the paper is the warning. Zhang et al. (2026) write that “chatbot effectiveness in CAL is not uniform but contingent on how well scaffolding functions align with a pedagogy’s core instructional mechanisms. Without such alignment, chatbots risk being used as answer-generation tools rather than supports for knowledge construction” (p. 18).
That’s the pedagogy-first thesis, stated cleanly. Bastani et al. (2025) showed what can happen when guardrails aren’t built around the AI: the tool can harm learning. I’ve covered that paper too. The Zhang meta-analysis gives us the positive case. With aligned pedagogy, chatbots help. Without it, they’re just answer machines.
The connection the authors draw to Universal Design for Learning and personalized learning is also worth taking seriously. They position chatbot-assisted learning as a personalization tool, not a generic add-on. That framing has implications for how teachers select, configure, and integrate chatbots into their actual classroom contexts.
The Closing Reframe
Zhang et al. (2026) end with a reformulation of the old Clark-Kozma debate about whether media influence learning. They propose that “Rather than asking ‘Do media influence learning?’, a more relevant question may be: ‘How can chatbot technology enrich teaching and learning, given varying pedagogical contexts?’” (p. 24). I’d go further. The question isn’t whether AI works in education. It’s whether the pedagogy around the AI works. The chatbot is downstream of that decision.
Start with the pedagogy. Build the chatbot’s role around it. The scaffolding has to match what the pedagogy actually demands. The tool follows.
References
- Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., & Mariman, R. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences, 122(26), e2422633122. https://doi.org/10.1073/pnas.2422633122
- Mishra, P., Warr, M., & Islam, R. (2023). TPACK in the age of ChatGPT and generative AI. Journal of Digital Learning in Teacher Education, 39(4), 235–251. https://doi.org/10.1080/21532974.2023.2247480
- Sperber, L., MacArthur, M., Minnillo, S., Stillman, N., & Whithaus, C. (2025). Peer and AI Review + Reflection (PAIRR): A human-centered approach to formative assessment. Computers and Composition, 76, 102921. https://doi.org/10.1016/j.compcom.2025.102921
- Zhang, Q., Zhang, N., & Lu, C. (2026). How do pedagogical approaches affect the impact of chatbots on learning performance? A meta-analysis and research synthesis. Educational Research Review, 51, Article 100783. https://doi.org/10.1016/j.edurev.2026.100783
