I came across this study through Matthew Connelly’s opinion piece in the New York Times and decided to look it up. Niloy et al. (2024) published “Is ChatGPT a Menace for Creative Writing Ability? An Experiment” in the Journal of Computer Assisted Learning. It’s from 2024, which in AI years already feels like a different era.
The model used was ChatGPT 3.5. Compared to the current GPT-5.2, 3.5 was primitive. But I’m sharing this study for a specific reason: it was one of the earliest rigorous academic efforts to experimentally test the impact of AI on creative writing. That makes it a useful baseline, even if the technology has moved dramatically since then.
What the Study Actually Tested
Six hundred university students were split into a control group and an experimental group. A pre-test confirmed no significant difference between them (p = 0.682). Both groups wrote essays under the same conditions. In the post-test, only the experimental group had access to ChatGPT 3.5.
Creativity was measured through four components: content accuracy, content presentability, elaboration, and similarity. Similarity functioned inversely, meaning higher similarity to existing text lowered the creativity score. Human reviewers assessed accuracy, presentability, and elaboration. Machine tools like Turnitin and Grammarly measured similarity. A composite “Total Creativity Score” combined all four.
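To make the inverse role of similarity concrete, here is a minimal sketch of how a composite like this could be computed. The 0-100 scale, the equal weighting, and the function name are my assumptions for illustration, not the formula Niloy et al. used, and the example numbers are invented rather than taken from the study.

```python
def total_creativity_score(accuracy, presentability, elaboration, similarity, scale=100.0):
    """Hypothetical composite score (NOT the study's actual formula).

    Assumes every component is rated on the same 0..scale range and that
    all four components carry equal weight.
    """
    for value in (accuracy, presentability, elaboration, similarity):
        if not 0 <= value <= scale:
            raise ValueError(f"component {value} is outside 0..{scale}")

    # Similarity works inversely: more overlap with existing text
    # means less originality, so it is flipped before averaging.
    originality = scale - similarity

    return (accuracy + presentability + elaboration + originality) / 4


# Invented numbers, for illustration only.
before = total_creativity_score(accuracy=70, presentability=60, elaboration=60, similarity=10)
after = total_creativity_score(accuracy=60, presentability=70, elaboration=70, similarity=50)
print(before, after)
```

In this toy case, presentability and elaboration both rise, yet the composite still falls because the jump in similarity and the drop in accuracy cost more than those gains add.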
After using ChatGPT, the experimental group’s Total Creativity Score dropped significantly (reported as p = 0.000, which in practice means p < 0.001). The control group showed no significant change.
Niloy et al. report: “The findings of this experiment highlight a significant decline in students’ creative writing ability as a result of using ChatGPT” (p. 927). Accuracy and originality both declined: “the EGp’s accuracy in the produced content dropped significantly and the generated content lost its originality compared to the state when ChatGPT did not intervene” (p. 927).
Two dimensions did improve, though. Niloy et al. acknowledge “positive changes in factors – Elaboration and Presentability” (p. 927). ChatGPT helped students expand their ideas and structure content more clearly.
So the trade-off is specific: presentability and elaboration went up, but accuracy and originality went down. Because the composite Total Creativity Score fell, the losses outweighed the gains.

How I Read These Results in 2026
I’m a strong advocate for AI in teaching and learning. I’ve argued for embracing it widely, aggressively, and unapologetically. Studies like this one are important precisely because they push back on uncritical enthusiasm and force us to ask harder questions about how AI is used in writing instruction.
I also think we need to contextualize these findings carefully. ChatGPT 3.5 was a blunt instrument, with limited ability to follow nuanced instructions, maintain voice, or produce original arguments. The students in this experiment were given access to a tool and left to figure it out. No scaffolding. No reflection. No pedagogical structure around the AI interaction. Nobody trained them to evaluate AI output, push back on it, or use it as a thinking partner.
Recent research shows those conditions change everything. Cheng et al. (2025) found that students who asked AI direct, purposeful questions performed significantly better on writing tasks. Agency in shaping the interaction predicted the outcome. Sperber et al.’s (2025) PAIRR model demonstrated that when AI feedback is embedded in peer review and structured reflection, students become critical evaluators, not passive recipients.
Kosmyna et al. (2025) found at MIT that ChatGPT use reduces neural engagement and produces linguistically homogeneous writing. But here’s the key detail: students who did independent thinking before using AI produced stronger, more diverse outputs. Sequence matters. How you approach AI shapes what you get from it.
Shaw and Nave (2026) gave this phenomenon a name: cognitive surrender. When students defer to AI without critical evaluation, thinking quality declines. Niloy et al.’s results fit that pattern perfectly. Students let ChatGPT drive the writing process, and originality dropped.
The Bigger Question
Can we treat a 2024 study using ChatGPT 3.5 as definitive evidence about AI and creative writing in 2026? I don’t think so. Current models are far more capable of nuanced, context-aware responses. Pedagogical research has also advanced. We now have structured models like PAIRR and frameworks like the AI Assessment Scale (Perkins et al., 2024) that give teachers concrete ways to integrate AI with intentionality.
But the core insight from Niloy et al. holds up. Passive AI use, without critical evaluation, without pedagogical structure, hurts creativity. That was true with ChatGPT 3.5. It’s still true with GPT-5.2. The tool has changed. The human tendency toward cognitive shortcuts hasn’t.
Kalantzis and Cope (2025) argued that literacy in the age of AI should be understood as design agency: the active, intentional work of making meaning. If writing is something students do with purpose and voice, AI becomes a tool they wield. If AI becomes the author, we lose exactly what Niloy et al. measured: originality, accuracy, and creative depth.
So yes, we still need studies like this one. Rigorous, experimental, willing to measure what actually happens when students use AI to write. But those studies also need to account for pedagogy. Does AI affect creativity? Sure. But under what conditions? With what scaffolding? With what level of student agency?
Those are the questions worth pursuing. And they should guide every educator thinking about AI in their writing classroom. Niloy et al. gave us a starting point. Two years later, we have better tools and better pedagogy. The challenge now is making sure they work together.
References
- Cheng, Y., Fan, Y., Li, X., Chen, G., Gašević, D., & Swiecki, Z. (2025). Asking generative artificial intelligence the right questions improves writing performance. Computers and Education: Artificial Intelligence, 8, 100374. https://doi.org/10.1016/j.caeai.2025.100374
- Connelly, M. (2026, February 12). A.I. companies are eating higher education. The New York Times. https://www.nytimes.com/2026/02/12/opinion/ai-companies-college-students.html
- Kalantzis, M., & Cope, B. (2025). Literacy in the time of artificial intelligence. Reading Research Quarterly, 60, e591. https://doi.org/10.1002/rrq.591
- Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X.-H., Beresnitzky, A. V., Braunstein, I., & Maes, P. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing tasks. MIT Media Lab. https://www.media.mit.edu/publications/your-brain-on-chatgpt/
- Niloy, A. C., Akter, S., Sultana, N., Sultana, J., & Rahman, S. I. U. (2024). Is ChatGPT a menace for creative writing ability? An experiment. Journal of Computer Assisted Learning, 40(2), 919–930. https://doi.org/10.1111/jcal.12929
- Perkins, M., Roe, J., & Furze, L. (2024, December). The AI Assessment Scale revisited: A framework for educational assessment. Preprint, arXiv. https://arxiv.org/abs/2412.09029
- Shaw, S. D., & Nave, G. (2026). Thinking fast, slow, and artificial: How AI is reshaping human reasoning and the rise of cognitive surrender. Working paper, The Wharton School, University of Pennsylvania. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646
- Sperber, L., MacArthur, M., Minnillo, S., Stillman, N., & Whithaus, C. (2025). Peer and AI Review + Reflection (PAIRR): A human-centered approach to formative assessment. Computers and Composition, 76, 102921. https://doi.org/10.1016/j.compcom.2025.102921
