What AI Writing Tools Actually Do to Your Ideas

I’ve been telling educators for years that AI is a tool, and that tools are only as good as the pedagogy behind them. I still believe that. But a new study from Google DeepMind forces a harder question: what if the tool changes what you’re trying to say, even when you don’t notice?

Abdulhai et al. (2026) ran three separate studies on how LLMs alter human writing, and their findings go well beyond tone or style. When people use AI writing tools heavily, the resulting text shifts in semantic meaning, argumentative stance, emotional register, and grammatical structure. All of it moves in the same direction, across different models and different tasks. The writers lose their voice. And the strange part is that most of them don’t mind.

AI Writing Tools and the Satisfaction Paradox

In a randomized controlled trial with 100 participants, Abdulhai et al. asked people to write argumentative essays on whether money leads to happiness. Half the group had access to an LLM (gpt-4o-mini); the other half wrote without AI. The researchers then split the AI group further into light users, who consulted the model for ideas or feedback, and heavy users, who relied on it to generate text.

Heavy users reported their essays felt significantly less creative and less in their own voice. That’s a self-aware group of writers. But here’s what complicates things: they were just as satisfied with the final product as participants in the control group. They recognized the loss and accepted it anyway.

I’ve written about this dynamic before. Shaw and Nave (2026) described a similar pattern with cognitive surrender, where people gradually hand over reasoning to AI not because they can’t think, but because the output feels good enough. Abdulhai et al. give that pattern empirical teeth in the writing domain. The essay got better. The writer didn’t grow.

The data on argumentative stance is just as telling. Compared to the control group, heavy LLM use produced a 68.9% increase in essays that took a neutral position on the topic. The essays hedged. They balanced both sides neatly and committed to neither. That’s what LLMs do when left in charge of production: they flatten opinions into safe, sanitized middle ground. If you’re asking an AI to help you argue, you might end up not arguing at all.

Even Grammar Edits Change What You Mean

The second study is the one that unsettled me most. Abdulhai et al. used a pre-ChatGPT dataset of 86 argumentative essays collected in 2021 (ArgRewrite-v2) and asked three production LLMs to revise them based on expert human feedback. They tested five types of revision prompts: general revision, minimal edits, grammar corrections, completion, and expansion.

When humans revised their own work, the changes were small, targeted, and scattered in different semantic directions. Each writer adjusted their own text in their own way. LLM revisions looked nothing like that. They were large, uniform, and pointed in the same direction across all 86 essays. The models didn’t just clean up prose. They pulled every essay toward a shared semantic center, a default AI mode of expression.
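
To make that geometry concrete: one way to test whether revisions share a direction is to embed each original and revised essay, take the difference vectors, and measure how aligned those vectors are. Here is a minimal sketch, assuming sentence-transformers embeddings and cosine similarity; the paper's actual embedding model and metric may differ:

```python
# Sketch: do revision vectors point in a shared direction?
# Assumes the sentence-transformers package; the embedding model
# and similarity measure here are illustrative choices, not the
# paper's documented setup.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def revision_directions(originals, revisions):
    """Embed each (original, revised) pair and return the unit
    vector pointing from the original to the revision."""
    orig_emb = model.encode(originals)
    rev_emb = model.encode(revisions)
    deltas = rev_emb - orig_emb
    norms = np.linalg.norm(deltas, axis=1, keepdims=True)
    return deltas / np.clip(norms, 1e-8, None)

def mean_pairwise_cosine(directions):
    """Average cosine similarity between all pairs of revision
    directions. Near 0: scattered, writer-specific edits.
    Near 1: every essay pulled the same way."""
    sims = directions @ directions.T
    n = len(directions)
    off_diag = sims[~np.eye(n, dtype=bool)]
    return off_diag.mean()
```

Run on human self-revisions, a measure like this should hover near zero; on LLM revisions, the paper's finding predicts it climbs toward one.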

And this happened even when the models were told to fix grammar only. Abdulhai et al. found that grammar-only prompts still produced significant semantic drift. The models changed conclusions, reframed arguments, and altered the emotional tone of the essays. A grammar edit is supposed to fix a comma splice, not rewrite your thesis.

The lexical analysis tells a vivid story. Human-revised essays kept words like “people,” “society,” “money,” “want.” LLM-revised essays introduced “deployment,” “frameworks,” “regulatory,” “robust,” “policymakers.” The models stripped away the personal and replaced it with institutional language that no student would naturally use.
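
If you want to surface this kind of lexical shift in your own corpus, a smoothed frequency-ratio comparison between the two sets of revisions gets you most of the way. This is an illustrative sketch, not the paper's method, which likely uses a more formal statistic:

```python
# Sketch: which words does each revision style favor?
# Add-one smoothed relative-frequency ratio between two corpora.
import re
from collections import Counter

def word_counts(texts):
    counts = Counter()
    for t in texts:
        counts.update(re.findall(r"[a-z']+", t.lower()))
    return counts

def distinctive_words(texts_a, texts_b, top_n=10, smoothing=1.0):
    """Rank words by how much more often they appear in corpus A
    than in corpus B."""
    a, b = word_counts(texts_a), word_counts(texts_b)
    total_a, total_b = sum(a.values()), sum(b.values())
    vocab = set(a) | set(b)
    def ratio(w):
        pa = (a[w] + smoothing) / (total_a + smoothing * len(vocab))
        pb = (b[w] + smoothing) / (total_b + smoothing * len(vocab))
        return pa / pb
    return sorted(vocab, key=ratio, reverse=True)[:top_n]

# On a corpus like the paper's, distinctive_words(llm_revised,
# human_revised) might surface "frameworks" or "policymakers".
```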

Fan et al. (2025) found something parallel in their process mining research: AI improved the product without improving the process. Abdulhai et al. extend that finding into new territory. AI doesn’t just skip the learning. It overwrites the voice.

The Grammar of AI Writing Tools

The structural patterns are consistent across both studies. LLM-edited writing showed a 50% drop in pronouns, a 14-33% increase in nouns, and a 57-90% increase in adjectives. The writing moves away from first-person, experience-based language toward impersonal, formal, noun-heavy prose.
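
These shifts are easy to check on any draft/revision pair with an off-the-shelf part-of-speech tagger. A rough sketch using spaCy (my tooling assumption; the paper doesn't specify its pipeline):

```python
# Sketch: quantify the part-of-speech shift between an original
# draft and its LLM revision. Assumes spaCy with the small English
# model installed (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

def pos_rates(text, tags=("PRON", "NOUN", "ADJ")):
    """Return each tag's share of all non-space tokens."""
    doc = nlp(text)
    total = sum(1 for tok in doc if not tok.is_space)
    counts = {tag: 0 for tag in tags}
    for tok in doc:
        if tok.pos_ in counts:
            counts[tok.pos_] += 1
    return {tag: counts[tag] / max(total, 1) for tag in tags}

def pos_shift(original, revised):
    """Percent change per tag, in the spirit of the paper's
    pronoun/noun/adjective comparison."""
    before, after = pos_rates(original), pos_rates(revised)
    return {tag: 100 * (after[tag] - before[tag]) / max(before[tag], 1e-9)
            for tag in before}
```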

Abdulhai et al. also found that LLMs simultaneously increase emotional and analytical language, a combination that doesn’t occur naturally in human writing. LLM-edited text scores higher on formal analytical measures and relies on statistical and expert-opinion arguments. Human writers lean on personal experience and anecdotal reasoning. The models strip out the personal and replace it with authoritative-sounding generality.

The authors make a provocative connection to RLHF training. Because LLMs are optimized to produce text that earns positive feedback at scale, they’re incentivized to write in ways that are broadly convincing to most people, with no mechanism for preserving any individual’s voice. Abdulhai et al. compare this to clickbait: optimizing for engagement warps the content itself.

What the ICLR Peer Reviews Reveal About AI and Scientific Institutions

The third study moves from student essays to scientific peer review. At ICLR 2026, 21% of 75,000 reviews were flagged as LLM-generated. Abdulhai et al. report that these reviews scored papers about a third of a point higher on average (4.43 vs. 4.13 for human reviewers). That alone is concerning. But the deeper finding is about criteria.

LLM-generated reviews were 32% less likely to comment on clarity as a strength, 58% less likely to flag clarity as a weakness, and 32% less likely to mention the relevance of the research. They were 136% more likely to focus on reproducibility and 84% more likely to flag scalability. The models aren’t just grading differently. They’re evaluating science by different standards altogether.

I covered Hartzog and Silbey’s (2025) analysis of how AI corrodes institutional trust, and the ICLR data is a concrete example of that corrosion in action. When a fifth of peer reviews are AI-generated, and those reviews prioritize technical checkboxes over clarity, impact, and relevance, the criteria for what counts as good science start to shift. Nobody voted on that change. It happened because the tools made it easy.

What Educators Should Take From This

One finding from Abdulhai et al. offers something genuinely useful for teaching. Users who consulted the LLM only for ideas or light feedback, the “LLM-influenced” group, produced essays that looked almost identical to the human control group in both semantic space and self-reported creativity. The distortion kicks in when people hand over the actual writing to the model.

That’s a design principle, not just a finding. Teachers can build assignments that allow AI for brainstorming, outlining, and feedback, and require students to write the actual text themselves. The line between using AI as a thinking partner and using it as a ghostwriter is not theoretical. Abdulhai et al. have drawn it empirically.

Kosmyna et al. (2025) showed that students who thought independently before using ChatGPT produced stronger neural engagement and better outputs. The pattern across studies is becoming clearer with each new paper: AI helps most when it stays in a supporting role. The moment it takes over production, the writer’s voice, reasoning, and originality begin to fade.

The technology will keep getting smoother. Every draft will come out more polished. But polished is not the same as yours.

References

  • Abdulhai, M., White, I., Wan, Y., Qureshi, I., Leibo, J., Kleiman-Weiner, M., & Jaques, N. (2026). How LLMs distort our written language. arXiv preprint arXiv:2603.18161. https://arxiv.org/abs/2603.18161
  • Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530. https://doi.org/10.1111/bjet.13544 
  • Hartzog, W., & Silbey, J. (2025). How AI destroys institutions [Draft]. Boston University School of Law. https://scholarship.law.bu.edu/faculty_scholarship/4179
  • Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X.-H., Beresnitzky, A. V., Braunstein, I., & Maes, P. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing tasks. MIT Media Lab. https://www.media.mit.edu/publications/your-brain-on-chatgpt/    
  • Shaw, S. D., & Nave, G. (2026). Thinking fast, slow, and artificial: How AI is reshaping human reasoning and the rise of cognitive surrender. Working paper, The Wharton School, University of Pennsylvania. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646 
