AI in Academic Writing: A Three-Tier Framework for Ethical Use

The gap between what scholars actually do with ChatGPT and what institutions say they should do keeps widening. Cheng, Calhoun, and Reedy (2025), writing from the healthcare simulation field, step into that gap with a paper that tries to sort AI uses in academic writing into ethical categories, and they do it with refreshing transparency: they used ChatGPT to help write the paper itself, then disclosed the entire process.

The result is a three-tier framework that I find genuinely useful, even as I think parts of it already need updating.

What the Three Tiers Look Like

Cheng et al. organize AI uses in academic writing into three ethical tiers, and the organizing logic is simple: the closer AI gets to generating original thought, the higher the ethical risk.

Tier 1, which they label “ethically acceptable,” covers grammar, spelling, readability, and language translation. These are tasks where AI restructures what the researcher already wrote. The intellectual contribution stays with the human. Cheng et al. note that authors should still double-check the refined text to confirm it reflects their own voice and critical thinking, and they point out that dedicated writing assistants like Grammarly may be better suited for these tasks than general-purpose LLMs.

Tier 2, “ethically contingent,” includes generating outlines from existing content, summarizing drafted material, improving clarity, and brainstorming ideas. These uses carry real risk because the AI is now producing novel text, not just polishing what exists. Cheng et al. stress that Tier 2 is defensible only when the researcher actively reviews, reshapes, and takes ownership of the output. The final manuscript must reflect the author’s own ideas. If the AI’s suggestions change the key meaning or message, something has gone wrong.

Tier 3, “ethically suspect,” covers the territory where I think most of the real violations happen: drafting original text without providing source content, developing new concepts, interpreting data, conducting literature reviews, and checking for plagiarism. Cheng et al. argue that using AI for these tasks deprives the researcher of the deep engagement with source material that produces genuine understanding. I’ve covered a version of this argument before through the work of Shaw and Nave (2026), whose concept of cognitive surrender describes what happens when thinkers routinely outsource the effortful stages of reasoning to AI.
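For readers who like to see a taxonomy written down, the three tiers can be sketched as a simple lookup table. This is my own illustration, not anything Cheng et al. publish: the names `Tier`, `TIER_BY_TASK`, and `classify` are hypothetical, and the task labels are my shorthand for the uses listed above.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    ACCEPTABLE = "ethically acceptable"
    CONTINGENT = "ethically contingent"
    SUSPECT = "ethically suspect"


# Task labels are illustrative shorthand, not wording from the paper.
TIER_BY_TASK = {
    "grammar": Tier.ACCEPTABLE,
    "spelling": Tier.ACCEPTABLE,
    "readability": Tier.ACCEPTABLE,
    "translation": Tier.ACCEPTABLE,
    "outline from existing content": Tier.CONTINGENT,
    "summarize drafted material": Tier.CONTINGENT,
    "improve clarity": Tier.CONTINGENT,
    "brainstorm ideas": Tier.CONTINGENT,
    "draft text without source content": Tier.SUSPECT,
    "develop new concepts": Tier.SUSPECT,
    "interpret data": Tier.SUSPECT,
    "literature review": Tier.SUSPECT,
    "plagiarism check": Tier.SUSPECT,
}


def classify(task: str) -> Optional[Tier]:
    """Return the tier for a known task label, or None if it is not listed."""
    return TIER_BY_TASK.get(task.strip().lower())
```

A lookup like this only captures the task, not how the researcher engages with the output, so a Tier 2 result should be read as "defensible if you take ownership," not as a pass.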

The Reference Problem Is Worse Than Most Researchers Realize

One of the most striking sections of the paper presents data on AI-generated references. Cheng et al. cite a study by Athaluri et al. in which ChatGPT was asked to produce 50 research protocols with references. Among those references, 38% had wrong or fabricated DOIs, and 16% were completely fabricated. A second study found that only 7% of ChatGPT-generated references were fully authentic and accurate (Bhattacharyya et al., cited in Cheng et al., 2025).

Those numbers should alarm anyone who has been tempted to let ChatGPT handle a reference list. The tool generates grammatically perfect citations that point to articles that do not exist. And because the format looks right, the fabrication can survive multiple rounds of editing if the author doesn’t manually verify every entry.

Cheng et al. are clear about the implication: they caution against the sole use of LLM-based generative AI to write content for medical abstracts, articles, or research proposals, and they explicitly recommend against using AI to generate references. I agree fully on the references point. AI-generated citations are a liability, not a shortcut.
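Part of the manual verification can be scripted as a first pass. What follows is a minimal sketch under stated assumptions, not a method from the paper: the function names and the loose `DOI_PATTERN` are my own, and the lookup assumes the public handle endpoint at doi.org, which reports `responseCode` 1 for a registered DOI. A registered DOI only shows that something exists at that identifier. It does not show that the title, authors, and year in the citation match, so each entry still needs a human check.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# Loose heuristic for DOI-like strings; an assumption, not an official grammar.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(reference: str) -> Optional[str]:
    """Pull the first DOI-like string out of a reference entry, if any."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Trim punctuation that usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


def doi_is_registered(doi: str, timeout: float = 10.0) -> bool:
    """Ask the doi.org handle endpoint whether the DOI is registered.

    Requires network access. HTTP errors (such as 404) count as "not
    registered"; other network failures are raised to the caller.
    """
    url = "https://doi.org/api/handles/" + urllib.parse.quote(doi)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except urllib.error.HTTPError:
        return False
    return payload.get("responseCode") == 1
```

In practice I would run `extract_doi` over every entry, flag the ones that return `None` or fail `doi_is_registered`, and then open the rest by hand to confirm that the article behind the DOI is the one being cited.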

Why the Cognitive Dependency Argument Matters Most

The section of the paper that carries the most weight for me is the one about cognitive dependency. Cheng et al. warn about what happens “if scholars become too dependent on ChatGPT for ideation, generation of primary written content, and initial data interpretation, as this assistance could easily devolve into a dependency that arrests further scholarly development” (p. 5).

This concern tracks with a growing body of evidence I’ve been writing about on this blog. Fan et al. (2025) showed that students using ChatGPT produced better essays but demonstrated no learning gains compared to students writing independently. The AI improved the product without improving the process. Kosmyna et al. (2025), using EEG data, found that ChatGPT reduced neural engagement during writing tasks, measurably lowering the cognitive effort that produces deep learning.

Cheng et al. frame this specifically around novice researchers, and that’s an important move. Experienced scholars who have already developed strong research and writing skills can use AI as a genuine supplement. A doctoral student who hasn’t yet learned to build an argument from raw data is in a fundamentally different position. For that student, AI shortcuts don’t save time. They eliminate the training.

What the Framework Gets Right and Where It Falls Short

The three-tier model works well as a first-pass heuristic. It gives researchers a way to think about their own AI use without reducing everything to a blanket yes or no. The four-question checklist Cheng et al. propose is equally practical: Are the primary ideas mine? Am I maintaining my research competency? Have I verified accuracy? Have I disclosed my AI use? Any researcher who works through those questions with real candor before submitting a manuscript is already ahead of most institutional policies.
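If it helps to make the checklist concrete, the four questions can be written as a small pre-submission record. This is a hypothetical sketch of my own: the class name `AIUseChecklist` and its field names paraphrase the questions and do not come from Cheng et al.

```python
from dataclasses import dataclass, fields


@dataclass
class AIUseChecklist:
    # Field names paraphrase the four questions; they are illustrative only.
    primary_ideas_are_mine: bool
    maintaining_research_competency: bool
    verified_accuracy: bool
    disclosed_ai_use: bool

    def unresolved(self) -> list:
        """Return the names of the questions still answered 'no'."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

An empty list from `unresolved()` means all four answers are yes. The code cannot tell whether those answers are honest, which is the part that matters.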

The paper also models something I wish more scholars would do: practicing what they preach on disclosure. Cheng et al. used ChatGPT to generate initial recommendations, then critically evaluated and supplemented those recommendations with their own expertise. They describe the entire process in their acknowledgements section. I’ve written about disclosure norms through the work of Cleland et al. (2025), and Cheng et al.’s transparency here is exactly the standard the field needs.

Where the framework falls short is in its treatment of Tier 2 as a stable middle ground. The line between “contingent” and “suspect” depends entirely on how much the researcher engages with the AI output, and there’s no way to verify that from the outside. An outline generated from a detailed prompt containing the researcher’s own ideas is fundamentally different from an outline generated from a vague topic sentence. Both fall under Tier 2 in Cheng et al.’s model, but the ethical distance between them is enormous.

I’d also note that the paper focuses on ChatGPT and similar general-purpose LLMs. Cheng et al. acknowledge this limitation themselves, pointing out that academic-specific tools like Scopus AI, which draw from curated databases of peer-reviewed content, may carry different risk profiles. The ethical tiers will need revision as these specialized tools become the norm. What counts as “ethically suspect” with a general-purpose LLM might be perfectly reasonable with a tool built specifically for academic research.

What This Means for Researchers Right Now

Cheng et al. conclude that “the real challenge is defining when and how to optimally use generative AI, and how to ethically manage the nuances of using AI in the academic writing process” (p. 8). I’d add that the challenge isn’t just individual. Institutions need to stop treating AI in academic writing as a binary choice between permitted and prohibited, and start building the kind of tiered guidance that Cheng et al. propose.

The framework isn’t perfect. But it’s a starting point that respects both the reality of how researchers work today and the integrity standards that make academic publishing worth trusting.

References

  • Cheng, A., Calhoun, A., & Reedy, G. (2025). Artificial intelligence-assisted academic writing: Recommendations for ethical use. Advances in Simulation, 10(22), 1–9. https://doi.org/10.1186/s41077-025-00350-6
  • Cleland, J., Driessen, E., Masters, K., Lingard, L., & Maggio, L. A. (2025). When and how to disclose AI use in academic publishing: AMEE Guide No. 192. Medical Teacher. https://doi.org/10.1080/0142159X.2025.2607513
  • Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530. https://doi.org/10.1111/bjet.13544 
  • Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X.-H., Beresnitzky, A. V., Braunstein, I., & Maes, P. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing tasks. MIT Media Lab. https://www.media.mit.edu/publications/your-brain-on-chatgpt/  
  • Shaw, S. D., & Nave, G. (2026). Thinking fast, slow, and artificial: How AI is reshaping human reasoning and the rise of cognitive surrender. Working paper, The Wharton School, University of Pennsylvania. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646 
