AI for Peer Review and Formative Assessment in Writing Classes

I’ve said this before and I’ll keep saying it: the best AI pedagogy doesn’t start with the tool. It starts with the learning goal. And when it comes to writing instruction, one of the most persistent goals has always been getting students meaningful feedback on their drafts before the final submission.

Sperber, MacArthur, Minnillo, Stillman, and Whithaus (2025) built a model that does exactly this. PAIRR, which stands for Peer and AI Review + Reflection, combines peer review, AI feedback, and structured reflection into a single formative assessment sequence. The study involved 654 students across writing and writing-intensive STEM courses, and the results offer a clear picture of what happens when AI is integrated into feedback with intention.

PAIRR offers a practical framework for using AI in assessment without losing the human dimension. Let me walk through why.

How PAIRR Works as a Formative Assessment Model

The sequence is key in this model. Students write a full draft, receive peer feedback, then receive AI feedback, reflect on both, and revise. AI comes after the draft and after peer review, not before. By the time students see what AI has to say, they’ve already done the cognitive work of producing ideas and already have a human perspective to weigh it against.

As designed, the framework has the potential to protect against what Shaw and Nave (2026) identified in their cognitive surrender research: students deferring to AI outputs without critical evaluation. PAIRR structures the encounter so students evaluate AI through a lens they’ve built themselves. Peer feedback comes first. AI feedback is something they interrogate, not something they absorb.

And the reflection component is essential. Students don’t just read both sets of comments and move on. They compare, assess accuracy, spot conflicts, and articulate a revision plan. Feedback becomes a thinking exercise.
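If you wanted to enforce that ordering in a course tool or a simple script, here is a minimal sketch of one way to do it. To be clear, this is my own illustration and not something from Sperber et al.: the stage names, the `PairrAssignment` class, and its methods are all hypothetical. The only thing taken from the study is the order of the steps.

```python
from dataclasses import dataclass, field
from typing import Optional

# Stage order as described above; the identifiers are illustrative assumptions.
PAIRR_STAGES = ["draft", "peer_feedback", "ai_feedback", "reflection", "revision"]


@dataclass
class PairrAssignment:
    """Tracks one student's progress through the sequence (hypothetical helper)."""

    completed: list = field(default_factory=list)

    def next_stage(self) -> Optional[str]:
        """Return the next stage to complete, or None once the sequence is done."""
        if len(self.completed) == len(PAIRR_STAGES):
            return None
        return PAIRR_STAGES[len(self.completed)]

    def complete(self, stage: str) -> None:
        """Mark a stage as done, refusing any stage attempted out of order."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"Cannot do {stage!r} yet; next stage is {expected!r}")
        self.completed.append(stage)
```

The design choice that matters is that `complete` rejects out-of-order steps, so AI feedback cannot be unlocked before peer feedback, and revision cannot be submitted before the reflection.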

What Students Actually Preferred

Fifty-eight percent of students preferred combined peer and AI feedback. Thirty-six percent preferred peer alone. Only 6 percent preferred AI alone.

In a study of 654 students, almost nobody wanted AI feedback without a human in the loop. Sperber et al. report that AI feedback reached high usefulness ratings only when paired with human response. Students valued AI for its organization, rubric alignment, and specific revision suggestions. But they consistently described it as general, occasionally inaccurate, and lacking assignment context.

Peer feedback earned praise for different reasons: contextual awareness, emotional tone, and what one student described as “an insider perspective, with the knowledge that was given from the assignment and the class” (p. 10). Another put it simply: “humanity behind the feedback” (p. 10).

This reinforces something I’ve been arguing across my writing. AI works best as a complement to human interaction. Guo et al. (2025) found the same pattern in their year-long classroom study. Students valued ChatGPT as a supplement but firmly rejected it as an instructor substitute. PAIRR gives us a concrete model for what that supplementary role actually looks like.

Students Were Critical of AI, and That’s the Point

Twenty-five percent of students explicitly disagreed with AI feedback. Another 25 percent noted inaccuracies. Sperber et al. treat this as a positive finding, and I agree. Students were questioning AI, comparing it against peer feedback, making independent decisions about what to revise and what to ignore. Not passively absorbing output.

One student described AI as helpful for building “the skeleton” of writing, but insisted “the meat of it should be written by a person” (p. 12). Another reflected on the need to evaluate AI feedback “for relevance and correctness” (p. 11). These are the kinds of critical AI literacy habits Roe, Furze, and Perkins (2025) called for in their CAIL framework. PAIRR doesn’t just teach writing. It builds evaluative capacity.

Cheng et al. (2025) showed that students who asked AI direct, purposeful questions performed better on writing tasks. Agency in shaping the interaction predicted the outcome. PAIRR creates a similar dynamic. Students aren’t receiving AI feedback passively. They’re judging it.

Sperber et al. close with three recommendations: center humans in AI-supported feedback, prioritize metacognitive reflection and student agency, and cultivate critical AI literacies. They note that automated systems “cannot judge critical thinking, rhetorical knowledge, or the ability of a writer to adapt to a given audience” (p. 4). AI can flag structural issues and suggest revisions. Voice, audience awareness, argumentative depth? Those require human judgment.

The AI Assessment Scale from Perkins, Roe, and Furze (2024) positions Level 3 as the point where students collaborate with AI but must critically evaluate everything it produces. PAIRR fits naturally there. Structured access to AI feedback, followed by the hard work of comparison, evaluation, and revision.

I want to highlight the reflection step in particular. It’s what separates PAIRR from less thoughtful integration models. Without reflection, students receive two sets of comments and pick the easier revision path. With it, they articulate why they’re making specific changes and what they’ve learned from comparing human and machine perspectives. Drawing on Melzer, Sperber et al. describe this as “turn[ing] our attention to students’ own critical self-reflections on these choices and processes” (Melzer, 2023, p. 13, cited in Sperber et al., 2025).

I’ve seen too many AI integration attempts that skip this step. Teachers give students access to AI feedback and assume the learning happens automatically. It doesn’t. The learning happens when students have to explain their decisions, justify their revisions, and account for disagreements between sources. PAIRR bakes that into the process.

For anyone looking for a concrete, research-backed way to bring AI into writing instruction, PAIRR is a strong starting point. Human-centered, structured, and built to develop the critical habits students need.

References

  • Cheng, Y., Fan, Y., Li, X., Chen, G., Gašević, D., & Swiecki, Z. (2025). Asking generative artificial intelligence the right questions improves writing performance. Computers and Education: Artificial Intelligence, 8, 100374. https://doi.org/10.1016/j.caeai.2025.100374
  • Guo, F., Li, T., & Cunningham, C. J. L. (2025). One year in the classroom with ChatGPT: Empirical insights and transformative impacts. Frontiers in Education, 10, 1574477. https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2025.1574477/full
  • Melzer, D. (2023). Reconstructing response to student writing: A national study from across the curriculum. Utah State University Press. https://doi.org/10.7330/9781646423682
  • Perkins, M., Roe, J., & Furze, L. (2024). The AI Assessment Scale revisited: A framework for educational assessment (Preprint). December 2024. https://arxiv.org/abs/2412.09029
  • Perkins, M., & Roe, J. (2025). The end of assessment as we know it: GenAI, inequality and the future of knowing. In AI and the future of education: Disruptions, dilemmas and directions (pp. 76–80). https://durham-repository.worktribe.com/output/4472558/the-end-of-assessment-as-we-know-it-genai-inequality-and-the-future-of-knowing
  • Shaw, S. D., & Nave, G. (2026). Thinking fast, slow, and artificial: How AI is reshaping human reasoning and the rise of cognitive surrender. Working paper, The Wharton School, University of Pennsylvania. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646
  • Sperber, L., MacArthur, M., Minnillo, S., Stillman, N., & Whithaus, C. (2025). Peer and AI Review + Reflection (PAIRR): A human-centered approach to formative assessment. Computers and Composition, 76, 102921. https://doi.org/10.1016/j.compcom.2025.102921
