Pen and Paper Assessment Won’t Save Writing Education in the AI Era

I just read Dana Goldstein’s piece in the New York Times titled “How A.I. Killed Student Writing (and Revived It).” The article documents a real shift: high school and college teachers across the country are moving writing back into the classroom, often with pen and paper or locked-down browsers, because they don’t know how else to handle generative AI. I disagree with the framing of this shift as a “revival” of student writing. I’d argue the opposite: the pen-and-paper retreat is a pedagogical step backward, and the reasons go beyond cheating.

The Validity Problem in Pen and Paper Assessment

Dawson, Bearman, Dollinger, and Boud (2024) make the central case in their paper on validity in the AI era: validity, not cheating, should drive design decisions. If you’re teaching a writing course where the explicit goal is to assess students’ grasp of writing mechanics, rhetorical strategies, and syntactic control, then in-class handwriting might be defensible.

But that’s not what most of the teachers in the NYT piece are doing. They’re teaching English literature, religious studies, freshman composition, philosophy, history. The learning outcome is rarely “write a polished essay with no AI assistance.” The outcome is usually “demonstrate understanding of the text,” “build an argument,” “engage with the ideas,” “show analytical thinking.”

When you assess those outcomes through a pen-and-paper test of speed-handwriting under classroom conditions, you’ve shifted what you’re measuring without saying so. You’re now grading penmanship, time pressure, working-memory capacity, and resistance to distraction. The writing mechanics absorb the score. The thinking gets buried under everything else.

This is a construct validity problem (Dawson et al., 2024). The assessment is no longer measuring what the teacher claims to measure. It’s measuring something else, and treating that something else as evidence of learning. That’s a worse outcome than AI-assisted essays.

The “Writing is Thinking” Assumption

Several of the teachers quoted in the piece lean on a version of the “writing is thinking” claim. I’ll grant that writing is thinking for some learners. It is for me. But it isn’t for everyone, and the assumption that it should be is one of the less-examined biases of academic culture.

Neurodivergent learners often think through different modalities, including sketches, concept maps, audio notes, and video monologues. Some need to talk through their ideas before anything can land on paper. The research on universal design for learning has been clear about this for years, and Tai et al.’s (2023) work on assessment for inclusion makes the same point in the assessment context: when you privilege one mode of demonstration, you exclude learners who think differently. Pen-and-paper writing is one mode. Handwriting under timed conditions is another. Treating either as the universal proof of learning is a design choice.

The Accessibility Question

There’s a more concrete version of this concern that the NYT piece doesn’t address at all. Pen-and-paper tests punish students with dysgraphia, motor coordination challenges, or learning disabilities that affect handwriting. Students with carpal tunnel syndrome face the same disadvantage. Students who need accommodations under IEPs and 504 plans face a no-win situation: ask for a laptop, which singles them out and reintroduces the AI problem the teacher was trying to solve, or handwrite under conditions that punish them for reasons unrelated to the learning outcome. Neither option is good pedagogy. Both are predictable consequences of a policy designed without inclusion in mind.

The Identity Question

I noticed something while reading the article. The teachers most enthusiastic about the pen-and-paper revival are humanities and English teachers. The tone is often nostalgic. Several of them talk about returning to “real” writing, “authentic” student work, the feel of the page. I think there’s something deeper than pedagogy driving some of this.

For a generation of humanities teachers, writing was the central craft of the discipline, and the central proof of student engagement. Generative AI threatens that identity. The pen-and-paper retreat lets them keep teaching the way they always have, with the discipline boundaries intact.

I have sympathy for that. The discipline is changing, and the change is uncomfortable. But the response to disciplinary anxiety can’t be to design assessments that exclude learners and don’t measure what they claim to measure. That’s not a defence of writing. That’s a defence of a particular professional self-image.

The Job Market Reality

There’s a final argument the NYT piece sidesteps. The students in those classrooms are graduating into a labour market where AI is part of every knowledge-work job. Lawyers, doctors, marketers, teachers, and researchers all use it daily.

Would I want my own children in a class where a teacher refuses to teach them how to use AI thoughtfully and trains them to handwrite essays? No. The purpose of education is to help learners grow intellectually, develop the analytical skills to understand the world, and prepare for the lives and work ahead of them. Pen and paper trains them for none of that.

The NYT piece frames the choice as either AI-assisted writing or pen-and-paper writing. That’s the wrong frame. The actual question is: how do we design assessments that measure learning, accommodate diverse learners, and prepare students for an AI-integrated world?

The actual work isn’t a single fix. It looks like process portfolios that document how thinking developed across drafts, oral defences where students explain their work in real time, AI-disclosed assignments that include reflection on what the tool did and what the student did, authentic tasks tied to professional conditions, marking criteria co-constructed with students, and layered evidence collected across the term, not a single high-stakes handwritten essay.

These approaches don’t make for a good NYT story. They don’t have the clean visual contrast of laptops versus notebooks. But they’re the actual work of redesigning assessment for the AI era. The pen-and-paper retreat looks like a solution. It’s mostly a defence mechanism.

Writing isn’t dying. The narrow cultural model of writing as solitary, handwritten, cognitively isolated craft is dying. The teachers who recognise that are the ones doing the harder, slower, less photographable work of redesigning what learning evidence actually looks like in 2026.

References

  • Dawson, P., Bearman, M., Dollinger, M., & Boud, D. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education, 49(7), 1005–1016. https://doi.org/10.1080/02602938.2024.2386662
  • Goldstein, D. (2026, April 30). How A.I. killed student writing (and revived it). The New York Times. https://www.nytimes.com/2026/04/30/us/ai-students-cheating-homework-classrooms.html
  • Tai, J., Ajjawi, R., Bearman, M., Boud, D., Dawson, P., & Jorre de St Jorre, T. (2023). Assessment for inclusion: Rethinking contemporary strategies in assessment design. Higher Education Research & Development, 42(2), 483–497. https://doi.org/10.1080/07294360.2022.2057451

Further Reading

For a deeper look at the assessment design problems behind the NYT piece, these papers are worth your time:

  • Bassett, M. A., Bradshaw, W., Bornsztejn, H., Hogg, A., Murdoch, K., Pearce, B., & Webber, C. (2026). Heads we win, tails you lose: AI detectors in education. Journal of Higher Education Policy and Management. https://doi.org/10.1080/1360080X.2026.2622146
  • Corbin, T., Bearman, M., Boud, D., & Dawson, P. (2025). The wicked problem of AI and assessment. Assessment & Evaluation in Higher Education, 1–17. https://doi.org/10.1080/02602938.2025.2553340
  • Corbin, T., Dawson, P., & Liu, D. (2025). Talk is cheap: Why structural assessment changes are needed for a time of GenAI. Assessment & Evaluation in Higher Education, 50(7), 1087–1097. https://doi.org/10.1080/02602938.2025.2503964
  • Curtis, G. J. (2025). The two-lane road to hell is paved with good intentions: Why an all-or-none approach to generative AI, integrity, and assessment is insupportable. Higher Education Research & Development, 44(8), 2151–2158. https://doi.org/10.1080/07294360.2025.2476516
  • Dawson, P., Bearman, M., Dollinger, M., & Boud, D. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education, 49(7), 1005–1016. https://doi.org/10.1080/02602938.2024.2386662
  • Hyland, K. (2026). Writing in the AI era: Rethinking writing, research and teaching. Journal of Second Language Writing, 101302. https://doi.org/10.1016/j.jslw.2026.101302
  • Ibaibarriaga, G., Acha, J., & Perea, M. (2025). The impact of handwriting and typing practice in children’s letter and word learning: Implications for literacy development. Journal of Experimental Child Psychology, 253, 106195. https://doi.org/10.1016/j.jecp.2025.106195
  • Nieminen, J. H., & Eaton, S. E. (2024). Are assessment accommodations cheating? A critical policy analysis. Assessment & Evaluation in Higher Education, 49(7), 978–993. https://doi.org/10.1080/02602938.2023.2259632
  • Villarroel, V., Boud, D., Bloxham, S., Bruna, D., & Bruna, C. (2020). Using principles of authentic assessment to redesign written examinations and tests. Innovations in Education and Teaching International, 57(1), 38–49. https://doi.org/10.1080/14703297.2018.1564882
