I’ve been writing for a while about how design choices around AI shape what students think AI is, and how that shapes the way they use it. Most of these conversations focus on what AI says. A new paper from Cohn and colleagues (2024) at CHI focuses on something different: how AI says it. The findings have direct implications for educators thinking about voice-enabled AI in classrooms, and for anyone who has watched a student talk to ChatGPT’s voice mode and wondered what’s happening in their head.
The Study
The authors ran a large pre-registered experiment with 2,165 US adults, randomly assigned to one of four conditions in a 2×2 factorial design crossing modality (text only vs. speech+text, using a text-to-speech, or TTS, voice) and grammatical person (“I” vs. “the system”). Participants interacted with a pseudo-LLM that gave identical responses across all four conditions.
The task was a 20-trial information-seeking exercise across five domains: health, careers, medications, travel, and cooking. After each trial, participants rated perceived accuracy, perceived risk, and whether they would validate the information externally. After all trials, they rated overall anthropomorphism and trustworthiness.
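For readers less familiar with factorial designs, here is a minimal Python sketch of what “2×2” means in practice: two factors with two levels each, giving four cells, with each participant randomly placed in one cell. The factor labels come from the description above. Everything else (the function names, the fixed seed, and the use of simple independent random assignment) is my own assumption for illustration, not the authors’ actual procedure.

```python
import itertools
import random
from collections import Counter

# Factor levels as described in the study summary above.
MODALITIES = ["text only", "speech+text"]
GRAMMATICAL_PERSONS = ["I", "the system"]


def conditions():
    """Return the four cells of the 2x2 design as (modality, person) pairs."""
    return list(itertools.product(MODALITIES, GRAMMATICAL_PERSONS))


def assign(n_participants, seed=0):
    """Hypothetical helper: place each participant in one cell at random.

    Simple independent assignment is an assumption made for this sketch;
    the paper may have balanced cell sizes differently.
    """
    rng = random.Random(seed)
    cells = conditions()
    return [rng.choice(cells) for _ in range(n_participants)]


assignments = assign(2165)
counts = Counter(assignments)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Because every participant sees the same responses, any difference in ratings between cells can be attributed to the modality, the pronoun, or their combination.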

What the Voice Did
The voice manipulation had the stronger effect of the two. A TTS voice (the speech+text condition) increased both anthropomorphism scores and the perceived accuracy of the information the LLM provided, even though the actual content of responses was identical across conditions.
The authors are clear about what this means: “Relative to a text-only interface, participants believed the information a system gave was more accurate when they also heard the system talk. This effect is robust even in the absence of explicit human-like features, such as an image” (p. 8).
A voice alone, with no avatar, no name, no other human-like cues, was enough to push accuracy ratings up. That’s a striking finding given how cheap a TTS voice is to add to any interface.
The Pronoun Twist
The grammatical person manipulation showed weaker effects. The “I” framing did not shift overall anthropomorphism scores. It did affect ratings in specific contexts. Medication questions answered with “I” were rated as more accurate and less risky. Career questions answered with “I” were rated as less accurate and riskier.
The same word produced opposite effects depending on the topic. “I” might cue domain expertise in some contexts and personal subjectivity in others.
Connecting to Recent Work
This paper extends a finding that’s been showing up across the AI-trust literature. Colombatto, Birch, and Fleming (2025) recently showed that user trust in LLM advice depends on which mental states people attribute to the system: intelligence attributions boost trust, experience attributions reduce it. Cohn and colleagues add a behavioral lever to that picture. Voice doesn’t just shift abstract attributions. It shifts trial-by-trial accuracy judgments.
Shanahan (2024) argued in his philosophical analysis of LLM language that loose use of words like “thinks” and “knows” obfuscates mechanism. This paper provides empirical weight for a related claim about sound. A TTS voice carries no propositional content beyond what the text already says, yet it shifts how that text is received. The medium shapes the message, even when the message itself is unchanged.
AI Voice in the Classroom
The educational implication is direct. Voice-enabled AI is now mainstream. ChatGPT voice mode, Gemini Live, and Claude’s voice features are in widespread use. If a TTS voice alone measurably increases the perceived accuracy of LLM output, then teachers introducing voice-based AI need to be aware that students may well place more trust in voiced AI output than in text-only output. That’s a vulnerability worth designing around.
In my elementary AI Use Agreement, I included a small section called “AI Is Not a Person” specifically to counter the kind of anthropomorphism that this study documents empirically. The fix isn’t to ban voice features. It’s to teach students to keep their guard up when the AI sounds confident. A confident voice on an AI that just hallucinated a citation is exactly the situation that makes the hallucination dangerous.
Limitations
The paper has the limitations you’d expect. The pseudo-LLM gave controlled, identical responses. Real LLMs don’t. The study used a single TTS voice (a US-English female studio voice). Voice-trust effects likely vary by voice type, accent, gender, and how natural the voice sounds. Two years on from the study, voice models have improved substantially, which could amplify the effect.
The authors are explicit about the design implication. They caution: “Our findings suggest that people believe information is more accurate and less risky when presented with anthropomorphic cues, which could lead to downstream harms if the system produces non-accurate data or stereotypes” (p. 9).
References
- Cohn, M., Pushkarna, M., Olanubi, G. O., Moran, J. M., Padgett, D., Mengesha, Z., & Heldreth, C. (2024). Believing anthropomorphism: Examining the role of anthropomorphic cues on trust in large language models. arXiv preprint arXiv:2405.06079. https://arxiv.org/abs/2405.06079
- Colombatto, C., Birch, J., & Fleming, S. M. (2025). The influence of mental state attributions on trust in large language models. Communications Psychology, 3(1), 84. https://www.nature.com/articles/s44271-025-00262-1
- Shanahan, M. (2024). Still “Talking About Large Language Models”: Some Clarifications. arXiv preprint arXiv:2412.10291.
