We talk a lot about AI literacy on this blog. Frameworks, competency models, progression scales. And most of them share a quiet assumption: that AI literacy is about understanding AI, its mechanics, its risks, and its appropriate uses. Sidra and Mason (2026), in a new study published in the International Journal of Human-Computer Interaction, argue that understanding isn’t enough. The question that matters now is whether you can actually coordinate with AI in real time, during the work itself. They’ve built two validated scales to measure exactly that, and the findings are worth taking seriously.
Why Existing AI Literacy Scales Fall Short
Most AI literacy instruments were designed for a world where AI tools were static. You gave an input, got an output, and moved on. The scales that emerged from that era (Ng et al., 2021; Wang et al., 2023; Carolus et al., 2023) measure awareness, evaluation, ethics, and usage. These are important. But they treat AI as something to be understood and assessed, not something to be worked with as a dynamic partner across a sustained interaction.
Sidra and Mason argue that collaborative AI tools like ChatGPT and Claude have changed the relationship. These tools support iterative, goal-directed exchanges. The human guides the process, provides feedback, adjusts prompts, builds on outputs, and steers the AI toward better results across multiple rounds. That kind of work requires communication and coordination skills that no existing scale measures. I’ve covered Chee, Ahn and Lee’s (2025) AI literacy competency framework on this blog, and it’s one of the more comprehensive models available. But even that framework doesn’t account for the interactive, reciprocal dimension Sidra and Mason are describing.
The theoretical grounding is Distributed Cognition Theory: the idea that intelligence isn’t confined to a single brain but emerges from the interaction between people, tools, and environments. When you work with a collaborative AI system, the cognitive load is shared. The quality of the outcome depends on how well the human and the AI coordinate their contributions. That reframing, from individual knowledge to system-level performance, is what makes this paper conceptually interesting.

The Two Scales and What They Measure
Sidra and Mason developed and validated two new instruments. The Collaborative AI Literacy scale measures three factors: AI Evaluation (can you assess the AI’s strengths and limitations during use?), AI Usage (can you direct, prompt, and build on what the AI produces?), and AI Ethics (can you identify bias, evaluate transparency, and make informed decisions about delegation?). The items are practical and interaction-focused. “Build upon and complement the AI’s output” and “Direct the AI so that you obtain the output that you seek” are typical examples. These go well beyond knowing what AI is.
The Collaborative AI Metacognition scale also has three factors: Planning (do you think about how to divide the work between you and the AI before you start?), Monitoring (do you check your own biases and reactions during the collaboration?), and Evaluation (do you reflect on what went well and what didn’t after the task is done?). The emphasis is on regulating the joint human-AI system, not just managing your own thinking in isolation.
Both scales showed strong psychometric properties. The Collaborative AI Literacy scale had a Cronbach’s alpha of 0.92 and a Raykov’s rho of 0.94; the metacognition scale came in at 0.88 and 0.87, respectively. Confirmatory factor analysis confirmed the three-factor structure for both, and both were shown to be related to but distinct from general AI literacy and general metacognition measures. The constructs are real and they’re measurable.
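For readers who want to see what sits behind a reliability number like that, Cronbach’s alpha is just a function of the item variances and the variance of the summed scale score. Here’s a minimal Python sketch using the standard textbook formula with made-up Likert responses; it is not the authors’ code or data.

```python
# Minimal sketch of how a reliability estimate like Cronbach's alpha is computed.
# The item scores below are hypothetical, not data from Sidra and Mason's study.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scale responses."""
    k = items.shape[1]                         # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: six respondents, four items
responses = np.array([
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [3, 4, 3, 3],
])
print(round(cronbach_alpha(responses), 2))
```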
The Metacognition Finding That Connects Everything
The most striking result is about metacognition. Sidra and Mason tested whether their Collaborative AI Metacognition scale predicted self-reported benefits from AI use beyond what a general metacognition measure could predict. It did. When both measures were included in the model, Collaborative AI Metacognition remained a significant predictor (β = 0.517, p < 0.001) and general metacognition dropped to non-significance (β = 0.093, p = 0.397). People who plan, monitor, and reflect specifically during their AI collaboration get more out of the tools. General metacognitive ability alone doesn’t cut it.
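If you’re curious what that kind of incremental validity test looks like mechanically, the sketch below follows the same logic: fit a regression on general metacognition alone, then add the collaborative measure and check whether it carries unique predictive weight. The file name and column names are placeholders I’ve invented for illustration; this is not the authors’ analysis code.

```python
# Hedged sketch of an incremental-validity test like the one described above:
# regress self-reported AI benefits on general metacognition, then add
# collaborative AI metacognition and see whether it explains unique variance.
# The CSV file and column names are hypothetical, not the authors' materials.
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import zscore

df = pd.read_csv("survey.csv")  # hypothetical dataset, one row per respondent
df = df[["benefits", "general_metacog", "collab_ai_metacog"]].apply(zscore)

baseline = smf.ols("benefits ~ general_metacog", data=df).fit()
full = smf.ols("benefits ~ general_metacog + collab_ai_metacog", data=df).fit()

# Standardized coefficients: in the pattern Sidra and Mason report, the
# collaborative measure stays significant while the general one drops out.
print(full.params)
print(full.pvalues)
print("R² gain from adding the collaborative scale:",
      round(full.rsquared - baseline.rsquared, 3))
```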
This connects directly to what Fan et al. (2025) found in their study on metacognitive laziness. Students using ChatGPT skipped the evaluation and orientation steps that make learning stick. The essays got better. The thinking didn’t. Sidra and Mason’s metacognition scale is essentially measuring the opposite of that laziness: the deliberate, effortful regulation of your own cognition during AI-assisted work. The two papers read as mirror images.
I’d also connect this to Gerlich’s (2025) work on cognitive offloading. When users treat AI as a shortcut, they outsource thinking. When they treat it as a collaborator, they have to think harder, planning what to delegate, monitoring for errors, evaluating whether the output meets the goal. Sidra and Mason’s data suggests that the second approach produces measurably better outcomes. As the authors put it, “without metacognition and deliberate thinking, we are likely to follow unconscious biases and either rely too much on the AI (influenced by its conversational fluency and convincing hallucinations) or fail to optimize its potential” (p. 5102).
What Didn’t Work, and Why It Matters
The literacy scale told a different story. Collaborative AI Literacy did not show incremental predictive validity over general AI literacy when predicting self-reported benefits. When both were included in the model, neither remained significant. Sidra and Mason are candid about this. They suggest the problem may be methodological: self-reported benefits are subjective and vulnerable to cognitive distortion. People with lower AI literacy sometimes perceive AI as almost magical, overestimating what it does for them. That perception inflates their benefit scores and muddies the statistical relationship.
The authors recommend that future research use performance-based outcome measures (task efficiency, decision-making accuracy, output quality) to test whether Collaborative AI Literacy predicts real-world results even if it doesn’t predict subjective satisfaction. I think that’s the right call. The construct makes theoretical sense, and the scale has strong psychometric properties. The issue is likely with what was measured as the outcome, not with the literacy construct itself.
Reading This in 2026: The Temporal Gap
One thing to note: the survey data was collected in December 2023, still very early in the generative AI adoption cycle. Most respondents had been using AI for about four months, and ChatGPT was barely a year old. The collaborative capabilities of today’s tools (agentic AI, multi-step task execution, real-time document editing, code generation with iterative feedback) were either rudimentary or nonexistent at the time of data collection. Randazzo et al.’s (2025) taxonomy of human-AI collaboration modes (cyborgs, centaurs, and self-automators) describes a far richer set of interaction patterns than what was available to Sidra and Mason’s participants in late 2023.
That doesn’t invalidate the findings. The scales are measuring the right constructs. But the field needs updated validation with users who have years of experience with today’s tools. The December 2023 snapshot may underestimate how much collaborative literacy and metacognition matter now.
What This Means for AI Literacy Education
The practical implication is clear, and it reinforces a point I’ve made across this entire blog: knowing about AI is a starting point, not the destination. Educators and trainers need to build the interactive skills: the ability to communicate goals clearly to an AI system, evaluate its contributions critically, and regulate one’s own thinking throughout the process. An AI literacy curriculum that leaves out collaboration is only half a curriculum.
If a student can define what a large language model is but can’t steer one through a multi-step research task, can’t tell when the AI is fabricating, can’t adjust their approach mid-conversation, then we haven’t taught them what they actually need. Sidra and Mason give us validated instruments to measure these competencies. The next step is building curricula that develop them.
References
- Carolus, A., Koch, M. J., Straka, S., Latoschik, M. E., & Wienrich, C. (2023). MAILS - Meta AI literacy scale: Development and testing of an AI literacy questionnaire based on well-founded competency models and psychological change- and meta-competencies. Computers in Human Behavior: Artificial Humans, 1(2), 100014.
- Chee, H., Ahn, S., & Lee, J. (2025). A competency framework for AI literacy: Variations by different learner groups and an implied learning pathway. British Journal of Educational Technology, 56, 2146–2182. https://doi.org/10.1111/bjet.13556
- Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530. https://doi.org/10.1111/bjet.13544
- Gerlich, M. (2025). AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1), Article 6. https://doi.org/10.3390/soc15010006
- Ng, D. T. K., Leung, J. K. L., Chu, S. K. W., & Qiao, M. S. (2021). Conceptualizing AI literacy: An exploratory review. Computers and Education: Artificial Intelligence, 2, 100041.
- Randazzo, E., Lifshitz, A. H., Kellogg, K. C., Dell’Acqua, F., Mollick, E. R., Candelon, F., & Lakhani, K. R. (2025). Cyborgs, centaurs and self-automators: The three modes of human-GenAI knowledge work and their implications for skilling and the future of expertise. Harvard Business School.
- Sidra, S., & Mason, C. (2026). Generative AI in human-AI collaboration: Validation of the Collaborative AI Literacy and Collaborative AI Metacognition Scales for effective use. International Journal of Human-Computer Interaction, 42(7), 5084–5108. https://doi.org/10.1080/10447318.2025.2543997
- Wang, B., Rau, P.-L. P., & Yuan, T. (2023). Measuring user competence in using artificial intelligence: Validity and reliability of artificial intelligence literacy scale. Behaviour & Information Technology, 42(9), 1324–1337. https://doi.org/10.1080/0144929X.2022.20727
