Identifying High-Confidence Social Biases in LLMs for Trustworthy Conversational Tutoring Agents
2026-06-01 • Computation and Language
Computation and Language, Artificial Intelligence
AI summary
The authors studied how large language models (LLMs), used as conversational tutors, can remain highly confident while failing to recognize biased or unfair statements. They created a new way to test these models by regenerating student-tutor conversations, inserting turns with controlled bias drawn from a benchmark dataset, and checking whether models could detect that bias. The authors found that LLMs struggle more to spot bias in tutoring conversations than in benchmark-style tests and are often overconfident in their incorrect assessments, which could negatively affect the feedback students receive. They discuss the risks and mitigation considerations, and outline directions for future work.
Large Language Models, Conversational Tutors, Social Bias, Bias Detection, Confidence Calibration, Natural Language Processing, AI Ethics, Educational Technology, Human-AI Interaction
Authors
Aitor Arronte Alvarez, Naiyi Xie Fincham
Abstract
Conversational tutoring agents have been shown to improve learning engagement and student outcomes, and large language models (LLMs) are increasingly used in these systems to provide scalable, personalized feedback. However, LLMs may perpetuate or amplify stereotypical social biases, posing particular risks in educational settings. In this study, we evaluate LLMs in conversational tutoring scenarios to identify high-confidence social biases: instances where models fail to identify biased judgments in tutoring conversations while maintaining strong confidence in their assessments, potentially affecting their reasoning and the feedback they provide to learners. We present a new dataset generation method that enables bias evaluation under naturalistic instructional conditions by regenerating student-AI tutor interactions and introducing turns with controlled bias derived from a benchmark dataset. Using this data, we assess multiple LLMs' ability to detect stereotypical biases and analyze the confidence and reasoning underlying their responses through computational and human evaluations. We find that bias detection is substantially more challenging in conversational tutoring contexts than in benchmark-based evaluations, and that state-of-the-art LLMs are overconfident in their incorrect assessments of stereotypically biased statements. Moreover, model confidence strongly influences reasoning and feedback, highlighting the risks of overconfident, biased behavior in LLM-based tutoring agents. We conclude by discussing implications, mitigation considerations, and directions for future research.
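As an illustration of the "high-confidence social bias" notion described in the abstract, the following minimal sketch flags biased turns that a model failed to detect while reporting high confidence. It is not the authors' evaluation code: the `Judgment` record, the `high_confidence_misses` helper, the confidence scale in [0, 1], and the 0.8 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Judgment:
    """Hypothetical record of one model verdict on one tutoring turn."""
    turn_id: str
    is_biased: bool         # gold label: was a biased turn injected here?
    predicted_biased: bool  # the model's verdict on the turn
    confidence: float       # model-reported confidence, assumed to lie in [0, 1]


def high_confidence_misses(judgments: List[Judgment], threshold: float = 0.8) -> List[Judgment]:
    """Return biased turns the model did not flag despite confidence >= threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [
        j for j in judgments
        if j.is_biased and not j.predicted_biased and j.confidence >= threshold
    ]


# Toy data, invented purely for the example.
examples = [
    Judgment("t1", is_biased=True, predicted_biased=False, confidence=0.95),
    Judgment("t2", is_biased=True, predicted_biased=True, confidence=0.90),
    Judgment("t3", is_biased=True, predicted_biased=False, confidence=0.40),
    Judgment("t4", is_biased=False, predicted_biased=False, confidence=0.99),
]

missed = high_confidence_misses(examples)
print([j.turn_id for j in missed])  # → ['t1']
```

Only `t1` is flagged: `t2` was detected correctly, `t3` was missed but with low confidence, and `t4` contains no bias to detect.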