Why Are We Lonely? Leveraging LLMs to Measure and Understand Loneliness in Caregivers and Non-caregivers
2026-04-09 • Computation and Language
AI summary
The authors used language models (GPT-4o, GPT-5-nano, and GPT-5) to collect and study Reddit posts about loneliness among caregivers and non-caregivers. They created two frameworks with expert input: one to evaluate loneliness and one to categorize its causes. The loneliness evaluation reached average accuracies of about 76% for caregivers and 80% for non-caregivers, and the cause categorization reached micro-aggregate F1 scores of 0.825 and 0.80, respectively. The results show that caregivers' loneliness is most often tied to their caregiving role, to a lack of recognition of their identity, and to feelings of abandonment. The study suggests that social media can be a useful source for understanding how loneliness differs between these groups using AI tools.
Large Language Models, Loneliness Evaluation Framework, Caregiver, Non-caregiver, Social Media Analysis, Reddit Corpus, GPT-4o, GPT-5, F1 Score, Demographic Extraction
Authors
Michelle Damin Kim, Ellie S. Paek, Yufen Lin, Emily Mroz, Jane Chung, Jinho D. Choi
Abstract
This paper presents an LLM-driven approach for constructing diverse social media datasets to measure and compare loneliness in the caregiver and non-caregiver populations. We introduce an expert-developed loneliness evaluation framework and an expert-informed typology of the causes of loneliness, both designed for analyzing social media text. Using a human-validated data processing pipeline, we apply GPT-4o, GPT-5-nano, and GPT-5 to build a high-quality Reddit corpus and analyze loneliness across both populations. The loneliness evaluation framework achieved average accuracies of 76.09% and 79.78% for caregivers and non-caregivers, respectively. The cause categorization framework achieved micro-aggregate F1 scores of 0.825 and 0.80 for caregivers and non-caregivers, respectively. Across populations, we observe substantial differences in the distribution of types of causes of loneliness. Caregivers' loneliness was predominantly linked to caregiving roles, identity recognition, and feelings of abandonment, indicating distinct loneliness experiences between the two groups. Demographic extraction further demonstrates the viability of Reddit for building a diverse caregiver loneliness dataset. Overall, this work establishes an LLM-based pipeline for creating high-quality social media datasets for studying loneliness and demonstrates its effectiveness in analyzing population-level differences in the manifestation of loneliness.
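For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-aggregate F1 score is typically computed for a categorization task, assuming each post can carry one or more cause labels. The abstract does not state whether the task is multi-label or how the authors implemented the metric, so the function name `micro_f1` and the cause labels used here are hypothetical and chosen only for illustration.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all posts and labels, then compute a single F1 from the totals."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of posts")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels both annotator and model assigned
        fp += len(p - g)  # labels only the model assigned
        fn += len(g - p)  # labels only the annotator assigned

    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no labels at all.
    return 2 * tp / denom if denom else 0.0


# Hypothetical cause labels and annotations, not taken from the paper.
gold = [{"caregiving_role", "abandonment"}, {"identity"}, {"abandonment"}]
pred = [{"caregiving_role"}, {"identity", "abandonment"}, {"abandonment"}]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

Because the counts are pooled before the score is computed, frequent cause categories weigh more heavily than rare ones; a macro-averaged score would instead average per-category F1 values equally.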