IMPACTeen: Intentions, Manipulation, Persuasion, Annotations, and Consequences in Teen Communication Dataset

2026-06-15 • Computation and Language

Computation and Language • Artificial Intelligence

AI summary

The authors created IMPACTeen, a collection of 1,021 texts illustrating social influence in an adolescent context across interpersonal, media-based, and digital settings. Each text is annotated for social influence techniques from five perspectives: teenagers, parents, psychologists, communication experts, and teachers. The texts were produced by constrained LLM generation and then edited and validated by humans in two steps so that they read as realistic for a youth context. The annotations also cover influence presence, intentions, consequences, resistance, reactions, and annotation confidence, supporting research on social influence detection and annotator disagreement. The dataset was created in Polish and comes with a corresponding English version.

social influence, annotation, language models, LLM generation, multidimensional analysis, cross-lingual modeling, youth context, text dataset, media influence, interpersonal communication

Authors

Aleksander Szczęsny, Wiktoria Mieleszczenko-Kowszewicz, Maciej Markiewicz, Beata Bajcar, Tomasz Adamczyk, Jolanta Babiak, Grzegorz Chodak, Przemysław Kazienko

Abstract

IMPACTeen is a dataset of textual social influence scenarios spanning interpersonal, media-based, and digital settings in an adolescent context. It contains 1,021 texts, 5,100 individual annotation records, and gold labels for social influence techniques, with each text annotated from five distinct perspectives: teenagers, parents, psychologists, communication experts, and teachers. The resource was constructed through constrained LLM generation, followed by a two-step human editing and validation phase aimed at ensuring youth-context realism. A multi-dimensional annotation covered influence presence, techniques, intentions, consequences, resistance, reactions, and annotation confidence. The dataset supports research on social influence detection, annotator disagreement, cross-lingual modeling, and the training and evaluation of language models. The dataset was created in Polish and is accompanied by a corresponding English version.
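To make the five-perspective annotation scheme concrete, the following is a minimal Python sketch of how one text's annotation records might be represented and how agreement on influence presence could be measured. The field names, the technique labels, the confidence scale, and the `agreement_rate` helper are all hypothetical illustrations; the abstract does not specify the dataset's actual schema or how its gold labels are derived.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

# The five annotator perspectives named in the abstract.
PERSPECTIVES = ("teenager", "parent", "psychologist",
                "communication_expert", "teacher")


@dataclass(frozen=True)
class Annotation:
    """One annotation record (hypothetical schema, not the released format)."""
    text_id: str
    perspective: str
    influence_present: bool
    techniques: Tuple[str, ...]  # hypothetical technique labels
    confidence: int              # hypothetical 1-5 scale


def agreement_rate(annotations: List[Annotation]) -> float:
    """Share of records that match the most common influence_present value."""
    votes = Counter(a.influence_present for a in annotations)
    return votes.most_common(1)[0][1] / len(annotations)


# Invented example: five perspectives annotating the same text.
example = [
    Annotation("t001", "teenager", True, ("social_proof",), 4),
    Annotation("t001", "parent", True, ("social_proof", "scarcity"), 5),
    Annotation("t001", "psychologist", True, ("social_proof",), 5),
    Annotation("t001", "communication_expert", True, ("scarcity",), 3),
    Annotation("t001", "teacher", False, (), 2),
]

rate = agreement_rate(example)
print(f"agreement on influence presence: {rate:.2f}")
```

A per-text rate like this is only one simple way to look at annotator disagreement; chance-corrected measures would be a natural alternative for a real analysis.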
