Stance Detection in Prediction Markets: Addressing Imbalanced Trader Commentary via Counterfactual Augmentation and Market Context

2026-05-27
Computation and Language

AI summary

The authors studied how to detect people's opinions (stances) in short comments made by traders on prediction markets like Polymarket. These comments are hard to analyze because they are very brief, use trader-specific slang, and mostly agree with the market outcome, so opposing comments are rare. They fine-tuned a language model called RoBERTa to classify comments into two or three stance classes and tested whether adding market context and synthetic data improved performance. They found that including market context helped the most, that synthetic examples helped only in the weaker configurations, and that using too many synthetic examples hurt results. An attention-based analysis offers an explanation for these effects.

Prediction markets, Stance detection, RoBERTa, Class imbalance, Synthetic data augmentation, Counterfactual generation, Market context, F1 score, Recall, Interpretability
Authors
Thomas Mbrice
Abstract
Prediction markets such as Polymarket aggregate crowd beliefs into real-time probability estimates, and the comments traders post beneath each market contain rich directional stance signals that prices alone cannot capture. This work introduces the first stance detection study applied to prediction market commentary, a domain characterized by extreme brevity, trader-specific vernacular, and severe class imbalance (only 8.7% of comments oppose the market outcome). RoBERTa-base is fine-tuned across a 4 x 3 ablation: four input configurations ({2-class, 3-class} x {with/without market context}) and three augmentation conditions (baseline, 50% synthetic, 100% synthetic). Synthetic minority-class samples are generated via LLM-driven Pro -> Anti counterfactual flips using the Anthropic API. Results show that (1) market context is the single most impactful factor, raising 3-class Anti recall from 0.10 to 0.45; (2) counterfactual augmentation is conditionally effective, improving Anti F1 in weak configurations (0.10 -> 0.24) while degrading strong ones (2-class-ctx macro F1: 0.68 -> 0.50 at full dose); and (3) 50% augmentation is the optimal dose, with 100% consistently hurting performance. Attention-based interpretability analysis provides mechanistic support for all three findings.
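The 4 x 3 ablation and the minority-class recall metric described in the abstract can be sketched as follows. This is a minimal illustration, not the author's code: the configuration names, the `ablation_grid` and `anti_recall` helpers, and the "Anti"/"Pro" label strings are all assumptions introduced here for clarity.

```python
from itertools import product

# Hypothetical names for the factors listed in the abstract.
LABEL_SCHEMES = ("2-class", "3-class")
CONTEXT = ("no-ctx", "ctx")
AUGMENTATION = ("baseline", "50% synthetic", "100% synthetic")


def ablation_grid():
    """Return every (label scheme, context, augmentation) combination."""
    return [
        {"labels": labels, "context": ctx, "augmentation": aug}
        for labels, ctx, aug in product(LABEL_SCHEMES, CONTEXT, AUGMENTATION)
    ]


def anti_recall(y_true, y_pred, anti_label="Anti"):
    """Recall on the minority class: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == anti_label and p == anti_label)
    fn = sum(1 for t, p in pairs if t == anti_label and p != anti_label)
    # Return 0.0 when the class never occurs, to avoid dividing by zero.
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Four input configurations times three augmentation conditions.
    for run in ablation_grid():
        print(run)
```

Reporting recall on the Anti class separately matters here because, with only 8.7% of comments opposing the market outcome, a classifier can score well on aggregate accuracy while almost never predicting Anti.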