Clickbait detection: quick inference with maximum impact

2026-04-09 • Computation and Language

AI summary

The authors propose a lightweight method for detecting clickbait headlines that combines OpenAI semantic text embeddings with six compact heuristic features capturing stylistic and informational cues. They reduce the dimensionality of the embeddings with PCA to speed up processing, then evaluate three classifiers: XGBoost and two graph-based models, GraphSAGE and GCN. The simplified feature set costs a little in F1-score, but the graph-based models remain competitive and need substantially less inference time. High ROC-AUC scores indicate that the approach separates clickbait from non-clickbait headlines reliably across different decision thresholds.

clickbait detection, OpenAI embeddings, PCA (Principal Component Analysis), XGBoost, GraphSAGE, GCN (Graph Convolutional Network), heuristic features, F1-score, ROC-AUC, machine learning

Authors

Soveatin Kuntur, Panggih Kusuma Ningrum, Anna Wróblewska, Maria Ganzha, Marcin Paprzycki

Abstract

We propose a lightweight hybrid approach to clickbait detection that combines OpenAI semantic embeddings with six compact heuristic features capturing stylistic and informational cues. To improve efficiency, embeddings are reduced using PCA and evaluated with XGBoost, GraphSAGE, and GCN classifiers. While the simplified feature design yields slightly lower F1-scores, graph-based models achieve competitive performance with substantially reduced inference time. High ROC-AUC values further indicate strong discrimination capability, supporting reliable detection of clickbait headlines under varying decision thresholds.
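
The abstract reports both F1-scores and ROC-AUC, and the distinction matters: F1 is computed at one decision threshold, while ROC-AUC summarizes ranking quality over all thresholds. The following is a minimal sketch of that difference in plain Python. It is not the authors' evaluation code; the function names, the labels, and the scores are all made up for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive example
    is scored above a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels, scores, threshold):
    """F1-score when every score >= threshold is predicted as clickbait."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: 1 = clickbait, 0 = not clickbait; scores are
# invented classifier outputs, not results from the paper.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

auc = roc_auc(labels, scores)                      # one value, no threshold involved
f1_default = f1_at_threshold(labels, scores, 0.5)  # F1 at a 0.5 cutoff
f1_lower = f1_at_threshold(labels, scores, 0.35)   # F1 at a lower cutoff
```

On this toy data the F1-score changes when the cutoff moves from 0.5 to 0.35, while the ROC-AUC is a single number that depends only on how the scores rank the headlines. That is the sense in which a high ROC-AUC supports detection "under varying decision thresholds".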
