DECSELFMASK: Leveraging Unlabeled Text via Self-Relevance-Guided Masking for Decoder-Only Classification

2026-06-08 · Computation and Language

AI summary

The authors address the scarcity of labeled examples for medical text classification with DecSelfMask, a method that uses unlabeled data to improve the performance of decoder-only models. Their approach uses relevance attribution to identify the parts of a text that matter for a task, masks them, and trains the model to reconstruct the missing pieces via next-token prediction, helping it learn the structure and semantics of the unlabeled data. Tested on 136 tasks drawn from 1.9M clinical notes from an Italian hospital, DecSelfMask outperformed standard supervised fine-tuning (+19.9 Macro F1 points), synthetic label generation (+12.5), and continual pretraining (+6.3).

classification tasks, self-supervised learning, masking strategy, relevance attribution, next-token prediction, fine-tuning, clinical notes, decoder-only models, macro F1 score
Authors
Pietro Ferrazzi, Matteo Merler, Giovanni Bonetta, Alberto Lavelli, Bernardo Magnini
Abstract
Classification tasks require annotated data, which can be expensive, time-consuming, or even infeasible to collect. This is the case in the medical domain, where large datasets often contain few annotated examples. To address this, we propose DecSelfMask (Decoder Self-learning by Masking), an approach to enhance the performance of decoder-only models on classification tasks. We build on common self-learning approaches, which leverage a model to create training examples from unlabeled data, and propose a novel relevance-guided masking strategy. We use relevance attribution methods to determine which portions of unannotated texts are relevant for a task. We then create self-supervised training examples by masking out those portions and training the model to reconstruct them via next-token prediction. We hypothesize that these examples convey knowledge about the structure and semantics of the unannotated data that is useful for downstream performance. We test our approach on 136 tasks from a collection of 1.9M clinical notes from an Italian hospital. We quantify DecSelfMask's impact on downstream tasks across 5 models of different scales and families, and include a probing analysis. Experiments show consistent gains, outperforming standard supervised fine-tuning (+19.9 points in Macro F1), synthetic label generation (+12.5), and continual pretraining (+6.3), as well as common baselines.
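
The abstract does not specify the attribution method, the masking ratio, or the format of the training examples, so the following is only a minimal sketch of the general idea of relevance-guided masking, not the authors' implementation. It assumes that per-token relevance scores have already been computed by some attribution method, and it introduces a hypothetical `<mask>` placeholder, a hypothetical `mask_ratio` parameter, and a hypothetical function name `build_masked_example`. The highest-scoring tokens are replaced by the placeholder (adjacent ones merged into a single span), and the removed text becomes the target the model would be trained to generate.

```python
from typing import List, Tuple

MASK = "<mask>"  # hypothetical placeholder token, not taken from the paper


def build_masked_example(
    tokens: List[str],
    scores: List[float],
    mask_ratio: float = 0.3,  # hypothetical hyperparameter
) -> Tuple[str, str]:
    """Mask the most relevant tokens and return (masked_text, target_text).

    `scores` holds one relevance value per token, assumed to come from
    some attribution method that is outside the scope of this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not tokens:
        return "", ""

    # Number of tokens to mask: at least one.
    k = max(1, round(len(tokens) * mask_ratio))

    # Indices of the k highest-scoring tokens (ties broken by position).
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    selected = set(ranked[:k])

    masked: List[str] = []
    targets: List[str] = []
    i = 0
    while i < len(tokens):
        if i in selected:
            # Merge a run of adjacent selected tokens into one masked span.
            span = []
            while i < len(tokens) and i in selected:
                span.append(tokens[i])
                i += 1
            masked.append(MASK)
            targets.append(" ".join(span))
        else:
            masked.append(tokens[i])
            i += 1

    # Multiple spans are joined with " | " (an arbitrary choice for this sketch).
    return " ".join(masked), " | ".join(targets)


if __name__ == "__main__":
    toks = ["patient", "reports", "severe", "chest", "pain", "since", "monday"]
    rel = [0.1, 0.05, 0.6, 0.9, 0.8, 0.02, 0.1]
    prompt, target = build_masked_example(toks, rel, mask_ratio=0.4)
    print(prompt)  # patient reports <mask> since monday
    print(target)  # severe chest pain
```

In a decoder-only setup, one plausible way to use such a pair is to concatenate the masked text and the target into one sequence and compute the next-token-prediction loss only on the target portion; whether the paper does exactly this is an assumption here.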