Needle in a Haystack -- One-Class Representation Learning for Detecting Rare Malignant Cells in Computational Cytology

2026-04-09 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Machine Learning

AI summary

The authors studied how to find very rare cancerous cells in cytology images, which is hard because these cells vary widely in appearance and are vastly outnumbered by normal cells. They tested methods that learn what normal cells look like and flag anything unusual, without needing labels for individual cells. Their tests showed that these one-class learning methods, especially DSVDD and DROC, can detect cancer cells better than weakly supervised alternatives such as multiple instance learning when cancer cells are extremely rare. This approach works well even when fully labeling cancerous cells isn't possible. The authors suggest that learning normal patterns alone is a reliable way to spot rare malignant cells in complex images.

Computational cytology, Malignant cell detection, Class imbalance, One-class representation learning, Multiple instance learning (MIL), DSVDD, DROC, Weak supervision, Bone marrow cytomorphology dataset, Oral cancer cytology

Authors

Swarnadip Chatterjee, Vladimir Basic, Arrigo Capitanio, Orcun Goksel, Joakim Lindblad

Abstract

In computational cytology, detecting malignancy on whole-slide images is difficult because malignant cells are morphologically diverse yet vanishingly rare amid a vast background of normal cells. Accurate detection of these extremely rare malignant cells remains challenging due to large class imbalance and limited annotations. Conventional weakly supervised approaches, such as multiple instance learning (MIL), often fail to generalize at the instance level, especially when the fraction of malignant cells (the witness rate) is exceedingly low. In this study, we explore the use of one-class representation learning techniques for detecting malignant cells in low-witness-rate scenarios. These methods are trained exclusively on slide-negative patches, without requiring any instance-level supervision. Specifically, we evaluate two one-class classification (OCC) approaches, DSVDD and DROC, and compare them with FS-SIL, WS-SIL, and the recent ItS2CLR method. The one-class methods learn compact representations of normality and detect deviations at test time. Experiments on a publicly available bone marrow cytomorphology dataset (TCIA) and an in-house oral cancer cytology dataset show that DSVDD achieves state-of-the-art performance in instance-level abnormality ranking, particularly in ultra-low witness-rate regimes ($\leq 1\%$), and in some cases even outperforms fully supervised learning, which is typically not a practical option in whole-slide cytology due to the infeasibility of exhaustive instance-level annotations. DROC is also competitive under extreme rarity, benefiting from distribution-augmented contrastive learning. These findings highlight one-class representation learning as a robust and interpretable alternative that outperforms MIL for malignant cell detection under extreme rarity.
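The idea of learning a compact representation of normality and flagging deviations at test time can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes embeddings are already available as fixed vectors (Deep SVDD would instead train a network to pull normal embeddings toward the center), and the toy 2-D data and the function names `fit_center`, `anomaly_score`, and `rank_by_abnormality` are hypothetical.

```python
import random

def fit_center(normal_embeddings):
    """Return the mean embedding of the normal (slide-negative) patches."""
    n = len(normal_embeddings)
    dim = len(normal_embeddings[0])
    return [sum(e[d] for e in normal_embeddings) / n for d in range(dim)]

def anomaly_score(embedding, center):
    """Squared Euclidean distance to the center; larger means more abnormal."""
    return sum((x - c) ** 2 for x, c in zip(embedding, center))

def rank_by_abnormality(embeddings, center):
    """Indices of embeddings sorted from most to least abnormal."""
    scores = [anomaly_score(e, center) for e in embeddings]
    return sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)

# Toy data (assumption): 2-D points near the origin stand in for normal-cell embeddings.
rng = random.Random(0)
normal_train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
center = fit_center(normal_train)

# Simulated test slide with a 1% witness rate: 99 normal-like points, 1 distant point.
test_slide = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(99)] + [[8.0, 8.0]]
ranking = rank_by_abnormality(test_slide, center)
print("most abnormal index:", ranking[0])
```

Only normal samples are used to fit the center, so no instance-level labels are needed; the single distant point is ranked by its distance alone, which mirrors the instance-level abnormality ranking described in the abstract.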
