A Multi Center Breast FNAC Whole-Slide Cytology Dataset for AI-Assisted Patch-Wise Classification Using C1 to C5 Reporting Categories

2026-06-29 | Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors created a large dataset of breast fine needle aspiration cytology (FNAC) images from 321 patients across multiple medical centers in India. They collected and digitized 470 whole-slide images stained with two methods, then divided these into over 7,300 smaller image patches labeled with diagnostic categories called C1 to C5. The dataset includes original images, annotations, metadata, and tools to help researchers use it. It is freely available online and is about 950 GB in size.

breast FNAC, whole-slide images, Papanicolaou stain, May-Grünwald Giemsa stain, image patches, C1 to C5 labels, NanoZoomer scanner, cytology dataset, digital pathology, Zenodo repository
Authors
Garima Jain, Abhijeet Patil, Surabhi Jain, Sanghamitra Pati, Amit Sethi, Sandeep Mathur, Pulkit Verma, Nishi Halduniya, Jatin Kashyap, Sharat Kumar, Simmi Kharb, Sunita Singh, Sucheta Devi Khuraijam, Sushma Khuraijam, Ratan Konjengbam, Arvind Kumar, Deepali Tirkey, Saurav Banerjee, Shivani Kalhan, Rakesh Kumar Gupta, Ranjana Solanki, Deepika Hemranjani, Shashank Nath Singh, Uma Handa, Manveen Kaur, B. G. Malathi, Yogender P., Niraj Kumari, Shruti Gupta, Indu R. Nair, Vidya C., Basumitra Das, Sunil Kumar Komanapalli, Ravindra Karle, Tanaya Kulkarni, Vandana Raphael, Biswajit Dey, Vaishali Gaikwad, Nilam Adhav
Abstract
We present a multi-center breast fine needle aspiration cytology (FNAC) dataset designed for patch-wise classification using C1 to C5 reporting labels. The prospective dataset includes 321 patients and 470 whole-slide images (WSIs) collected from participating tertiary medical centers in India between May 2023 and March 2026. Slides were stained using Papanicolaou (190 WSIs) or May-Grünwald Giemsa (280 WSIs), scanned on a Hamamatsu NanoZoomer S360 at 40x magnification and 0.25 µm per pixel, and stored directly in NDPI format. Of the 470 WSIs, 446 contain annotated patch regions, yielding 7,398 PNG image patches with expert-verified C1 to C5 labels. The release includes NDPI WSIs, WSI-level GeoJSON annotation files, extracted patch images, de-identified metadata, a data dictionary, a validation summary, a manifest linking WSIs to Zenodo records, and code for dataset inspection and reuse. The complete dataset is approximately 950 GB and is available through Zenodo.
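As an illustration of how the WSI-level GeoJSON annotation files might be inspected, the following minimal Python sketch tallies C1 to C5 labels in a single GeoJSON FeatureCollection. The abstract does not specify the annotation schema, so the property key `label`, the function name `count_patch_labels`, and the inline sample are hypothetical; the released data dictionary should be consulted for the actual field names.

```python
import json
from collections import Counter

# The five reporting categories named in the abstract.
VALID_LABELS = {"C1", "C2", "C3", "C4", "C5"}


def count_patch_labels(geojson_text, label_key="label"):
    """Tally C1 to C5 labels across the features of one GeoJSON FeatureCollection.

    `label_key` is an assumed property name, not one confirmed by the dataset.
    Features without a recognized label are counted under "unlabeled".
    """
    collection = json.loads(geojson_text)
    counts = Counter()
    for feature in collection.get("features", []):
        properties = feature.get("properties") or {}
        label = properties.get(label_key)
        if label in VALID_LABELS:
            counts[label] += 1
        else:
            counts["unlabeled"] += 1
    return counts


def _square(x, y, size):
    """Build a closed square polygon ring in pixel coordinates (hypothetical)."""
    return [[[x, y], [x + size, y], [x + size, y + size], [x, y + size], [x, y]]]


# Hypothetical annotation file content with three annotated patch regions.
SAMPLE = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Polygon", "coordinates": _square(0, 0, 512)},
         "properties": {"label": "C2"}},
        {"type": "Feature",
         "geometry": {"type": "Polygon", "coordinates": _square(1024, 0, 512)},
         "properties": {"label": "C5"}},
        {"type": "Feature",
         "geometry": {"type": "Polygon", "coordinates": _square(0, 1024, 512)},
         "properties": {"label": "C2"}},
    ],
})

if __name__ == "__main__":
    for label, n in sorted(count_patch_labels(SAMPLE).items()):
        print(label, n)
```

Summing such per-file tallies over all annotated WSIs would be one way to cross-check the reported total of 7,398 patches against the release's validation summary.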