Train, Test, Re-evaluate: Schedule-Sensitive Evaluation of Generative Data for Hand Detection
2026-06-01 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary
The authors studied how adding fake images of hands with accessories like gloves can help improve hand detection systems, especially for safety at work. They used a technique that edits real photos to add these accessories and then trained a detector on both real and synthetic images. Their experiments showed that training in multiple stages with this mixed data helps the detector better recognize hands with gloves, which are usually missing in standard datasets. This means edited synthetic data can make hand detection safer and more reliable in real-world conditions.
synthetic data, hand detection, generative inpainting, YOLOv8, mean average precision (mAP), distribution shift, occupational safety, accessory augmentation, fine-tuning, object detection
Authors
Atmika Bhardwaj, Silvia Vock, Nico Steckhan
Abstract
Generated (synthetic) image data is increasingly used to augment or replace real training datasets when target imagery is scarce, expensive, or biased. For hand detection, particularly in occupational safety settings, public datasets mostly contain bare hands. This under-represents the variation in hand appearance introduced by gloves and other personal protective equipment, tattoos, and jewelry, creating a distribution shift that safety-critical applications encounter at deployment. We test whether generative inpainting, which edits only the hand region of a real photograph to introduce accessories, can close this gap. On a paired dataset of real images and their synthetic counterparts, we train YOLOv8n hand detectors under six training-and-scheduling regimes (Experiments A-F, three random seeds each), evaluate every detector on a real test set and on a real-gloves-only test split, and report the mean average precision (mAP) at two overlap settings (mAP@0.5 and mAP@0.5:0.95) along with paired statistical tests. A two-stage schedule, which trains on real ∪ synthetic data and then fine-tunes the resulting weights on real-only data at a lower learning rate, increases mAP@0.5 over the real-only baseline on the standard real test set and narrows the real-gloves out-of-distribution gap. A three-stage schedule preserves box tightness best, reaching the highest mAP@0.5:0.95 of any experiment in the study. The utility of synthetic data for safety-critical hand detection is determined by the training procedure, and simple multi-stage schedules extract substantial real-deployment benefit from inpainted accessory data.
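The difference between the two reported metrics can be illustrated with a small sketch. It assumes the common COCO-style convention in which mAP@0.5:0.95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05; the abstract does not spell out the step size, so that is an assumption here. The boxes and the helper names `iou` and `thresholds_passed` are hypothetical and are not taken from the paper. The sketch only shows why a loosely fitting box can still count as a match at IoU 0.5 while failing the stricter thresholds, which is what makes mAP@0.5:0.95 sensitive to box tightness.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, truth):
    """Return the IoU thresholds at which pred counts as a match for truth."""
    score = iou(pred, truth)
    return [t for t in THRESHOLDS if score >= t]


# Hypothetical boxes, for illustration only
truth = (0, 0, 100, 100)   # ground-truth hand box
tight = (2, 2, 100, 100)   # prediction that hugs the hand closely
loose = (0, 0, 100, 160)   # prediction that covers the hand but overshoots

print("tight:", iou(tight, truth), len(thresholds_passed(tight, truth)))
print("loose:", iou(loose, truth), len(thresholds_passed(loose, truth)))
```

Both predictions are matches at IoU 0.5, so they contribute equally to mAP@0.5, but only the tight box remains a match across the full threshold range.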