Weakly Supervised Camouflaged Object Detection Based on the SAM Model and Mask Guidance

2026-05-25 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition • Artificial Intelligence

AI summary

The authors designed a new method called MGNet to find objects that blend into their backgrounds in photos, which is hard because the objects look very similar to their surroundings. Instead of relying on detailed pixel-level labels that are laborious to produce, they start from weak labels such as bounding boxes and refine them with BoxSAM, which prompts the Segment Anything Model with those boxes to produce pixel-level pseudo-labels for training. MGNet uses initial masks to guide segmentation and sharpen edge predictions, and it adds modules for context enhancement and feature aggregation to reduce missed objects. Experiments show the approach is competitive with other leading methods.

Camouflaged Object Detection, Weak Supervision, Pixel-level Annotations, Pseudo-labels, Segment Anything Model, Mask Decoder, Feature Aggregation, Bounding-box Prompts

Authors

Xia Li, Xinran Liu, Lin Qi, Junyu Dong

Abstract

Camouflaged object detection (COD) from a single image is a challenging task due to the high similarity between objects and their surroundings. Existing fully supervised methods require labor-intensive pixel-level annotations, making weakly supervised methods a viable compromise that balances accuracy and annotation efficiency. However, weakly supervised methods often suffer performance degradation because they rely on coarse annotations. In this paper, we introduce a new weakly supervised approach for camouflaged object detection to overcome these limitations. Specifically, we propose a novel network, MGNet, which tackles edge ambiguity and missed detections by using initial masks generated by our custom-designed Cascaded Mask Decoder (CMD) to guide the segmentation process and enhance edge predictions. We introduce a Context Enhancement Module (CEM) to reduce missed detections, and a Mask-guided Feature Aggregation Module (MFAM) for effective feature aggregation. To address the weak supervision challenge, we propose BoxSAM, which leverages the Segment Anything Model (SAM) with bounding-box prompts to generate pseudo-labels. By employing a redundant processing strategy, BoxSAM provides high-quality pixel-level pseudo-labels for training MGNet. Extensive experiments demonstrate that our method delivers competitive performance against current state-of-the-art methods.
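The abstract does not spell out how BoxSAM turns a box-prompted mask into a training label, so the following is only an illustrative sketch of one way a bounding-box annotation could constrain a pseudo-label. It assumes a binary mask has already been produced by a box-prompted segmenter such as SAM. The helper names (`clip_mask_to_box`, `tight_box`, `box_iou`, `accept_pseudo_label`) and the `min_iou` threshold are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]            # binary mask, mask[y][x] in {0, 1}
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), with x1 and y1 exclusive


def clip_mask_to_box(mask: Mask, box: Box) -> Mask:
    """Zero out every mask pixel that falls outside the annotated box."""
    x0, y0, x1, y1 = box
    return [
        [v if (x0 <= x < x1 and y0 <= y < y1) else 0 for x, v in enumerate(row)]
        for y, row in enumerate(mask)
    ]


def tight_box(mask: Mask) -> Optional[Box]:
    """Smallest box enclosing all foreground pixels, or None for an empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def accept_pseudo_label(mask: Mask, box: Box, min_iou: float = 0.5) -> Optional[Mask]:
    """Hypothetical filter: keep a mask only if its extent agrees with the box.

    The mask is first clipped to the box; it is rejected (None) when the
    remaining foreground is empty or its tight box overlaps the annotation
    by less than min_iou.
    """
    clipped = clip_mask_to_box(mask, box)
    extent = tight_box(clipped)
    if extent is None or box_iou(extent, box) < min_iou:
        return None
    return clipped
```

For instance, a mask that spills one pixel past its box is trimmed and kept, while a mask covering a single pixel of a large box is discarded. A real pipeline would likely apply further checks; this filter only shows how the box annotation can act as a consistency signal.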
