SAM3-Assisted Training of Lightweight YOLO Models for Precision Pig Farming

2026-05-25 | Computer Vision and Pattern Recognition

AI summary

The authors developed a method that uses a big, smart image model called SAM 3 to automatically label pictures of pigs, so they don’t need humans to do it. They then train a smaller, faster pig detection model called YOLOv8 using these automatic labels. The resulting detector works well, especially when pigs aren’t blocking each other, and runs about 200 times faster than SAM 3, which makes it practical for simple devices. This means their system can help farmers monitor animals in real time without expensive hardware or manual labeling.

Foundation Models, Knowledge Distillation, Segment Anything Model (SAM), YOLOv8, Object Detection, Precision Livestock Farming, Zero-Shot Learning, Pseudo-Labeling, Mean Average Precision (mAP), Edge Deployment
Authors
Marcos Vinicius Mendes Faria, Thiago Borges Pereira, Isabella C. F. S. Condotta, Thiago Meireles Paixão, Francisco de Assis Boldt
Abstract
Deep learning-based object detection has revolutionized Precision Livestock Farming (PLF), yet a critical barrier remains: high-performance Foundation Models (such as SAM 3) are too computationally intensive for edge deployment, while lightweight models (like YOLO) require prohibitive manual annotation efforts. This work proposes a fully automated knowledge distillation pipeline that leverages the Segment Anything Model 3 (SAM 3) to generate zero-shot pseudo-labels for training efficient YOLOv8 detectors. By treating SAM 3 as an offline auto-annotator, we eliminate the manual labeling bottleneck, producing models capable of real-time inference on resource-constrained hardware. We systematically evaluate this approach on the PigLife dataset, comparing SAM 3-supervised models against human-annotated baselines. Results demonstrate that a SAM 3-trained YOLOv8m achieves a mean Average Precision (mAP) of 79.4% without human intervention, while reducing inference latency by approximately 200$\times$ compared to the teacher model. Furthermore, stratified analysis reveals that in low-occlusion scenarios, the automated pipeline achieves detection rates comparable to human benchmarks ($AP_{50} > 99\%$). These findings indicate that foundation models can serve as effective, zero-annotation-cost supervisors, enabling scalable edge computing solutions for smart agriculture.
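
The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch of how segmentation masks from a teacher model might be turned into YOLO-style detection labels. The abstract does not specify SAM 3's output format or any filtering rule, so everything below is an assumption made for illustration: masks are taken to be binary 2D grids with one confidence score each, and the names `mask_to_yolo_box`, `masks_to_yolo_lines`, `min_score`, and `class_id` are hypothetical. The only fixed convention is the common YOLO label line, `class x_center y_center width height`, with coordinates normalized to the image size.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: each teacher mask is a binary grid (rows of 0/1 values)
# covering the full image. The real SAM 3 output format may differ.
Mask = Sequence[Sequence[int]]


def mask_to_yolo_box(mask: Mask) -> Optional[Tuple[float, float, float, float]]:
    """Return the tight box around a mask as normalized (cx, cy, w, h).

    Returns None when the mask contains no foreground pixels.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None
    # Treat pixel x as covering [x, x + 1), so the far edges are exclusive.
    x_min, x_max = cols[0], cols[-1] + 1
    y_min, y_max = rows[0], rows[-1] + 1
    cx = (x_min + x_max) / 2 / width
    cy = (y_min + y_max) / 2 / height
    w = (x_max - x_min) / width
    h = (y_max - y_min) / height
    return cx, cy, w, h


def masks_to_yolo_lines(
    masks: Sequence[Mask],
    scores: Sequence[float],
    min_score: float = 0.5,  # hypothetical threshold, not from the paper
    class_id: int = 0,       # single "pig" class, assumed
) -> List[str]:
    """Build YOLO label lines from teacher masks, skipping weak or empty ones."""
    lines = []
    for mask, score in zip(masks, scores):
        if score < min_score:
            continue
        box = mask_to_yolo_box(mask)
        if box is None:
            continue
        lines.append(f"{class_id} " + " ".join(f"{v:.6f}" for v in box))
    return lines
```

In a pipeline of this kind, the returned lines would typically be written to one text file per image, and those files would then serve as training labels for the student detector in place of human annotations. Filtering by teacher confidence is one plausible way to limit label noise; whether the authors apply such a filter is not stated in the abstract.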