Fursee: Hybrid YOLO-DINOv3 Framework for Fursuit Identity Retrieval and Clustering
2026-06-22 • Computer Vision and Pattern Recognition
AI summary
The authors created a special dataset of pictures from furry conventions to help identify and group fursuit characters automatically, since doing it by hand is hard. They designed a three-step method called Fursee: first, it finds and cuts out images of fursuit heads; second, it improves how these images are compared to tell different characters apart; third, it groups similar images without needing manual tuning. Their method works better than popular general AI models for this specific task. This helps with organizing large collections of fursuit photos efficiently.
fursuit, YOLO, ArcFace, DINOv3 embeddings, DBSCAN, clustering, identity retrieval, silhouette coefficient, multimodal models, benchmark dataset
Authors
Jundi Wu
Abstract
Furry conventions worldwide produce massive numbers of fursuit photographs, and sorting them by hand is labor-intensive, which motivates automatic identity retrieval and clustering. General-purpose multimodal models are not optimized for complex fursuit scenes, and no public benchmark dataset exists for this task. To fill this gap, we build a specialized fursuit image dataset and present Fursee, a three-stage hybrid pipeline for fursuit identity retrieval and clustering. First, YOLO detects and crops high-resolution fursuit head patches to improve localization of small and overlapping targets. Second, ArcFace optimizes DINOv3 embeddings to enlarge the angular separation between different identities on the feature hypersphere. Third, DBSCAN performs unsupervised clustering, with a silhouette-coefficient-driven search selecting its hyperparameters automatically instead of relying on a manually fixed radius. Retrieval and clustering experiments show that our pipeline outperforms mainstream multimodal models, including GPT5.5, Claude Opus 4.8 and Qwen3.7-Plus, on all evaluation metrics for fursuit head retrieval and grouping.
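
The third stage, a silhouette-driven search over DBSCAN hyperparameters, can be illustrated with a minimal sketch. This is not the authors' implementation: the helper name `select_dbscan_by_silhouette`, the `eps` grid, `min_samples=3`, the cosine metric, and the synthetic unit-norm vectors standing in for head embeddings are all assumptions made for illustration. It uses scikit-learn's `DBSCAN` and `silhouette_score`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score


def select_dbscan_by_silhouette(embeddings, eps_grid, min_samples=3):
    """Return (best_eps, best_labels, best_score) over a grid of eps values.

    Hypothetical helper: the grid, min_samples, and the cosine metric are
    illustrative assumptions, not settings reported in the abstract.
    """
    best_eps, best_labels, best_score = None, None, -1.0
    for eps in eps_grid:
        labels = DBSCAN(eps=eps, min_samples=min_samples,
                        metric="cosine").fit_predict(embeddings)
        mask = labels != -1  # DBSCAN marks noise points with label -1
        n_clusters = len(set(labels[mask]))
        # The silhouette coefficient is only defined for at least 2 clusters
        # and fewer clusters than scored points; skip degenerate settings.
        if n_clusters < 2 or n_clusters >= mask.sum():
            continue
        score = silhouette_score(embeddings[mask], labels[mask], metric="cosine")
        if score > best_score:
            best_eps, best_labels, best_score = eps, labels, score
    return best_eps, best_labels, best_score


# Synthetic stand-in for identity embeddings: three tight groups of
# unit-norm vectors around orthogonal directions in a 16-D space.
rng = np.random.default_rng(0)
centers = np.eye(3, 16)
points = np.vstack([c + 0.01 * rng.normal(size=(20, 16)) for c in centers])
embeddings = points / np.linalg.norm(points, axis=1, keepdims=True)

# The largest eps merges everything into one cluster and is skipped.
eps_grid = [0.01, 0.05, 0.1, 0.3, 0.6, 1.2]
best_eps, labels, score = select_dbscan_by_silhouette(embeddings, eps_grid)
print(best_eps, len(set(labels[labels != -1])), round(float(score), 3))
```

Scoring only the non-noise points is one possible design choice; it can favor settings that discard many images as noise, so a practical search might additionally bound the noise fraction.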