SMART: SMPLest-X Mesh Adaptation and RAFT Tracking for Soccer Pose Estimation

2026-05-29 · Computer Vision and Pattern Recognition

AI summary

The authors developed a method for estimating 3D poses of soccer players from broadcast video for the FIFA Skeletal Tracking Challenge 2026. They fine-tuned an existing model, SMPLest-X, using a stratified clip split, multi-task depth supervision, and broadcast video augmentation. They then combined it with RAFT optical flow for camera tracking, foot-plane anchoring, and temporal smoothing. The approach outperformed the FIFA baseline on both the validation and test sets. On validation, it lowered the challenge score from 1.053 to 0.647, a 38.6% improvement. On the test set, it reached 0.593. The work shows how combining these components improves 3D tracking of player movement from standard broadcast footage.
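The summary mentions temporal smoothing, and the abstract specifies a two-pass variant. Neither says which filter is used. For illustration only, the sketch below assumes one common two-pass scheme. A forward exponential moving average (EMA) runs first, and a backward EMA then runs over its output. This is the same forward-backward idea behind zero-phase filtering. The function names, the `(T, J, 3)` array layout, and the `alpha` parameter are hypothetical and are not taken from SMART.

```python
import numpy as np


def ema_pass(x: np.ndarray, alpha: float) -> np.ndarray:
    """One causal exponential-moving-average pass along axis 0 (time)."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    for t in range(1, len(x)):
        # Blend the current frame with the previous smoothed frame.
        y[t] = alpha * x[t] + (1.0 - alpha) * y[t - 1]
    return y


def two_pass_smooth(joints: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical two-pass smoother: forward EMA, then backward EMA.

    joints: (T, J, 3) array of per-frame 3D joint positions (assumed layout).
    Running the second pass in reverse cancels the phase lag of the first,
    apart from edge effects at the start and end of the clip.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    forward = ema_pass(joints, alpha)
    # Reverse time, smooth again, then restore the original frame order.
    return ema_pass(forward[::-1], alpha)[::-1]
```

A smaller `alpha` smooths more strongly, but it also flattens fast motions such as kicks. Setting `alpha=1.0` returns the input unchanged.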

3D pose estimation, SMPLest-X, ViT-H, optical flow, temporal smoothing, multi-task supervision, broadcast video augmentation, Global MPJPE, Local MPJPE
Authors
Parthsarthi Rawat
Abstract
We present SMART, our approach to the FIFA Skeletal Tracking Challenge 2026, which requires estimating 3D world-space poses of soccer players from broadcast video. We fine-tune SMPLest-X (ViT-H, 687M parameters) with a stratified clip split, multi-task depth supervision, and broadcast augmentation. We pair it with a RAFT dense optical flow camera tracker, foot-plane anchoring, and two-pass temporal smoothing. On the validation set, SMART scores 0.647 against the FIFA baseline's 1.053 (lower is better), a 38.6% improvement. On the held-out test set, it scores 0.593 (Global MPJPE: 0.324 m, Local MPJPE: 0.054 m).
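The abstract reports a challenge score and two errors, Global MPJPE and Local MPJPE, without defining them. The sketch below shows the standard mean per-joint position error, which is the mean Euclidean distance between predicted and ground-truth joints. It assumes, as is common, that the local variant is computed after subtracting a root joint from both skeletons. The root index, array layout, and function names are hypothetical. The challenge's exact definitions, and how they combine into the overall score, may differ. The last helper checks the 38.6% figure as the relative reduction from the baseline score.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error over all frames and joints (same units as input)."""
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def root_relative(joints: np.ndarray, root_idx: int = 0) -> np.ndarray:
    """Subtract a root joint (index is an assumption, e.g. the pelvis) from every joint.

    joints: (T, J, 3) array of per-frame 3D joint positions (assumed layout).
    """
    return joints - joints[..., root_idx : root_idx + 1, :]


def global_and_local_mpjpe(pred: np.ndarray, gt: np.ndarray, root_idx: int = 0):
    """Global: error in world coordinates. Local (assumed): root-relative error."""
    global_err = mpjpe(pred, gt)
    local_err = mpjpe(root_relative(pred, root_idx), root_relative(gt, root_idx))
    return global_err, local_err


def relative_improvement(baseline: float, ours: float) -> float:
    """Fractional reduction of a lower-is-better score relative to a baseline."""
    return (baseline - ours) / baseline
```

With the reported numbers, `relative_improvement(1.053, 0.647)` gives about 0.386, which matches the 38.6% in the abstract. A prediction that is correct up to a whole-body translation has zero local error but nonzero global error. This is why the two metrics can differ so much (0.324 m versus 0.054 m on the test set).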