MoPO: Incorporating Motion Prior for Occluded Human Mesh Recovery

2026-05-11 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence

AI summary

The authors address the problem of recovering 3D human body shapes from images when parts of the body are hidden (occluded), a situation that often leads to inaccurate and jittery poses. They propose MoPO, a method that predicts the positions of hidden joints from past body movements rather than relying only on image data. Their system detects which joints are hidden, fills in their positions based on motion history, and then combines this information with visible image features to better estimate body shape and pose. Experiments show that MoPO improves the accuracy and smoothness of human pose tracking, especially when parts of the body are not visible.
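The "detect hidden joints, then fill them in from motion history" idea can be illustrated with a toy sketch. It assumes NumPy, a per-joint visibility score in [0, 1] from some occlusion detector, and a hypothetical threshold of 0.5. It also replaces the paper's learned lightweight motion predictor with plain constant-velocity extrapolation from the last two history frames. The function name `complete_occluded_joints` and its parameters are hypothetical and are not taken from the MoPO code.

```python
import numpy as np


def complete_occluded_joints(history, current, visibility, threshold=0.5):
    """Fill occluded joints in `current` using constant-velocity extrapolation.

    history:    (T, J, 3) array of past 3D joint positions, with T >= 2.
    current:    (J, 3) array of joint positions estimated for the current frame.
    visibility: (J,) array of per-joint visibility scores in [0, 1]
                (hypothetical output of an occlusion detector).
    threshold:  hypothetical cut-off below which a joint counts as occluded.
    Returns the completed (J, 3) joints and the boolean occlusion mask.
    """
    history = np.asarray(history, dtype=float)
    current = np.asarray(current, dtype=float)
    visibility = np.asarray(visibility, dtype=float)
    if history.ndim != 3 or history.shape[0] < 2:
        raise ValueError("history must have shape (T, J, 3) with T >= 2")

    # Joints whose visibility score falls below the threshold are treated as occluded.
    occluded = visibility < threshold

    # Constant-velocity guess: last position plus the most recent displacement.
    # (A stand-in for a learned motion predictor, not the paper's model.)
    velocity = history[-1] - history[-2]
    predicted = history[-1] + velocity

    # Keep visible joints from the current estimate and replace occluded ones.
    completed = np.where(occluded[:, None], predicted, current)
    return completed, occluded
```

Visible joints pass through unchanged, so image evidence is trusted wherever it exists and the motion-based guess is used only where the detector flags an occlusion.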

Human mesh recovery, Occlusion, Motion prior, Pose estimation, Spatial-temporal occlusion detection, Motion prediction, Inverse kinematics, Temporal consistency, Joint visibility, Human shape estimation

Authors

Tao Tang, Hong Liu, Xinshun Wang, Wanruo Zhang

Abstract

Although recent studies have made remarkable progress in human mesh recovery, they still exhibit limited robustness to occlusions and often produce inaccurate poses and severe motion jitter due to insufficient spatial features for occluded body parts. Inspired by rapid advancements in human motion prediction, we observe that, compared with occluded image features, pose sequences inherently contain a reliable motion prior for estimating occluded body parts. In this paper, we incorporate a Motion Prior for Occluded human mesh recovery, called MoPO. MoPO consists of two main components: 1) a motion de-occlusion module, in which we propose a spatial-temporal occlusion detector to detect joint visibility and a lightweight motion predictor that completes the occluded body parts by predicting the most plausible joint positions from historical poses; and 2) a motion-aware fusion and refinement module, which fuses the completed joint sequence with image features to estimate human shape and an initial human pose. Moreover, the completed joint sequence is further used to refine the final human pose through inverse kinematics, providing an occlusion-free motion prior for regressing human poses. Extensive experiments demonstrate that MoPO achieves state-of-the-art performance on both occlusion-specific and standard benchmarks, significantly enhancing the accuracy and temporal consistency of occluded human mesh recovery. Our code and demo can be found in the supplementary material.
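The abstract states that the completed joint sequence refines the final pose through inverse kinematics but does not specify the solver. One common building block in analytic IK for skeletal body models is the "swing" rotation that turns a bone's rest-pose direction onto the direction implied by the target joints. The sketch below computes that rotation with Rodrigues' formula. It assumes NumPy, ignores twist about the bone axis, and is not the authors' IK; the name `swing_rotation` is hypothetical.

```python
import numpy as np


def swing_rotation(rest_dir, target_dir, eps=1e-8):
    """Return the 3x3 rotation that maps the direction rest_dir onto target_dir.

    This is the minimal ("swing") rotation about the axis perpendicular to both
    vectors; any twist about the bone axis is left undetermined.
    """
    a = np.asarray(rest_dir, dtype=float)
    b = np.asarray(target_dir, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)

    axis = np.cross(a, b)
    sin_theta = np.linalg.norm(axis)
    cos_theta = np.clip(np.dot(a, b), -1.0, 1.0)

    if sin_theta < eps:
        if cos_theta > 0:
            return np.eye(3)  # Directions already aligned.
        # Antiparallel case: rotate 180 degrees about any axis perpendicular to a.
        helper = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        k = np.cross(a, helper)
        k /= np.linalg.norm(k)
        return 2.0 * np.outer(k, k) - np.eye(3)

    k = axis / sin_theta
    # Skew-symmetric cross-product matrix of the unit rotation axis k.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + sin_theta * K + (1.0 - cos_theta) * (K @ K)
```

For instance, if a bone points along +y in the rest pose and the completed joints place its child at offset (0.3, 0.4, 0) from its parent, `swing_rotation([0, 1, 0], [0.3, 0.4, 0])` maps +y onto the unit vector (0.6, 0.8, 0). A full IK pass would apply this per bone along the kinematic chain and resolve twist separately.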