SimpliHuMoN: Simplifying Human Motion Prediction

2026-03-04 · Computer Vision and Pattern Recognition · Machine Learning
AI summary

The authors developed a simple transformer-based model that can predict both human body poses and movement paths at the same time. Their model uses attention mechanisms to understand how different parts of the body relate to each other and how movements change over time. Unlike earlier models that focus on either pose or trajectory separately, this one works well for both without needing special changes. They tested their model on several popular datasets and found it outperforms previous methods across all tasks.

Keywords
human motion prediction, transformer, self-attention, pose prediction, trajectory forecasting, Human3.6M, AMASS, ETH-UCY, 3DPW, end-to-end model
Authors
Aadya Agrawal, Alexander Schwing
Abstract
Human motion prediction combines the tasks of trajectory forecasting and human pose prediction. For each of the two tasks, specialized models have been developed. Combining these models for holistic human motion prediction is non-trivial, and recent methods have struggled to compete on established benchmarks for individual tasks. To address this, we propose a simple yet effective transformer-based model for human motion prediction. The model employs a stack of self-attention modules to effectively capture both spatial dependencies within a pose and temporal relationships across a motion sequence. This simple, streamlined, end-to-end model is sufficiently versatile to handle pose-only, trajectory-only, and combined prediction tasks without task-specific modifications. We demonstrate that this approach achieves state-of-the-art results across all tasks through extensive experiments on a wide range of benchmark datasets, including Human3.6M, AMASS, ETH-UCY, and 3DPW.
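The abstract describes a stack of self-attention modules that captures both spatial dependencies within a pose and temporal relationships across a motion sequence. A minimal sketch of one common way to realize this, alternating attention over the joint axis and the frame axis, is shown below. All names, dimensions, and the alternating layout are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical sketch: alternating spatial (over joints) and temporal
# (over frames) self-attention, as one plausible reading of the abstract.
# Module names and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    """One block: spatial self-attention over joints, then temporal over frames."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, dim)
        b, t, j, d = x.shape
        # Spatial attention: joints attend to each other within every frame.
        s = self.norm1(x.reshape(b * t, j, d))
        s = x.reshape(b * t, j, d) + self.spatial_attn(s, s, s)[0]
        x = s.reshape(b, t, j, d)
        # Temporal attention: each joint attends across all frames.
        q_in = x.permute(0, 2, 1, 3).reshape(b * j, t, d)
        q = self.norm2(q_in)
        q = q_in + self.temporal_attn(q, q, q)[0]
        return q.reshape(b, j, t, d).permute(0, 2, 1, 3)


# Stack of blocks; input features could be embedded joint positions.
model = nn.Sequential(*[SpatioTemporalBlock(dim=32) for _ in range(2)])
poses = torch.randn(8, 10, 22, 32)  # batch, past frames, joints, features
out = model(poses)
print(out.shape)  # same shape as the input: (8, 10, 22, 32)
```

Because each block preserves the input shape, the same stack can in principle process pose-only, trajectory-only, or combined inputs by changing what the per-joint feature vectors encode, which is consistent with the versatility claimed in the abstract.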