JODA: Composable Joint Dynamics for Articulated Objects

2026-05-11 · Robotics

Robotics, Computer Vision and Pattern Recognition
AI summary

The authors present JODA, a method for capturing the fine-grained joint behaviors of articulated objects used in simulation and embodied AI, such as frictional holding, detents, soft closing, and snap latching. Instead of ignoring these effects or relying on simple models, the approach represents joint dynamics as a three-channel function of the joint degree of freedom (conservative force, dry friction, and damping), built from shape-constrained piecewise cubic interpolation so that it stays compact and interpretable. A vision-language model proposes joint behaviors from visual observations and joint context, and the result can then be edited directly or refined with gradient-based optimization. This makes it easier to create plausible and controllable joint behavior in simulation.

articulated objects, joint dynamics, friction, damping, piecewise cubic interpolation, differentiable simulation, vision-language models, multimodal inputs, gradient-based optimization, simulation
Authors
Tianhong Gao, Cheng Yu, Yinghao Xu, Mengyu Chu
Abstract
Articulated objects used in simulation and embodied AI are typically specified by geometry and kinematic structure, but lack the fine-grained dynamical effects that govern realistic mechanical behavior, such as frictional holding, detents, soft closing, and snap latching. Existing approaches either ignore the detailed structure of dynamics entirely, or use simple models with limited expressiveness. We introduce JODA, a framework for generating joint-level dynamics as a structured three-channel field over the joint degree of freedom, capturing conservative forces, dry friction, and damping. Instantiated using shape-constrained piecewise cubic interpolation (PCHIP), this formulation defines a compact and expressive function space that is both interpretable and compatible with differentiable simulation. Building on this representation, we develop methods for inferring and refining joint dynamics from multimodal inputs. Given visual observations and joint context, a vision-language model proposes structured dynamical primitives, which are composed into a unified dynamics field. The resulting representation supports both direct manipulation and gradient-based refinement. We demonstrate that JODA enables plausible and controllable modeling of diverse joint behaviors, providing a unified interface for inference, editing, and optimization. Code and example assets with their generated profiles will be released upon publication.
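To make the three-channel idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each channel (conservative force, dry friction magnitude, damping coefficient) is stored as values at shared knots along the joint coordinate q and interpolated with a standard PCHIP scheme (harmonic-mean knot slopes, which avoids overshoot between knots). The class name `JointProfile`, the torque law `f_c(q) - mu(q) * sign(qd) - d(q) * qd`, the `tanh` smoothing of the sign function, and all numeric values are illustrative assumptions that do not come from the paper.

```python
from bisect import bisect_right
from math import tanh


def _end_slope(h0, h1, d0, d1):
    """One-sided three-point end slope, limited so the curve keeps its shape."""
    s = ((2 * h0 + h1) * d0 - h0 * d1) / (h0 + h1)
    if s * d0 <= 0:
        return 0.0
    if d0 * d1 <= 0 and abs(s) > 3 * abs(d0):
        return 3 * d0
    return s


def pchip_slopes(xs, ys):
    """Knot derivatives using the PCHIP weighted harmonic-mean rule."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    delta = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]
    if n == 2:
        return [delta[0], delta[0]]  # two knots: plain linear segment
    m = [0.0] * n
    for i in range(1, n - 1):
        # Slope stays zero at local extrema and flat spots (no overshoot).
        if delta[i - 1] * delta[i] > 0:
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / delta[i - 1] + w2 / delta[i])
    m[0] = _end_slope(h[0], h[1], delta[0], delta[1])
    m[-1] = _end_slope(h[-1], h[-2], delta[-1], delta[-2])
    return m


def pchip_eval(xs, ys, m, x):
    """Evaluate the cubic Hermite interpolant at x, clamped to the knot range."""
    x = min(max(x, xs[0]), xs[-1])
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = (1 + 2 * t) * (1 - t) ** 2
    h10 = t * (1 - t) ** 2
    h01 = t * t * (3 - 2 * t)
    h11 = t * t * (t - 1)
    return h00 * ys[i] + h10 * h * m[i] + h01 * ys[i + 1] + h11 * h * m[i + 1]


class JointProfile:
    """Hypothetical three-channel joint profile over a shared set of knots."""

    def __init__(self, knots, conservative, friction, damping):
        self.knots = knots
        self.channels = [
            (ys, pchip_slopes(knots, ys))
            for ys in (conservative, friction, damping)
        ]

    def torque(self, q, qd, eps=1e-3):
        """Assumed torque law; tanh(qd / eps) is a smooth stand-in for sign(qd)."""
        f_c, mu, d = (pchip_eval(self.knots, ys, m, q) for ys, m in self.channels)
        return f_c - mu * tanh(qd / eps) - d * qd


# Illustrative hinge: a closing pull and heavy damping near q = 0 (a soft close),
# with constant dry friction over the whole range. All values are made up.
hinge = JointProfile(
    knots=[0.0, 0.5, 1.0, 1.5],
    conservative=[-1.5, 0.0, 0.0, 0.0],
    friction=[0.2, 0.2, 0.2, 0.2],
    damping=[4.0, 0.5, 0.5, 0.5],
)
print(hinge.torque(q=0.1, qd=-1.0))
```

Because each channel is just a short list of knot values, editing a behavior amounts to changing a few numbers, and a smooth torque law like this one could in principle be differentiated with respect to those values inside a differentiable simulator.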