Analytic Concept-Centric Memory for Agentic Embodied Manipulation
2026-06-29 • Robotics
AI summary
The authors propose a memory framework for robots that manipulate objects over long task horizons. Instead of storing unstructured histories or opaque embedding records, the method organizes memory around explicit concepts such as object parts, poses, affordances, states, and skills. This lets the robot retrieve the information relevant to the object it is handling and reuse skills from earlier interactions. Their experiments report better task completion, retrieval accuracy, object re-identification, and cross-object skill generalization than unstructured and embedding-based memory baselines.
Embodied manipulation, Agent memory, Semantic parts, Affordances, Parametric templates, Skill memory, State transitions, Object re-identification, Coarse-to-fine retrieval
Authors
Mingyang Sun, Xiujian Liang, Jiude Wei, Qichen He, Donglin Wang, Cewu Lu, Jianhua Sun
Abstract
Long-horizon embodied manipulation requires agents to remember persistent objects, track changing scene states, and reuse prior interaction knowledge. However, existing agent memories are often stored as unstructured histories or embedding-based records, making it difficult to retrieve manipulation-relevant object parts, physical states, action effects, and executable skills. We propose an analytic concept-centric memory framework for agentic embodied manipulation. Our memory organizes experience around structured analytic concepts, where objects are represented by semantic parts, parametric templates, grounded poses, affordances, and manipulation states. It further connects object and scene memories with transition memory for action-induced state changes and skill memory for template-grounded and policy-grounded execution. At runtime, the agent performs structured coarse-to-fine retrieval to identify relevant objects, states, transitions, and skills, supporting state-consistent reasoning and skill reuse. Experiments on memory-dependent manipulation, articulated-object generalization, real-world memory evaluation, and ablations show that our approach improves task completion, retrieval accuracy, object re-identification, and cross-object skill generalization over unstructured and embedding-based memory baselines.
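To make the memory layout described in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes object records holding parts, a parametric template, a pose, affordances, and a state, plus separate transition and skill stores, and it approximates coarse-to-fine retrieval with a simple category filter followed by affordance and state checks. All class names, field names, and the example cabinet values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ObjectConcept:
    """Hypothetical object record: parts, template, pose, affordances, state."""
    object_id: str
    category: str
    parts: list[str]
    template: dict[str, float]          # assumed parametric template values
    pose: tuple[float, float, float]    # position only; orientation omitted
    affordances: set[str]
    state: dict[str, str]


@dataclass
class Transition:
    """Action-induced state change observed on one object."""
    object_id: str
    action: str
    before: dict[str, str]
    after: dict[str, str]


@dataclass
class Skill:
    """Executable skill tagged as template-grounded or policy-grounded."""
    name: str
    affordance: str
    grounding: str  # "template" or "policy"


@dataclass
class ConceptMemory:
    objects: dict[str, ObjectConcept] = field(default_factory=dict)
    transitions: list[Transition] = field(default_factory=list)
    skills: list[Skill] = field(default_factory=list)

    def apply(self, t: Transition) -> None:
        """Log a transition and update the stored object state to match it."""
        self.transitions.append(t)
        self.objects[t.object_id].state.update(t.after)

    def retrieve(self, category: str, affordance: str,
                 required_state: dict[str, str]):
        """Coarse filter by category, then fine filter by affordance and state."""
        coarse = [o for o in self.objects.values() if o.category == category]
        fine = [
            o for o in coarse
            if affordance in o.affordances
            and all(o.state.get(k) == v for k, v in required_state.items())
        ]
        skills = [s for s in self.skills if s.affordance == affordance]
        return fine, skills


# Usage with made-up values: a cabinet whose door is opened once.
memory = ConceptMemory()
memory.objects["cabinet_1"] = ObjectConcept(
    object_id="cabinet_1",
    category="cabinet",
    parts=["body", "door", "handle"],
    template={"hinge_max_angle_rad": 1.57},
    pose=(0.8, 0.0, 0.4),
    affordances={"openable"},
    state={"door": "closed"},
)
memory.skills.append(Skill("pull_handle", "openable", "template"))
memory.apply(Transition("cabinet_1", "open",
                        {"door": "closed"}, {"door": "open"}))
matches, skills = memory.retrieve("cabinet", "openable", {"door": "open"})
```

In this sketch the state check is what keeps retrieval state-consistent: a query for a closed door would return no objects after the transition has been applied, while the matching skill is still returned for reuse on other objects with the same affordance.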