Latent Space Reinforcement Learning for Inverse Material Estimation in Food Fracture Simulation

2026-06-15 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Graphics

AI summary

The authors estimate the physical properties of food materials, such as an orange peel, from a description of how the material breaks or peels. Because these properties are hard to measure directly, they ran computer simulations and trained a learned model to infer the material parameters that produce a given peeling behavior. They compared several ways of making these estimates and found one that quickly handles any target peeling behavior without being retrained each time. Their approach lays groundwork for connecting video observations of food with an understanding of its physical makeup.

material parameters, fracture behavior, continuum damage mechanics, neural surrogate, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Proximal Policy Optimization (PPO), latent representation, goal-conditioned policy, inverse problem

Authors

Adrian Ramlal, Yuhao Chen, John S. Zelek

Abstract

Realistic visual simulation of food manipulation requires accurate material parameters, yet these are difficult to measure directly and vary across the heterogeneous regions of a single food item. We address the inverse problem of estimating material parameters from a target description of fracture behavior in a non-differentiable continuum damage mechanics simulator. Using orange peeling as a test case, we train a neural surrogate on 2,000 forward simulations and compare Covariance Matrix Adaptation Evolution Strategy (CMA-ES, a gradient-free evolutionary optimizer) with Proximal Policy Optimization (PPO, a reinforcement learning algorithm) across the original 9-dimensional parameter space and two learned 4-dimensional latent representations. Since different oranges have different material properties, a practical inverse system must handle arbitrary targets without retraining. We train a goal-conditioned PPO policy that learns a general inverse mapping: given any target description of peeling behavior, the policy produces a material parameter estimate in a single forward pass (8 surrogate evaluations, approximately 10 ms). Operating in a normalizing flow latent space with a shared surrogate evaluator, the goal-conditioned policy achieves 0.642 actual recovery when validated through the simulator, outperforming the original parameter space by 23%. A warm-start extension that initializes CMA-ES refinement from the policy's output further improves recovery to 0.828 with 540 evaluations. These findings provide a practical framework for inverse food physics and lay groundwork for vision-driven material identification from video observations of food manipulation.
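
The warm-start idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_surrogate`, its 2-D descriptor, and the starting point `z0` are hypothetical stand-ins for the neural surrogate and the policy's output, and the optimizer is a simplified elitist evolution strategy rather than CMA-ES (it uses isotropic Gaussian sampling with no covariance adaptation). Only the 4-dimensional latent size and the 540-evaluation budget are taken from the abstract.

```python
import random


def toy_surrogate(z):
    """Hypothetical stand-in for the neural surrogate (not the paper's model).

    Maps a 4-D latent vector to a 2-D 'peeling behavior' descriptor.
    """
    return [z[0] + 0.5 * z[1] ** 2, z[2] * z[3] + z[0]]


def mismatch(z, target):
    """Squared error between the surrogate prediction and the target descriptor."""
    pred = toy_surrogate(z)
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def refine_from_warm_start(z0, target, budget=540, popsize=12, sigma=0.3, seed=0):
    """Simplified elitist evolution strategy, NOT full CMA-ES.

    Starts the search at z0 (the warm start) and spends at most `budget`
    surrogate evaluations. The step size grows after a successful
    generation and shrinks after an unsuccessful one.
    """
    rng = random.Random(seed)
    best, best_loss = list(z0), mismatch(z0, target)
    evals = 1  # the warm-start point itself costs one evaluation
    while evals + popsize <= budget:
        parent, improved = best, False
        for _ in range(popsize):
            cand = [p + sigma * rng.gauss(0.0, 1.0) for p in parent]
            loss = mismatch(cand, target)
            evals += 1
            if loss < best_loss:
                best, best_loss, improved = cand, loss, True
        sigma *= 1.2 if improved else 0.7
    return best, best_loss, evals


# Hypothetical target descriptor and a hypothetical policy output as warm start.
target = toy_surrogate([0.8, -0.4, 0.3, 1.1])
z0 = [0.5, 0.0, 0.0, 1.0]
z_best, final_loss, evals_used = refine_from_warm_start(z0, target)
```

Because the strategy is elitist, `final_loss` can never exceed the mismatch at `z0`; the benefit of a warm start is that the evaluation budget is spent refining a reasonable estimate instead of searching from scratch. A real implementation would swap in the trained surrogate and an actual CMA-ES optimizer.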
