Versatile Editing of Video Content, Actions, and Dynamics without Training

2026-03-18 | Computer Vision and Pattern Recognition

AI summary

The authors introduce DynaEdit, a training-free method for editing videos. Previous training-free methods cannot change motion or interactions, and trained models struggle with complex edits, likely because relevant training data is hard to collect. DynaEdit works with pretrained text-to-video flow models and handles complex changes such as modifying actions or inserting objects that interact with the scene. The authors identify and address two problems that arise when the inversion-free approach is applied naively to unconstrained editing: low-frequency misalignment and high-frequency jitter. Their experiments show that DynaEdit achieves state-of-the-art results on complex text-based video editing tasks.

Controlled video generation, Text-to-video models, Training-free video editing, Inversion-free approach, Dynamic video editing, Video flow models, Action modification, Motion editing, Video object interaction, Model-agnostic methods
Authors
Vladimir Kulikov, Roni Paiss, Andrey Voynov, Inbar Mosseri, Tali Dekel, Tomer Michaeli
Abstract
Controlled video generation has seen drastic improvements in recent years. However, editing actions and dynamic events, or inserting content that should affect the behavior of other objects in real-world videos, remains a major challenge. Existing trained models struggle with complex edits, likely due to the difficulty of collecting relevant training data. Similarly, existing training-free methods are inherently restricted to structure- and motion-preserving edits and do not support modification of motion or interactions. Here, we introduce DynaEdit, a training-free editing method that unlocks versatile video editing capabilities with pretrained text-to-video flow models. Our method relies on the recently introduced inversion-free approach, which does not intervene in the model internals and is thus model-agnostic. We show that naively attempting to adapt this approach to general unconstrained editing results in severe low-frequency misalignment and high-frequency jitter. We explain the sources of these phenomena and introduce novel mechanisms for overcoming them. Through extensive experiments, we show that DynaEdit achieves state-of-the-art results on complex text-based video editing tasks, including modifying actions, inserting objects that interact with the scene, and introducing global effects.