SceneCraft: Interactive System for Image Editing via Scene Graph
2026-06-15 • Computer Vision and Pattern Recognition
AI summary
The authors created SceneCraft, a tool that helps people edit images by changing a simple map of what's in the picture instead of writing tricky text commands. This map, called a scene graph, shows the objects in the image and how they are arranged relative to each other. When users adjust the graph, SceneCraft turns those changes into clear instructions that several AI models then use to edit the image. In tests on a variety of images, the system gave users more control over their edits and produced better results without requiring precise text prompts.
generative AI, image editing, natural language prompts, scene graph, interactive framework, spatial relations, prompt engineering, context-aware editing, multi-model generation
Authors
Duc-Manh Phan, Ngoc-Dai Tran, Duy-Khang Do, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le
Abstract
Recent advances in generative AI have enabled natural-language-driven image editing, yet existing systems often fail in complex scenes with multiple interacting objects because they rely heavily on users crafting precise text prompts. To address the absence of structured control, we propose SceneCraft, a novel interactive framework that bridges user intent and model execution by representing images as editable scene graphs. Instead of refining text prompts through trial and error, users interact directly with a visual graph to perform complex spatial and relational operations. These graph modifications are automatically translated into precise, context-aware editing prompts, effectively eliminating linguistic ambiguity. To ensure robust and diverse results, the structured prompts are dispatched to multiple state-of-the-art generative models. Evaluations across diverse editing scenarios show that SceneCraft provides a more intuitive control mechanism, significantly reducing the cognitive burden of manual prompt engineering while generating outputs that users consistently rate as higher in quality and fidelity.
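The abstract states that each structured prompt is sent to several generative models to obtain diverse results. A minimal sketch of that fan-out follows, assuming each model is wrapped as a plain Python callable that takes image bytes and a prompt and returns edited image bytes. The `Backend` signature, `dispatch`, and the stub models are hypothetical; the paper's model choices and interfaces are not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

# Hypothetical backend signature: (source image bytes, prompt) -> edited image bytes.
Backend = Callable[[bytes, str], bytes]


def dispatch(image: bytes, prompt: str,
             backends: Dict[str, Backend]) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Send the same prompt to every backend concurrently.

    Returns (results, errors): successful outputs keyed by model name, and
    error messages for models that raised, so one failure does not discard
    the other candidates.
    """
    results: Dict[str, bytes] = {}
    errors: Dict[str, str] = {}
    # One worker per backend; max_workers must be at least 1.
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {name: pool.submit(fn, image, prompt) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # collect the failure instead of propagating it
                errors[name] = repr(exc)
    return results, errors


if __name__ == "__main__":
    def echo_model(image: bytes, prompt: str) -> bytes:
        # Stand-in for a real editing model: just tags the input.
        return image + b"|" + prompt.encode()

    def offline_model(image: bytes, prompt: str) -> bytes:
        raise RuntimeError("service unavailable")

    outputs, failures = dispatch(b"IMG", "Add a lamp.",
                                 {"model_a": echo_model, "model_b": offline_model})
    print(outputs)   # {'model_a': b'IMG|Add a lamp.'}
    print(failures)  # {'model_b': "RuntimeError('service unavailable')"}
```

Threads are used on the assumption that real model calls are mostly network-bound requests to hosted services; the caller can then show all successful candidates side by side and let the user pick one.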