Controllable Texture Tiling with Transformed RoPE-Enhanced Diffusion Models

2026-06-22

Graphics, Computer Vision and Pattern Recognition
AI summary

The authors present a new method to realistically add repeating textures to images, letting users control pattern size, angle, and repetition frequency. Their approach builds on Diffusion Transformers to better preserve the original texture's details and the scene's lighting and shapes. They introduce a positional-embedding technique that controls texture placement without warping the reference pixels, and an attention mask that keeps the texture's structure intact. Experiments show their method outperforms existing ones in both control accuracy and texture fidelity.

texture tiling, Diffusion Transformers, positional embeddings, affine transformations, image inpainting, semantic image encoders, attention mask, texture fidelity, spatial manipulation, material transfer
Authors
Junrong Huang, Zhiyuan Zhang, Rui Tang, Hongbo Fu, Jing Liao
Abstract
Realistic integration of user-specified textures into scene images is a fundamental task in computer graphics and image editing. While existing material transfer and reference-guided inpainting methods can edit surface appearances, they often fail to address the specific requirements of texture tiling. This task necessitates precisely repeating a reference pattern according to user-defined parameters such as frequency, orientation, and scale. Furthermore, current generative approaches often struggle to maintain the structural fidelity of the reference texture, limited by either destructive pixel-level resampling or the lack of fine-grained spatial information in semantic image encoders, and they frequently fail to preserve the coherent lighting and geometry of the original scene. In this paper, we propose a novel framework for controllable and high-fidelity texture tiling based on Diffusion Transformers. Our approach introduces two key technical innovations to decouple spatial manipulation from content generation. First, we propose a Coordinate-Transformed Rotary Embedding mechanism. By applying 2D affine transformations directly to the relative positional embeddings between the target latent and the image condition, we achieve precise control over tiling patterns without explicit pixel warping, thereby utilizing the full information of the reference condition without degradation. Second, a Disjoint Attention Mask is employed to shield reference features from semantic leakage. This preserves structural integrity while seamlessly blending the synthesized texture with the scene's original lighting and geometry. Extensive experiments demonstrate that our method outperforms state-of-the-art baselines in both control accuracy and texture fidelity.
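The idea of steering tiling through positional embeddings rather than pixel warping can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `affine_coords` and `rope_2d`, the frequency base, and the channel layout (first half of the channels for x, second half for y) are all assumptions made for illustration. The sketch only shows that token coordinates can be passed through a 2D affine map (scale, rotation, shift) before a standard rotary embedding is applied, so the reference features themselves are never resampled. How the paper realizes repetition frequency is not stated in the abstract and is not modeled here.

```python
import numpy as np

def affine_coords(h, w, scale=1.0, angle_deg=0.0, shift=(0.0, 0.0)):
    """Return (h*w, 2) token coordinates (x, y) after a 2D affine transform.

    Hypothetical helper: scale, angle_deg and shift stand in for the
    user-defined scale, orientation and offset of the pattern.
    """
    ys, xs = np.meshgrid(np.arange(h, dtype=np.float64),
                         np.arange(w, dtype=np.float64), indexing="ij")
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    t = np.deg2rad(angle_deg)
    A = scale * np.array([[np.cos(t), -np.sin(t)],
                          [np.sin(t),  np.cos(t)]])
    return pts @ A.T + np.asarray(shift, dtype=np.float64)

def rope_2d(x, coords, base=10000.0):
    """Apply a 2D rotary embedding to features x of shape (n, d).

    coords has shape (n, 2) and may be fractional. d must be divisible
    by 4: the first d/2 channels encode x, the last d/2 encode y
    (an assumed layout, not taken from the paper).
    """
    n, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(0, half, 2) / half)   # (d/4,) frequencies
    out = np.empty_like(x)
    for axis in range(2):
        ang = coords[:, axis:axis + 1] * freqs         # (n, d/4) angles
        cos, sin = np.cos(ang), np.sin(ang)
        seg = x[:, axis * half:(axis + 1) * half]
        a, b = seg[:, 0::2], seg[:, 1::2]              # channel pairs
        out[:, axis * half:(axis + 1) * half:2] = a * cos - b * sin
        out[:, axis * half + 1:(axis + 1) * half:2] = a * sin + b * cos
    return out

# Usage: the reference keys keep their native grid, while the target
# queries use transformed coordinates; only positions change, not features.
rng = np.random.default_rng(0)
ref_feats = rng.normal(size=(4 * 4, 8))
tgt_feats = rng.normal(size=(4 * 4, 8))
keys = rope_2d(ref_feats, affine_coords(4, 4))
queries = rope_2d(tgt_feats, affine_coords(4, 4, scale=0.5, angle_deg=30.0))
scores = queries @ keys.T   # attention logits, shape (16, 16)
```

Because rotary embeddings make each attention logit depend only on the coordinate difference between query and key, changing the coordinate map changes which reference positions a target token aligns with while leaving the reference features untouched.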