Where Concept Erasure Should Occur: Concept-Layer Alignment in Text-to-Video Diffusion Models
2026-05-25 • Computer Vision and Pattern Recognition
AI summary
The authors studied how text-to-video diffusion models process different ideas unevenly at various layers within the model. They found that some layers separate the target concept from other information better than others, which makes it easier to remove that concept at those layers. To improve concept removal, they created CLEAR, a method that chooses layers based on how well they separate concepts from non-concept signals. Their experiments showed that this approach removes specific concepts more accurately without hurting the quality of the generated videos.
text-to-video diffusion models, concept erasure, representational bottleneck, layer-wise representation, topological alignment, concept separability, transformer models, optimization framework, generative quality, CLEAR method
Authors
Yiwei Xie, Ping Liu, Zheng Zhang
Abstract
Text-to-video diffusion transformers encode semantic information unevenly across model depth, which constrains effective concept erasure. We identify a representational bottleneck, termed concept-layer topological alignment, under which target concepts exhibit higher separability at certain representational depths. Outside these depths, concept and non-target signals remain strongly entangled, limiting the effectiveness of depth-specific erasure. This observation reframes concept erasure as the problem of identifying representational depths where concept-non-target separation naturally emerges. Motivated by this structural constraint, we introduce CLEAR, a separability-driven optimization framework for concept erasure that explicitly enforces concept-layer alignment. CLEAR operationalizes this principle by formulating layer selection as an optimization problem over concept-non-target separability, rather than relying on layer-agnostic or heuristic choices. To enable this, we introduce a separability-aware objective that favors layers exhibiting stronger concept-non-target separation. Experiments on large-scale text-to-video diffusion models demonstrate that enforcing concept-layer alignment leads to more precise concept suppression while preserving overall generative quality.
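To make the idea of separability-driven layer selection concrete, the following is a minimal sketch, not CLEAR's actual objective, which the abstract does not specify. It assumes per-layer feature vectors have already been collected for concept and non-target prompts, scores each layer with a Fisher-style ratio (a stand-in separability measure chosen for illustration), and picks the highest-scoring layer. The names `fisher_separability`, `select_layer`, and the toy data are hypothetical.

```python
from statistics import fmean


def fisher_separability(concept, non_target):
    """Fisher-style ratio (an assumed stand-in, not the paper's objective):
    squared distance between class means over summed within-class scatter."""
    dim = len(concept[0])
    mu_c = [fmean(v[d] for v in concept) for d in range(dim)]
    mu_n = [fmean(v[d] for v in non_target) for d in range(dim)]

    # Between-class term: squared Euclidean distance between the two means.
    between = sum((a - b) ** 2 for a, b in zip(mu_c, mu_n))

    def scatter(vectors, mu):
        # Mean squared distance of each vector to its class mean.
        return fmean(sum((x - m) ** 2 for x, m in zip(v, mu)) for v in vectors)

    within = scatter(concept, mu_c) + scatter(non_target, mu_n)
    return between / (within + 1e-8)  # small constant avoids division by zero


def select_layer(features_by_layer):
    """Return the layer index with the highest separability, plus all scores.

    features_by_layer maps layer index -> (concept_vectors, non_target_vectors).
    """
    scores = {
        layer: fisher_separability(concept, non_target)
        for layer, (concept, non_target) in features_by_layer.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy features (hypothetical): layer 0 is entangled, layer 1 is well separated.
toy_features = {
    0: ([[0.0, 0.0], [1.0, 1.0]], [[0.5, 0.5], [0.0, 1.0]]),
    1: ([[5.0, 5.0], [5.0, 6.0]], [[0.0, 0.0], [0.0, 1.0]]),
}
best_layer, layer_scores = select_layer(toy_features)
print(best_layer, layer_scores)
```

In this toy setting the well-separated layer receives a much higher score and is selected. In a real model the feature vectors would come from intermediate transformer activations, and erasure would then be applied only at the selected depth.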