KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing
2026-06-15 • Computation and Language
Computation and Language • Machine Learning
AI summary
The authors address the problem of erasing parts of a long text that a language model has already processed, which is hard because changes affect everything that comes after. They present KVEraser, a method that efficiently edits only the parts of the model’s memory related to the erased text, avoiding full reprocessing. They train KVEraser in two stages to make it work well on various tasks. Their experiments show KVEraser performs almost as well as redoing the whole text but much faster, especially on long documents with distracting or wrong facts.
KV cache, context erasing, language models, post-hoc editing, long-context, prefill, fine-tuning, recomputation, transfer learning, QA tasks
Authors
Mufei Li, Shikun Liu, Dongqi Fu, Haoyu Wang, Yinglong Xia, Hong Li, Hong Yan, Pan Li
Abstract
Post-hoc context erasing over the KV cache is challenging because a local edit has a global consequence: once a span has been processed, its influence propagates into the cached states of all subsequent tokens. This issue arises naturally in long-context LLM applications, where stale retrieved facts, incorrect tool observations, retracted user preferences, or harmful prompt injections may be identified only after prefill. Exact erasing must then recompute all tokens after the deleted span, making its computational cost depend on suffix length rather than erased-span length. We introduce KVEraser, a learned KV-cache editing method for efficient localized context erasing. Given a processed context and a span to remove, KVEraser replaces only the KV states of the erased interval with learned steering states while reusing the remaining cache unchanged. To learn a transferable erasing mechanism, we build a two-stage training pipeline: generic span-neighbor pre-training teaches the eraser to suppress the influence of the erased span, while task-specific fine-tuning adapts this capability to downstream scenarios. Experiments show that KVEraser nearly matches full recomputation in post-erasure performance on in-domain tasks across 1K to 32K context lengths, while its latency increases by only 24% compared with a 17.6× increase for full recomputation. KVEraser also generalizes to unseen long-document QA tasks with harmful factual distractors, achieving the best performance among approximate baselines with a 3× to 4× speedup over full recomputation.
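The cache edit described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: `LayerCache`, `splice_steering_states`, and `tokens_touched` are hypothetical names, the cache is modeled as plain Python lists for a single layer, one steering state per erased position is an assumption, and the steering vectors are zero placeholders standing in for whatever the trained eraser would produce. The sketch only shows which cache positions change and how the amount of work scales in each case.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class LayerCache:
    """Toy single-layer KV cache (hypothetical structure, for illustration)."""
    keys: List[Vector]    # one key vector per cached token position
    values: List[Vector]  # one value vector per cached token position


def splice_steering_states(cache: LayerCache, start: int, end: int,
                           steer_keys: List[Vector],
                           steer_values: List[Vector]) -> LayerCache:
    """Replace positions [start, end) with steering states; reuse the rest.

    Assumption: one steering state per erased position. In the paper the
    steering states are learned; here they are supplied by the caller.
    """
    n = len(cache.keys)
    if not (0 <= start < end <= n):
        raise ValueError("span must satisfy 0 <= start < end <= cache length")
    span = end - start
    if len(steer_keys) != span or len(steer_values) != span:
        raise ValueError("need exactly one steering state per erased position")
    # Prefix and suffix entries are reused as-is (same objects, no recompute).
    return LayerCache(
        keys=cache.keys[:start] + steer_keys + cache.keys[end:],
        values=cache.values[:start] + steer_values + cache.values[end:],
    )


def tokens_touched(context_len: int, start: int, end: int) -> Tuple[int, int]:
    """Return (suffix tokens exact erasing recomputes, positions a local edit replaces)."""
    return context_len - end, end - start


# Six cached positions with 2-dimensional states; erase the span [2, 4).
cache = LayerCache(
    keys=[[float(i), 0.0] for i in range(6)],
    values=[[0.0, float(i)] for i in range(6)],
)
placeholder = [[0.0, 0.0], [0.0, 0.0]]  # stand-in for learned steering states
edited = splice_steering_states(cache, 2, 4, placeholder, placeholder)

# For a 32,000-token context with a 200-token span ending at position 1,200:
exact_cost, local_cost = tokens_touched(32_000, 1_000, 1_200)
```

In this toy setting the suffix entries of `edited` are the very same objects as in `cache`, which mirrors the "reusing the remaining cache unchanged" property, while `tokens_touched` contrasts the suffix-length cost of exact erasing with the span-length cost of a localized edit.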