Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

2026-05-11 • Cryptography and Security

Cryptography and Security • Artificial Intelligence • Computation and Language • Information Retrieval • Machine Learning

AI summary

The authors developed Nautilus Compass, a tool that detects when coding assistants such as Claude Code drift from user instructions during long sessions, and that also serves as an agent memory layer. Unlike white-box methods, it needs no access to model weights: it compares BGE-m3 embeddings of user prompts against behavioral anchor texts. On a held-out test set it detects drift with ROC AUC 0.83, and end-to-end reproduction costs $3.50, about 14x cheaper than GPT-4o-judged stacks. The code and test data are publicly available under the MIT license.

LLM • persona drift • embedding • cosine similarity • prompt engineering • agent memory • Claude Code • black-box method • ROC AUC • audit log

Authors

Chunxiao Wang

Abstract

Production LLM coding agents drift over long sessions: they forget user-specified constraints, repeat mistakes the user has already flagged, and confabulate prior agreements. White-box approaches such as persona vectors require model weights and so cannot be applied to the closed APIs (Claude, GPT-4) that most users actually interact with. We present Nautilus Compass, a black-box persona drift detector and agent memory layer for production coding agents. The method operates entirely at the prompt-text layer: it computes cosine similarity between BGE-m3 embeddings of user prompts and behavioral anchor texts, then aggregates the similarities with a weighted top-k mean. Compass is, to our knowledge, the only public agent memory layer (among Mem0, Letta, Cognee, Zep, MemOS, and smrti, as verified in May 2026) that does not call an LLM at index time to extract facts or build a graph; raw conversation text is embedded directly. The system ships as a Claude Code plugin, an MCP 2024-11-05 A2A server (Cursor, Cline, Hermes), a CLI, and a REST API on one daemon, with a Merkle-chained audit log for tamper-evident anchor updates. On a held-out test set built from real Claude Code session traces and labeled by an independent LLM judge, Compass reaches ROC AUC 0.83 for drift detection. The embedded retrieval pipeline scores 56.6% on LongMemEval-S v0.8 and 44.4% on EverMemBench-Dynamic (n=500), topping the four published EverMemBench Table 4 baselines. The 56.6% LongMemEval-S score is ~30 points below recent white-box leaders (90+%); we treat that gap as the architectural ceiling of the no-extraction design. End-to-end reproduction cost is $3.50 (~14x cheaper than GPT-4o-judged stacks). A paired cross-vendor behavior A/B accompanies these numbers as preliminary system-level evidence. Code, anchors, frozen test data, and audit-log tooling are MIT-licensed at github.com/chunxiaoxx/nautilus-compass.
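
The scoring step described in the abstract (cosine similarity between a prompt and anchor texts, aggregated by a weighted top-k mean) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default `k`, and the geometric rank-decay weights are assumptions made for illustration, since the abstract does not state the weighting scheme. The vectors stand in for BGE-m3 embeddings, which are assumed to be computed elsewhere.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_topk_mean(
    prompt_vec: Sequence[float],
    anchor_vecs: Sequence[Sequence[float]],
    k: int = 3,          # hypothetical default, not taken from the paper
    decay: float = 0.5,  # hypothetical rank-decay factor, not taken from the paper
) -> float:
    """Weighted mean of the k highest prompt-to-anchor cosine similarities.

    The i-th best match (i = 0, 1, ...) gets weight decay**i, so the closest
    anchors dominate the score.
    """
    if not anchor_vecs:
        raise ValueError("at least one anchor vector is required")
    sims = sorted((cosine(prompt_vec, a) for a in anchor_vecs), reverse=True)
    top = sims[:k]
    weights = [decay ** i for i in range(len(top))]
    return sum(w * s for w, s in zip(weights, top)) / sum(weights)


# Toy 2-D vectors standing in for real embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
score = weighted_topk_mean([1.0, 0.0], anchors, k=2, decay=0.5)
print(round(score, 3))  # 0.667
```

In this toy case the two best similarities are 1.0 and 0.0 with weights 1.0 and 0.5, giving 1.0 / 1.5. How such a score is thresholded to flag drift is not specified in the abstract and would have to be taken from the paper or the repository.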
