User as Code: Executable Memory for Personalized Agents

2026-06-15

Artificial Intelligence
AI summary

The authors propose a new way for AI agents to remember user information by turning the memory into executable code instead of just storing facts as text. They call this approach User as Code (UaC), where a user's state is kept in Python objects and rules are written as Python functions that run automatically. This method helps the AI better handle complex questions about the user's history and spot important issues like medication conflicts. Their tests show UaC matches or beats existing memory systems, especially when the AI needs to combine or reason about many pieces of information at once.

personalized AI agent, user memory, knowledge graph, retrieval-based memory, executable memory, User as Code (UaC), typed Python objects, aggregate reasoning, safety alerts, long-term conversation benchmarks
Authors
Bojie Li
Abstract
A personalized AI agent needs a user memory: a persistent model of who the user is, built across many conversations and consulted on each new one. Today this memory is almost always stored as unstructured text, a knowledge graph, or a flat store of facts, and consulted by retrieval -- fetching the entries most similar to the current request. Such "bag-of-facts" memory recalls individual facts well, but because storing a fact and acting on it are separate steps, it struggles to resolve contradictions, aggregate over many records, or enforce rules. We argue that user memory should instead be executable. We introduce User as Code (UaC), a paradigm in which an agent's model of a user is a living software project: typed Python objects hold the user's state and ordinary Python functions encode the rules that govern it, so representing and reasoning about the user happen in one medium an interpreter can run. The enabling mechanism is a two-phase pipeline: an append-only log that never discards a fact, periodically checkpointed into typed code. This changes what memory can do. On standard long-term conversation benchmarks, UaC matches both a full-context upper bound and the strongest prior memory systems on recall (78.8% on LOCOMO). Its advantage emerges where representation matters most. On aggregate questions over a user's history -- "how many international trips did I take last year?" -- retrieval-based memory collapses (6-43%) while UaC stays near-perfect (99%), because the answer is a one-line computation over typed state rather than a search over text. And because its rules execute deterministically whenever the state changes, UaC can surface unsolicited, safety-critical alerts -- such as a newly prescribed drug that conflicts with an allergy recorded months earlier -- a capability query-driven memory cannot provide.
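
To make the abstract's description concrete, here is a minimal sketch of what executable user memory could look like. It is an illustration under stated assumptions, not the paper's implementation: the names `UserState`, `Trip`, `UserMemory`, `allergy_conflicts`, `international_trips`, and the `DRUG_CLASS` table are all hypothetical. It shows an append-only log replayed into typed state at each checkpoint, a rule written as an ordinary function, and an aggregate question answered by computation.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Trip:
    destination: str
    start: date
    international: bool


@dataclass
class UserState:
    """Typed state for one user (field names are illustrative assumptions)."""
    allergies: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    trips: list = field(default_factory=list)


# Hypothetical lookup from drug name to allergen class; not from the paper.
DRUG_CLASS = {"amoxicillin": "penicillin"}


def allergy_conflicts(state):
    """Rule as an ordinary function: medications that match a recorded allergy."""
    return [
        f"{drug} conflicts with recorded {DRUG_CLASS[drug]} allergy"
        for drug in sorted(state.medications)
        if DRUG_CLASS.get(drug) in state.allergies
    ]


class UserMemory:
    def __init__(self):
        self.log = []             # phase 1: append-only, no fact is discarded
        self.state = UserState()  # phase 2: typed checkpoint built from the log

    def record(self, kind, value):
        self.log.append((kind, value))

    def checkpoint(self):
        """Replay the whole log into fresh typed state, then run the rule."""
        state = UserState()
        for kind, value in self.log:
            if kind == "allergy":
                state.allergies.add(value)
            elif kind == "medication":
                state.medications.add(value)
            elif kind == "trip":
                state.trips.append(value)
        self.state = state
        return allergy_conflicts(state)  # alerts are produced without a query


def international_trips(state, year):
    """Aggregate question answered by computation rather than retrieval."""
    return sum(t.international and t.start.year == year for t in state.trips)


memory = UserMemory()
memory.record("allergy", "penicillin")
memory.record("trip", Trip("Tokyo", date(2025, 3, 2), True))
memory.record("trip", Trip("Denver", date(2025, 6, 9), False))
memory.record("trip", Trip("Lisbon", date(2025, 9, 20), True))
memory.checkpoint()                         # no medication yet, so no alerts

memory.record("medication", "amoxicillin")  # a later conversation
alerts = memory.checkpoint()                # the rule fires on the new state
trip_count = international_trips(memory.state, 2025)
```

In this sketch the allergy and the prescription can arrive in different conversations; the conflict is found because the rule runs over the full typed state at every checkpoint, not because a query happened to retrieve both facts. Replaying the log from scratch is a simplification chosen for brevity; how UaC actually checkpoints is not specified in the abstract.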