Patricio Valdez

The problem

Agents start every conversation from zero. You can stuff context into a prompt, but that is short-term scratch, not memory. A real memory has to do two opposite things at once: capture everything as it happens, and slowly turn that noise into a small set of things worth keeping. Those two jobs fight each other, so I split them.

Two-speed memory

Evidence lands fast and cheap in an append-only event_log: raw, timestamped, never edited. Separately, a small set of distilled pages holds the compiled truth. The trick is that evidence does not become a page automatically. It has to survive a process I call the dream.

Evidence enters fast and raw; only the dream promotes it into distilled pages.

First it enters as evidence, then it is promoted only if it survives the dream. Inference is not truth.
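The fast half is simple enough to sketch. This is a minimal illustration, not the actual brain_os code: the JSON Lines format, the field names (`ts`, `kind`, `text`), and the helper names `append_event` and `read_events` are all assumptions. The only properties taken from the text above are that events are raw, timestamped, and appended without ever editing what is already there.

```python
import json
import tempfile
import time
from pathlib import Path


def append_event(log_path: Path, kind: str, text: str) -> dict:
    """Append one raw, timestamped event. Existing lines are never rewritten."""
    event = {"ts": time.time(), "kind": kind, "text": text}
    # Mode "a" only ever adds to the end of the file.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event


def read_events(log_path: Path) -> list:
    """Read the whole log back in the order it was written."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage: two pieces of evidence land in the log; nothing is a page yet.
log_path = Path(tempfile.mkdtemp()) / "event_log.jsonl"
append_event(log_path, "chat", "prefers morning meetings")
append_event(log_path, "calendar", "booked three 4pm meetings this week")
events = read_events(log_path)
```

Nothing here decides what is true. The log just keeps both statements, contradiction included, until the dream looks at them.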

The dream worker

The dream runs offline, like sleep. It does two things. It classifies each page into the human-data categories and decision layers it cares about, and it consolidates clusters of raw events into a single page: appending to that page's timeline, recompiling the current truth on top, drawing links to related pages, and marking the raw events as consumed. It is gated, idempotent, and file-locked, so running it twice is safe and never double-counts.
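To make the idempotency claim concrete, here is a toy consolidation pass. It is a sketch under heavy assumptions: the `Page` shape, the `id` and `topic` fields on events, and the separate `consumed` set are hypothetical, the "recompile" step is a placeholder that just takes the latest timeline entry, and gating, linking, and file locking are left out entirely. What it does show is why a second run over the same events changes nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    slug: str
    timeline: list = field(default_factory=list)  # evidence, kept underneath
    truth: str = ""                               # compiled truth, on top


def dream_pass(events: list, pages: dict, consumed: set) -> int:
    """Fold unconsumed events into pages. Returns how many were promoted."""
    promoted = 0
    for event in events:
        if event["id"] in consumed:
            continue  # already folded in by an earlier pass: skip, don't double-count
        page = pages.setdefault(event["topic"], Page(slug=event["topic"]))
        page.timeline.append(event["text"])
        # Placeholder for the real recompile step (assumption: latest entry wins).
        page.truth = page.timeline[-1]
        consumed.add(event["id"])
        promoted += 1
    return promoted


raw = [
    {"id": 1, "topic": "meetings", "text": "prefers morning meetings"},
    {"id": 2, "topic": "meetings", "text": "booked three 4pm meetings this week"},
]
pages, consumed = {}, set()
first_run = dream_pass(raw, pages, consumed)   # promotes both events
second_run = dream_pass(raw, pages, consumed)  # finds nothing new
```

The consumed markers live outside the log in this sketch so the log itself stays append-only; how brain_os actually records them is not specified here.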

One core, many agents

The core is pure Python with no network: store, ingest, search, dream. Every frontend wraps that same core, so the brain is not tied to one agent. A CLI and an MCP server expose it, so any agent (Claude Code, Codex, Cursor, or Guardian Angel itself) reads and writes the same memory. Markdown is the source of truth; the search index is just a cache you can rebuild from it.

A network-free core, wrapped by many frontends that share one memory.

Retrieval

A memory is only as good as what it can pull back at the right moment. Retrieval here is hybrid, not a single vector lookup. A query fans out across three signals: FTS5 keyword search over the markdown, named-entity matching, and a one-hop graph walk over the typed links the dream drew. The three result sets get fused with reciprocal rank fusion (RRF), so a page that shows up strong in two signals beats one that only spikes in one.
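Reciprocal rank fusion itself is a few lines. The sketch below assumes each signal returns an ordered list of page slugs and uses k = 60, the constant commonly used with RRF; whether brain_os uses that value, or weights the signals, is not stated here. The function name `rrf_fuse` and the example slugs are made up for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists: each page scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, page in enumerate(ranking, start=1):
            scores[page] += 1.0 / (k + rank)
    # Highest score first; ties broken by slug so the order is deterministic.
    return sorted(scores, key=lambda page: (-scores[page], page))


keyword_hits = ["a", "b", "c"]   # e.g. from keyword search
entity_hits = ["b", "a"]         # e.g. from entity matching
graph_hits = ["d", "b"]          # e.g. from the one-hop graph walk

fused = rrf_fuse([keyword_hits, entity_hits, graph_hits])
```

Page "d" is ranked first by the graph walk but appears nowhere else, so it lands behind "a" and "b", which both show up in more than one signal.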

None of that is tuned by vibes. There is a golden gate: a fixed set of queries with known-good answers, scored on MRR and Recall@8. The current numbers are MRR 0.896 and Recall@8 1.0. The rule I hold myself to: don't touch ranking without a number moving in the right direction.
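For reference, this is how the two metrics are usually computed. The golden set below is a two-query toy with invented slugs, not the real gate, and `score_gate` is a hypothetical helper name; the definitions of MRR and Recall@k are the standard ones.

```python
def reciprocal_rank(results: list, relevant: set) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, page in enumerate(results, start=1):
        if page in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(results: list, relevant: set, k: int = 8) -> float:
    """Fraction of the relevant pages that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(results[:k]) & relevant) / len(relevant)


def score_gate(golden: dict, search, k: int = 8) -> dict:
    """Average both metrics over a fixed set of queries with known-good answers."""
    rr = [reciprocal_rank(search(q), rel) for q, rel in golden.items()]
    rc = [recall_at_k(search(q), rel, k) for q, rel in golden.items()]
    return {"mrr": sum(rr) / len(rr), "recall_at_k": sum(rc) / len(rc)}


# Toy golden set and a canned "search" standing in for the real retriever.
golden = {"when do I like meetings": {"meetings"}, "coffee order": {"coffee"}}
canned = {
    "when do I like meetings": ["meetings", "calendar"],  # right answer at rank 1
    "coffee order": ["tea", "coffee"],                    # right answer at rank 2
}
metrics = score_gate(golden, lambda q: canned[q])
```

In this toy case MRR is (1 + 1/2) / 2 = 0.75 and Recall@8 is 1.0: every answer is found, but one of them is not ranked first.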

The self-improving loop

Here is where it stops being a database and starts being a system that gets better on its own. The dream changes how memory is organized, what gets promoted, how pages link. Every one of those changes is a hypothesis: does this make retrieval better? So the loop closes through selfevals, my evals framework. It runs the dream's output against the golden gate, keeps the change if the metric improves, and rejects it if it regresses. Memory that edits itself, with a referee.

The dream proposes; selfevals decides. Only changes that move the metric survive.
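The referee rule reduces to a small keep-or-reject step. This is a sketch of the idea only, not the selfevals API: `try_change`, `propose`, and `evaluate` are hypothetical names, the state is a toy dict, and the choice to reject ties is my reading of "only changes that move the metric survive".

```python
from typing import Callable, Tuple


def try_change(state: dict, propose: Callable, evaluate: Callable) -> Tuple[dict, bool]:
    """Score a proposed change against the current state; keep it only if the metric improves."""
    baseline = evaluate(state)
    candidate = propose(state)  # must return a new state, not mutate the old one
    if evaluate(candidate) > baseline:
        return candidate, True
    return state, False         # regressions and ties are both rejected


# Toy metric: pretend retrieval is best when the hypothetical "link_boost" is 2.0.
def evaluate(state: dict) -> float:
    return -abs(state["link_boost"] - 2.0)


state = {"link_boost": 1.0}
state, kept_good = try_change(state, lambda s: {**s, "link_boost": 1.5}, evaluate)
state, kept_bad = try_change(state, lambda s: {**s, "link_boost": 0.5}, evaluate)
```

The first proposal moves the metric up and is kept; the second moves it down and the state stays where it was.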

Proactive recall

Memory isn't only something you pull; it's something that surfaces. brain_os classifies each page into decision layers (the kind of decision a piece of you informs) so the right slice of memory can come forward at the right moment, triggered by what's happening rather than by an explicit question. It also separates what you say from what you do, declared preference versus revealed behavior, because the patterns worth surfacing are the ones you can't see in yourself.

Where the ideas come from

Three patterns shaped it. Karpathy's llm-wiki is the shape of the memory itself: an LLM that maintains a structured markdown knowledge base instead of re-retrieving raw documents every time, with explicit ingest, query, and lint steps. That is exactly the two-speed split here: evidence in, distilled pages out.

Karpathy's autoresearch is the loop: an agent that tries something, measures it against a clear metric, keeps what improves, and discards what doesn't. The dream worker is that loop pointed at memory: it consolidates, and its output is graded (brain dream-eval), because inference is not truth.

And GBrain, Garry Tan's memory system for his own agents, is where I borrowed the engineering: compiled truth plus timeline (rewrite the current truth on top, keep the evidence underneath), a self-wiring knowledge graph, and a fail-open, idempotent dream pass.

Where it fits

brain_os is the memory layer of Guardian Angel. It is also the wedge for something bigger: a model built on your own data is a perfect mirror, but it can only climb toward your local optimum. The interesting question is how an agent expands the space of options you can even see. The mirror comes first; it is the prior the rest needs.