Building Semantic Memory Systems: Lessons from 180+ Open-Source Experiments
Most of my ideas about semantic memory didn’t come from a whiteboard — they came from code. Over the past few years, I’ve built and maintained more than 180 public repositories on GitHub exploring different ways to structure, compress, and reuse AI context.
This post is a distillation of what has held up across those experiments: the patterns that keep showing up when you actually try to make memory work for real systems, not just demos.
Lesson 1: Memory Has to Be Structured, Not Just Stored
The first lesson is simple: “just log everything” is not a memory strategy. Raw logs might help with auditing, but they don’t help agents reason. For memory to be useful, it needs:
- Clear units (entries, events, episodes)
- Stable schemas for each unit
- Consistent labels for domain, time, and outcome
A lot of my repositories are basically variations on this theme: what if we encode memory entries like this? What if we compress them like that? What if we attach this metadata? You quickly see that structure is what makes memory usable.
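To make that concrete, here is a minimal sketch of a structured entry as a Python dataclass. The `MemoryEntry` name and its fields are illustrative assumptions, not a schema from any particular repo; the point is only that the unit, its schema, and its labels are explicit rather than implied by a log line.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class MemoryEntry:
    """One unit of memory with a stable schema and explicit labels (illustrative)."""
    entry_id: str                                  # stable identifier for the unit
    domain: str                                    # what area of the world it refers to
    timestamp: datetime                            # when the underlying event happened
    outcome: str                                   # what result or decision it relates to
    content: str                                   # the payload itself
    metadata: dict = field(default_factory=dict)   # anything extra, kept explicit

# Example entry (hypothetical values)
entry = MemoryEntry(
    entry_id="ep-0042",
    domain="billing",
    timestamp=datetime.now(timezone.utc),
    outcome="refund_approved",
    content="Customer requested a refund; policy check passed.",
)
```

Even this tiny schema forces the three things the list above asks for: a clear unit, a stable shape, and consistent labels you can filter on later.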
Lesson 2: Reversibility Is Non-Negotiable
Many of my projects (including Cube Protocol) treat reversibility as a hard requirement. If you can’t reconstruct the original data, you’re trusting that your compression or summary process never made a mistake. That’s a big assumption.
Reversibility matters because:
- You can always go back to source-of-truth if something looks wrong.
- You can debug how a particular memory entry was formed.
- You can re-embed or reprocess memory when models improve.
In code, this translates into round-trip tests: for every memory format, can we go from original → compressed → restored and get back exactly the original, along with the metadata that describes how it was transformed? If the answer is no, I usually treat that as a red flag.
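A minimal sketch of such a round-trip test, using JSON and zlib as stand-ins for whatever serialization and compression a given memory format actually uses; `compress_entry` and `restore_entry` are hypothetical names, not an API from any of the repos.

```python
import json
import zlib

def compress_entry(entry: dict) -> bytes:
    """Stand-in compression step: serialize deterministically, then deflate."""
    return zlib.compress(json.dumps(entry, sort_keys=True).encode("utf-8"))

def restore_entry(blob: bytes) -> dict:
    """Inverse of compress_entry; must reproduce the original exactly."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

def test_round_trip():
    original = {"entry_id": "ep-0042", "domain": "billing", "content": "refund approved"}
    restored = restore_entry(compress_entry(original))
    assert restored == original, "lossy transformation: treat this as a red flag"

if __name__ == "__main__":
    test_round_trip()
    print("round trip ok")
```

The real transformations are more interesting than zlib, but the shape of the test is the same: if the round trip is not exact, the summary or compression step has silently become the source of truth.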
Lesson 3: Memory Is for Systems, Not Just Models
A single model might “remember” something in its context window, but true semantic memory belongs to the system, not the model. It has to be:
- Accessible to multiple agents and tools
- Stored outside any specific model call
- Routable, searchable, and auditable
In my repositories, I’m constantly experimenting with how to connect memory modules to orchestration layers, not just to models. Memory systems should serve the controller, which then decides which agent sees what and when.
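A rough sketch of that separation, assuming a hypothetical `SharedMemoryStore` that lives outside any model call and a `Controller` that decides which slice each agent receives. The names and the routing policy are illustrative only.

```python
class SharedMemoryStore:
    """System-level store: persists outside any single model call."""
    def __init__(self):
        self._entries: list[dict] = []

    def add(self, entry: dict) -> None:
        self._entries.append(entry)

    def search(self, domain: str | None = None) -> list[dict]:
        """Routable and searchable: filter on explicit labels, not model state."""
        return [e for e in self._entries
                if domain is None or e.get("domain") == domain]

class Controller:
    """Decides which agent sees what, and when."""
    def __init__(self, store: SharedMemoryStore):
        self.store = store

    def context_for(self, agent_role: str) -> list[dict]:
        # The routing policy is explicit and auditable, not buried in a prompt.
        if agent_role == "planner":
            return self.store.search(domain="billing")
        return self.store.search()
```

The important design choice is that no agent talks to the store directly; the controller owns routing, which is what makes the memory auditable at the system level.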
Lesson 4: Semantics Need to Be First-Class
A big part of my work is making semantics explicit. In many of my semantic memory projects, every entry has:
- A domain (what area of the world it refers to)
- A sequence (what process or workflow it belongs to)
- An outcome (what result or decision it relates to)
This shows up in patterns like the DOMAIN | SEQUENCE | OUTCOME descriptor used in Cube Protocol. Those semantics are not afterthoughts — they’re part of the key.
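As a sketch, a descriptor-style key might be built like this. The exact Cube Protocol format is not reproduced here; only the DOMAIN | SEQUENCE | OUTCOME pattern named above is followed, and `descriptor_key` is an illustrative helper.

```python
def descriptor_key(domain: str, sequence: str, outcome: str) -> str:
    """Build a DOMAIN | SEQUENCE | OUTCOME style descriptor.

    The semantics live in the key itself, so entries can be indexed and
    filtered without parsing the entry body.
    """
    parts = (domain, sequence, outcome)
    if not all(parts):
        raise ValueError("every descriptor field must be present")
    return " | ".join(p.strip().upper() for p in parts)

key = descriptor_key("billing", "refund-workflow", "refund_approved")
# -> "BILLING | REFUND-WORKFLOW | REFUND_APPROVED"
```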
Lesson 5: Agents Need Memory That Fits Their Role
Memory is not “one size fits all.” A retrieval agent, a planning agent, and an evaluation agent each need:
- Different slices of the same underlying memory
- Different levels of detail
- Different time horizons
Many of my experiments are about making these views explicit: full history here, summarized context there, domain-specific projection somewhere else. The system can share a core memory representation, but the views must be tailored for each agent’s job.
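One way to sketch those tailored views, assuming the entry schema from Lesson 1 with timezone-aware timestamps; `view_for_role` and the specific projections are hypothetical, not a fixed API from any repo.

```python
from datetime import datetime, timedelta, timezone

def view_for_role(entries: list[dict], role: str) -> list[dict]:
    """Project one shared core memory into a role-specific slice."""
    now = datetime.now(timezone.utc)
    if role == "retrieval":
        # Full history, full detail.
        return entries
    if role == "planning":
        # Recent entries only, trimmed to the fields a planner needs.
        recent = [e for e in entries
                  if now - e["timestamp"] < timedelta(days=7)]
        return [{"domain": e["domain"], "outcome": e["outcome"]} for e in recent]
    if role == "evaluation":
        # Outcomes only, over the whole time horizon.
        return [{"outcome": e["outcome"]} for e in entries]
    raise ValueError(f"unknown role: {role}")
```

The underlying entries are shared; only the projection changes per role, which is what keeps the views cheap to maintain and consistent with each other.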
Lesson 6: Real Memory Systems Emerge from Iteration
None of this came out perfect on the first pass. It came from writing code, throwing it away, rewriting it, and testing it in different domains. That’s why I keep the open-source work visible — it’s the real story behind the abstractions.
If you’re curious about how these ideas evolve, the best places to explore are:
- Code – high-level view of what I publish.
- github.com/Phil-Hills – the full repo list.
- Projects – how some of these experiments solidify into protocols and systems.
The big takeaway: semantic memory is not a single library or pattern — it’s a discipline. The more you treat it like an evolving system rather than a static feature, the more powerful it becomes.