
PRISM
Skills that compound.
Knowledge agents can walk.
What we kept hitting
Four things we tried. Each one worked partially. The pattern is the diagnosis.
We shipped skills. The outputs didn’t compound.
First /idea. Then /plan, /spec, /build, /review. Each one worked. None of them carried what the last one learned.
Every session re-explained the same constraints, re-cited the same findings, re-derived the same conclusions. The skill got smarter. The work didn’t.
Skills are tools. Tools don’t remember. We needed something that did.
We assumed bigger context would fix it. The research said no.
200K, 1M tokens — surely you can fit everything that matters. We could. The model couldn’t use most of it.
Across Claude Opus/Sonnet 4, GPT-4.1, Gemini 2.5 Pro/Flash, and Qwen3, performance starts degrading at 500–2,500 tokens, even on trivial tasks. Models do worse on coherent documents than on shuffled ones.
Chroma · July 2025
Strip lexical-overlap shortcuts and 10 of 12 frontier models fall below half their short-context accuracy by 32K tokens. Effective context length is ≤ 2K tokens for most.
arXiv:2502.05167 · February 2025
Full sources in the tier-2 bibliography.
Nominal context grew. Effective context didn’t.
We added memory. The blob wasn’t queryable.
Anthropic’s memory tool. Conversation summaries. Persisted notes. The agent could write. It couldn’t ask its own memory structured questions back.
“What decisions superseded D-019?” hits a blob and falls back to grep. “Which findings ground this proposal?” — same. Memory holds state. It doesn’t hold shape.
Storage isn’t structure. Structure is what walks.
We poured more into AGENTS.md. The token bill grew. The output didn’t.
Every session, the agent re-reads the methodology. Re-loads the conventions. Re-anchors itself. The cost compounds — in the wrong direction.
Sept 2025 · sources in tier-2 bibliography
Agents read. They barely write. Every reload of the same methodology is a salience tax: you pay full price to re-anchor what the model already needed an hour ago.
Methodology-as-prose pays full price every call.
Skills need salience, not size.
The pattern across all four: we kept stacking more. More context, more memory, more docs. But the bottleneck wasn’t capacity. It was knowing what was load-bearing for this call, pulled from a noisy pile.
Salience = load-bearing for this task, right now.
Vector retrieval pulls what is near this string. Structured retrieval pulls what is load-bearing for this decision. The gap widens as context grows.
We didn’t need more. We needed shape.
PRISM is a typed index over the knowledge you already have.
Not a new database. Not a migration. A layer over your repos, docs, ADRs, specs, decisions, tickets, papers — whatever you’ve already paid to produce. PRISM adds the typing and edges your skills walk.
Right data model — supertags, fields, references. Wonderful for power users; the UI taxes everyone else into doing the schema work by hand.
Right interface — databases everyone can use. Wrong model: relations are foreign keys. One hop. No traversal grammar, no inheritance, no schema-as-code.
PRISM brings the model to LLMs. You sketch the shape. The agent populates and walks it. Schema flips from tax to leverage.
No migration. No new place to put things. Just shape over what you already have.
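A minimal sketch of what "shape over what you already have" could look like, assuming an index that stores only identifiers, types, and paths to files that already exist. The `Node` and `Edge` classes, the `sources_for` helper, and the link between D-042 and P-003 are hypothetical illustrations, not PRISM's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str    # stable identifier, e.g. "D-042"
    type: str  # e.g. "decision", "proposal"
    path: str  # where the content already lives; nothing is copied or migrated


@dataclass(frozen=True)
class Edge:
    src: str
    relation: str
    dst: str


# Hypothetical index entries. The paths point at existing files;
# the derived_from link between them is assumed for illustration.
nodes = {
    "D-042": Node("D-042", "decision", "docs/decisions.md"),
    "P-003": Node("P-003", "proposal", "docs/proposals/P-003.md"),
}
edges = [Edge("D-042", "derived_from", "P-003")]


def sources_for(node_id, relation):
    """Return paths of the existing files that `node_id` points to via `relation`."""
    return [
        nodes[e.dst].path
        for e in edges
        if e.src == node_id and e.relation == relation
    ]
```

The index holds pointers and relationships only, so the documents stay where they are and the layer can be rebuilt without touching them.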
Same question, two agents.
One reads. One walks. The shape decides who guesses and who knows.
? What informed D-042?
$ grep -r "D-042" docs/
docs/decisions.md:147:…
docs/proposals/P-003.md:…
docs/context/C-007.md:…
(reads three files, reconstructs the chain from prose, cites what it thinks is lineage)
Best-effort. No guarantee the chain is complete or correct.
? What informed D-042?
walk(D-042, derived_from*)
Deterministic. Cited. Fast. The architecture earned its keep.
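The walk in the second agent's answer can be sketched as a breadth-first traversal over typed edges. This is an illustration under stated assumptions, not PRISM's implementation: it reads `derived_from*` as "follow `derived_from` edges transitively", and the edge list (including the IDs F-011 and D-019 and the links between nodes) is invented for the example.

```python
from collections import deque

# Hypothetical typed edges: (source, relation, target).
# Which node derives from which is assumed, not taken from a real project.
EDGES = [
    ("D-042", "derived_from", "P-003"),
    ("P-003", "derived_from", "C-007"),
    ("P-003", "cites", "F-011"),
    ("D-042", "supersedes", "D-019"),
]


def walk(start, relation, edges=EDGES):
    """Return every node reachable from `start` by following `relation`
    edges one or more times, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for src, rel, dst in edges:
            if src == node and rel == relation and dst not in seen:
                seen.add(dst)
                order.append(dst)
                queue.append(dst)
    return order


print(walk("D-042", "derived_from"))  # ['P-003', 'C-007']
```

The same input and edge list always produce the same chain, and edges of other types (`cites`, `supersedes`) are never followed by accident. That is the difference from reconstructing lineage out of grep hits.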
Why this works
Graphs hold shape. Ontology gives it meaning. The architecture is what compounds.
Knowledge has shape.
It’s been captured in fragments for years — files, tags, embeddings, tables. Each captures part. None captures the relationships between the parts.
| Lane | What it captures | What it doesn’t |
|---|---|---|
| Files & folders | Hierarchy — one parent per item. | Relationships between items in different folders. |
| Tags as filters | Classification — tagged X. | How tagged things relate to each other. |
| Embeddings | Similarity — vaguely like this. | Precise structural questions. |
| Relational tables | Rows + foreign keys. | Relationships are values, not objects — N hops collapse into N joins. |
All four are projections of a graph. The graph is what holds them all — plus the relationships between them.
Your fragments are already shaped like a graph. They just lack the edges.
Ontology is the grammar your agent recalls in.
RAG retrieves chunks by similarity. Ontology gives the agent a grammar — so recall is salient, not just close.
decision supersedes decision. decision derived_from proposal. proposal cites finding. The agent walks named relationships instead of guessing what’s relevant.
Data modeling captures structure. Ontology captures meaning. PRISM does both.
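One way to picture the grammar is as a set of allowed (source type, relation, target type) triples that every edge is checked against. The three triples below come from the text; the node IDs, the `NODE_TYPES` table, and the `is_valid_edge` helper are hypothetical, sketched only to show the idea.

```python
# Allowed (source_type, relation, target_type) triples, taken from the text.
GRAMMAR = {
    ("decision", "supersedes", "decision"),
    ("decision", "derived_from", "proposal"),
    ("proposal", "cites", "finding"),
}

# Hypothetical nodes and their types.
NODE_TYPES = {
    "D-042": "decision",
    "D-019": "decision",
    "P-003": "proposal",
    "F-011": "finding",
}


def is_valid_edge(src, relation, dst, node_types=NODE_TYPES, grammar=GRAMMAR):
    """Return True when the edge matches a triple the grammar allows."""
    try:
        triple = (node_types[src], relation, node_types[dst])
    except KeyError:
        return False  # unknown node: reject rather than guess
    return triple in grammar
```

Under this sketch, `is_valid_edge("D-042", "supersedes", "D-019")` holds, while a decision that `cites` a finding directly is rejected, because the grammar only lets proposals cite findings.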
Salience pays in tokens too.
A typed walk replaces a full-context re-read. Across an agent workflow making thousands of calls a day, the cost gap compounds, this time in the right direction.
If 99% of the bill is what you load, then signal-per-token is the leverage. A graph walk gets you the load-bearing slice. A prose re-read gets you everything and the noise.
Salience per token is work per token.
Vector retrieval gets cheaper as embeddings improve. Structured retrieval gets cheaper as your kit’s grammar matures. Only one of those compounds.
Buy more work per token. That’s the leverage.
The architecture is the asset.
Your architecture of knowledge becomes the moat and the substrate that future AI systems will walk.
Every walk leaves typed lineage behind. Decisions, findings, constraints — all stay walkable forever. Nobody else has your specific accumulated shape.
Your kit is a versioned package: types, fields, skills, hooks. Install on a new project — the agent inherits your team’s expertise on day one.
SMEs can’t afford a data team. Ontology isn’t infrastructure — it’s the only team they can have. 30-point accuracy gap, peer-reviewed.
Build the architecture once. Earn the moat. The kit travels. The runway compounds.