PRISM

Skills that compound.
Knowledge agents can walk.

Antonio Pericao · May 26, 2026
Prepared for #TanaKnowledgeEngineers channel

Act 00

What we kept hitting

Four things we tried. Each one worked partially. The pattern is the diagnosis.

What we kept hitting·01 / 22

We shipped skills. The outputs didn’t compound.

First /idea. Then /plan, /spec, /build, /review. Each one worked. None of them carried what the last one learned.

Every session re-explained the same constraints, re-cited the same findings, re-derived the same conclusions. The skill got smarter. The work didn’t.

Skills are tools. Tools don’t remember. We needed something that did.

What we kept hitting·02 / 22

We assumed bigger context would fix it. The research said no.

200K, 1M tokens — surely you can fit everything that matters. We could. The model couldn’t use most of it.

Two findings the deck is built on

Context Rot

Across Claude Opus/Sonnet 4, GPT-4.1, Gemini 2.5 Pro/Flash, and Qwen3, performance starts degrading at 500–2,500 tokens, even on trivial tasks. Models do worse on coherent documents than on shuffled ones.

Chroma · July 2025

NoLiMa

Strip lexical-overlap shortcuts and 10 of 12 frontier models fall below half their short-context baseline accuracy by 32K. Effective context length is ≤ 2K for most.

arXiv:2502.05167 · February 2025

Full sources in the tier-2 bibliography.

Nominal context grew. Effective context didn’t.

What we kept hitting·03 / 22

We added memory. The blob wasn’t queryable.

Anthropic’s memory tool. Conversation summaries. Persisted notes. The agent could write. It couldn’t ask its own memory structured questions back.

“What decisions superseded D-019?” hits a blob and falls back to grep. “Which findings ground this proposal?” — same. Memory holds state. It doesn’t hold shape.

Storage isn’t structure. Structure is what walks.
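A minimal sketch of what a structured question looks like once memory holds typed edges instead of a blob. The `Edge` record, the `superseded_by` helper, and every id other than D-019 are hypothetical, chosen only to illustrate the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str    # node the edge starts from
    relation: str  # named relationship, e.g. "supersedes"
    target: str    # node the edge points to


# Hypothetical records; only D-019 appears in the deck.
EDGES = [
    Edge("D-027", "supersedes", "D-019"),
    Edge("D-031", "supersedes", "D-027"),
    Edge("D-027", "derived_from", "P-002"),
]


def superseded_by(decision_id: str, edges: list[Edge]) -> list[str]:
    """Return the decisions that directly supersede `decision_id`."""
    return [
        e.source
        for e in edges
        if e.relation == "supersedes" and e.target == decision_id
    ]


direct = superseded_by("D-019", EDGES)
print(direct)
```

The question is answered by filtering on a named relation, so there is nothing to reconstruct from prose and no fallback to grep.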

What we kept hitting·04 / 22

We poured more into AGENTS.md. The token bill grew. The output didn’t.

Every session, the agent re-reads the methodology. Re-loads the conventions. Re-anchors itself. The cost compounds — in the wrong direction.

September 2025 · Claude 4 Sonnet, daily token mix: ~100B tokens / day, 99% input (read), the remainder generated (written).

Sept 2025 · sources in tier-2 bibliography

Agents read. They barely write. Every reload of the same methodology is salience tax — you pay full price to re-anchor what the model already needed an hour ago.

Methodology-as-prose pays full price every call.

What we kept hitting·05 / 22

Skills need salience, not size.

The pattern across all four: we kept stacking more. More context, more memory, more docs. But the bottleneck wasn’t capacity. It was knowing what was load-bearing for this call, pulled from a noisy pile.

Salience = load-bearing for this task, right now.

Vector retrieval pulls what is near this string. Structured retrieval pulls what is load-bearing for this decision. The gap widens as context grows.

We didn’t need more. We needed shape.

Introducing PRISM·06 / 22

PRISM is a typed index over the knowledge you already have.

Not a new database. Not a migration. A layer over your repos, docs, ADRs, specs, decisions, tickets, papers — whatever you’ve already paid to produce. PRISM adds the typing and edges your skills walk.

The data model existed. Until LLMs, the human had to populate it.

Tana

Right data model — supertags, fields, references. Wonderful for power users; the UI taxes everyone else into doing the schema work by hand.

Notion

Right interface — databases everyone can use. Wrong model: relations are foreign keys. One hop. No traversal grammar, no inheritance, no schema-as-code.

PRISM

Brings the model to LLMs. You sketch the shape. The agent populates and walks it. Schema flips from tax to leverage.

No migration. No new place to put things. Just shape over what you already have.
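One way to picture "shape over what you already have" is an index whose entries point at existing files and add only a type and named edges. This is a sketch under assumptions: `IndexEntry` and `build_index` are hypothetical names, not PRISM's actual API, and the paths are borrowed from the grep output shown later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    node_id: str
    node_type: str   # e.g. "decision", "proposal", "context"
    path: str        # where the existing document already lives
    edges: dict[str, list[str]] = field(default_factory=dict)


def build_index(entries: list[IndexEntry]) -> dict[str, IndexEntry]:
    """Key entries by id and reject edges that point at unknown nodes."""
    index = {e.node_id: e for e in entries}
    for entry in entries:
        for relation, targets in entry.edges.items():
            for target in targets:
                if target not in index:
                    raise KeyError(
                        f"{entry.node_id} {relation} {target}: unknown target"
                    )
    return index


# Nothing is moved or copied: each entry only references a file in place.
index = build_index([
    IndexEntry("D-042", "decision", "docs/decisions.md",
               {"derived_from": ["P-003"]}),
    IndexEntry("P-003", "proposal", "docs/proposals/P-003.md",
               {"cites": ["C-007"]}),
    IndexEntry("C-007", "context", "docs/context/C-007.md"),
])
print(index["D-042"].edges)
```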

Watch it walk·07 / 22

Same question, two agents.

One reads. One walks. The shape decides who guesses and who knows.

Generic agent

? What informed D-042?

$ grep -r "D-042" docs/
docs/decisions.md:147:…
docs/proposals/P-003.md:…
docs/context/C-007.md:…

(reads three files,
 reconstructs the chain
 from prose, cites
 what it thinks is lineage)

Best-effort. No guarantee the chain is complete or correct.

Shaped agent

? What informed D-042?

walk(D-042, derived_from*)

D-042 · decision

derived_from ↓

P-003 · proposal

cites ↓

C-007 · context

grounds ↓

F-018 · finding

Deterministic. Cited. Fast. The architecture earned its keep.
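The shaped agent's walk can be sketched as a traversal that follows only named relations. The deck does not define the exact semantics of `derived_from*`, so this version assumes lineage follows the three edge names shown in the chain (derived_from, cites, grounds); the `walk` signature here is illustrative.

```python
# Adjacency list: node id -> list of (relation, target) pairs.
# Ids and relations are the ones shown on the slide.
GRAPH = {
    "D-042": [("derived_from", "P-003")],
    "P-003": [("cites", "C-007")],
    "C-007": [("grounds", "F-018")],
    "F-018": [],
}


def walk(start, relations, graph):
    """Depth-first walk from `start`, following only edges whose relation
    is in `relations`. Returns (source, relation, target) in visit order."""
    seen = {start}
    stack = [start]
    path = []
    while stack:
        node = stack.pop()
        for relation, target in graph.get(node, []):
            if relation in relations and target not in seen:
                seen.add(target)
                path.append((node, relation, target))
                stack.append(target)
    return path


lineage = walk("D-042", {"derived_from", "cites", "grounds"}, GRAPH)
for source, relation, target in lineage:
    print(f"{source} --{relation}--> {target}")
```

The same input graph always yields the same chain, and each step names the edge it followed, which is what makes the answer citable.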

Act 01

Why this works

Graphs hold shape. Ontology gives it meaning. The architecture is what compounds.

Why this works·08 / 22

Knowledge has shape.

It’s been captured in fragments for years — files, tags, embeddings, tables. Each captures part. None captures the relationships between the parts.

Lane	What it captures	What it doesn’t
Files & folders	Hierarchy — one parent per item.	Relationships between items in different folders.
Tags as filters	Classification — tagged X.	How tagged things relate to each other.
Embeddings	Similarity — vaguely like this.	Precise structural questions.
Relational tables	Rows + foreign keys.	Relationships are values, not objects — N hops collapse into N joins.

All four are projections of a graph. The graph is what holds them all — plus the relationships between them.
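To make "projections of a graph" concrete, the sketch below derives two of the lanes from one small typed graph: a tag-style grouping and a foreign-key-style table. The function names are hypothetical; the node ids reuse the lineage example from the deck.

```python
from collections import defaultdict

NODES = {
    "D-042": "decision",
    "P-003": "proposal",
    "C-007": "context",
    "F-018": "finding",
}
EDGES = [
    ("D-042", "derived_from", "P-003"),
    ("P-003", "cites", "C-007"),
    ("C-007", "grounds", "F-018"),
]


def tag_projection(nodes):
    """Classification only: which ids carry which type. Edges are dropped."""
    by_type = defaultdict(list)
    for node_id, node_type in nodes.items():
        by_type[node_type].append(node_id)
    return dict(by_type)


def table_projection(edges, relation):
    """Rows with a foreign-key column: one relation, one hop per row."""
    return [{"id": s, "ref": t} for s, r, t in edges if r == relation]


tags = tag_projection(NODES)
cites_table = table_projection(EDGES, "cites")
print(tags)
print(cites_table)
```

Each projection is cheap to compute from the graph, but neither can be used to rebuild the full chain on its own.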

Your fragments are already shaped like a graph. They just lack the edges.

Why this works·09 / 22

Ontology is the grammar your agent recalls in.

RAG retrieves chunks by similarity. Ontology gives the agent a grammar — so recall is salient, not just close.

decision supersedes decision. decision derived_from proposal. proposal cites finding. The agent walks named relationships instead of guessing what’s relevant.
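The three relationships named above can be read as a small grammar of allowed (source type, relation, target type) triples. The following is a sketch, assuming that reading; `GRAMMAR`, `NODE_TYPES`, and `is_valid_edge` are illustrative names, and the pairing of specific ids is hypothetical.

```python
# Allowed triples, taken directly from the sentence above.
GRAMMAR = {
    ("decision", "supersedes", "decision"),
    ("decision", "derived_from", "proposal"),
    ("proposal", "cites", "finding"),
}

# Ids appear in the deck; which edges connect them here is illustrative.
NODE_TYPES = {
    "D-042": "decision",
    "D-019": "decision",
    "P-003": "proposal",
    "F-018": "finding",
}


def is_valid_edge(source, relation, target,
                  node_types=NODE_TYPES, grammar=GRAMMAR):
    """True if the edge's (source type, relation, target type) is allowed."""
    triple = (node_types[source], relation, node_types[target])
    return triple in grammar


print(is_valid_edge("D-042", "derived_from", "P-003"))
print(is_valid_edge("P-003", "supersedes", "D-042"))
```

Rejecting edges that do not fit the grammar keeps later walks meaningful: every hop the agent follows is one the schema says can exist.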

Measured · ontology-grounded RAG vs. plain RAG, across four LLMs

+55% · recall of accurate facts
+40% · response correctness
+30% · faster attribution to source
+27% · fact-based reasoning

Data modeling captures structure. Ontology captures meaning. PRISM does both.

Why this works·10 / 22

Salience pays in tokens too.

A typed walk replaces a full-context re-read. Across an agent workflow making thousands of calls a day, the cost gap compounds, this time in the right direction.

Measured · structured retrieval as a token-efficiency lever

~6,000× · LightRAG: <100 tokens per query vs. ~610K for GraphRAG, with the same or higher win rate on domain QA.

84% · Anthropic context editing: token reduction on a 100-turn agentic web-search eval. Memory tool plus context editing: +39% performance over baseline.

99% · Sept 2025: ~100B Claude 4 Sonnet tokens/day, almost entirely input. The cost is in what you load.

If 99% of the bill is what you load, then signal-per-token is the leverage. A graph walk gets you the load-bearing slice. A prose re-read gets you everything and the noise.
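The ~6,000× figure follows from the two per-query numbers quoted for LightRAG and GraphRAG. The arithmetic below uses those numbers as given; the daily query volume is a made-up assumption used only to show how the gap scales.

```python
LIGHTRAG_TOKENS_PER_QUERY = 100       # upper bound quoted in the deck ("<100")
GRAPHRAG_TOKENS_PER_QUERY = 610_000   # "~610K" quoted in the deck

ratio = GRAPHRAG_TOKENS_PER_QUERY / LIGHTRAG_TOKENS_PER_QUERY


def daily_tokens(tokens_per_query: int, queries_per_day: int) -> int:
    """Total retrieval tokens loaded per day at a fixed per-query cost."""
    return tokens_per_query * queries_per_day


QUERIES_PER_DAY = 5_000  # hypothetical workload, not from the deck

saved = (
    daily_tokens(GRAPHRAG_TOKENS_PER_QUERY, QUERIES_PER_DAY)
    - daily_tokens(LIGHTRAG_TOKENS_PER_QUERY, QUERIES_PER_DAY)
)
print(f"ratio: {ratio:,.0f}x")
print(f"tokens saved per day: {saved:,}")
```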

Salience per token is work per token.

Vector retrieval gets cheaper as embeddings improve. Structured retrieval gets cheaper as your kit’s grammar matures. Only one of those compounds.

Buy more work per token. That’s the leverage.

Why this works·11 / 22

The architecture is the asset.

Your architecture of knowledge becomes the moat and the substrate that future AI systems will walk.

It compounds · the moat

Every walk leaves typed lineage behind. Decisions, findings, constraints — all stay walkable forever. Nobody else has your specific accumulated shape.

It travels · the kit

Your kit is a versioned package: types, fields, skills, hooks. Install on a new project — the agent inherits your team’s expertise on day one.

It earns its keep · at scale

SMEs can’t afford a data team. Ontology isn’t infrastructure — it’s the only team they can have. 30-point accuracy gap, peer-reviewed.
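A kit as "a versioned package: types, fields, skills, hooks" could be modelled roughly as below. This is a sketch, not PRISM's packaging format: the `Kit` class, the `install` helper, the field names, and the hook name are all hypothetical. The skill names and node types are the ones that appear in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Kit:
    name: str
    version: str
    types: list[str]
    fields: dict[str, list[str]]   # type name -> field names (illustrative)
    skills: list[str]
    hooks: list[str] = field(default_factory=list)


def install(kit: Kit, project: dict) -> dict:
    """Return a copy of `project` with the kit's types and skills registered."""
    installed = dict(project)
    installed["kits"] = {**project.get("kits", {}), kit.name: kit.version}
    installed["types"] = sorted(set(project.get("types", [])) | set(kit.types))
    installed["skills"] = sorted(set(project.get("skills", [])) | set(kit.skills))
    return installed


TEAM_KIT = Kit(
    name="team-kit",
    version="1.0.0",
    types=["decision", "proposal", "context", "finding"],
    fields={"decision": ["status", "supersedes"]},   # hypothetical fields
    skills=["/idea", "/plan", "/spec", "/build", "/review"],
    hooks=["on_decision_created"],                    # hypothetical hook
)

new_project = install(TEAM_KIT, {"name": "new-project"})
print(new_project)
```

Because `install` returns a new mapping and records the kit version, the same kit can be applied to several projects and upgraded independently in each.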

Build the architecture once. Earn the moat. The kit travels. The runway compounds.
