MaxyDocs

Platform Internals — Retrieval Architecture

Technical architecture reference for the retrieval pipeline, knowledge delivery, and supporting infrastructure. This document covers how information moves from the Neo4j graph into an agent's context — the mechanics behind "Maxy searches this graph to retrieve relevant context."

Use this reference when assessing capabilities, diagnosing retrieval behaviour, or answering questions about how the platform works internally. When a question asks "does Maxy have X?" — check here before asserting a gap.


Retrieval Pipeline Overview

Every knowledge query flows through a hybrid search pipeline that combines semantic similarity with keyword matching, applies layered access controls, expands results via graph traversal, and optionally re-ranks via LLM reasoning.

QUERY  ── (retrievalClass from Task 304 gateway-classifier)
  │
  ├── EXPAND (Haiku — 3-5 paraphrases, 1h cache)            [flag: MAXY_GS_EXPANSION]
  │
  ├── ROUTE  (per-class label filter + fusion weights)      [flag: MAXY_GS_ROUTE]
  │
  ├── For each query ────► EMBED ──► VECTOR SEARCH ──┐
  │                                                  ├─► FUSE (weighted-sum or RRF) [flag: MAXY_GS_RRF]
  │                  └────► BM25 FULL-TEXT ──────────┘
  │                         (entity_search — universal coverage)
  │
  ├── BOOST  (compiledTruth +15%, backlinks log 5-25%)      [flag: MAXY_GS_BOOSTS]
  ├── DEDUP  (4 layers: nodeId, slug, canonicalName, hash)  [flag: MAXY_GS_DEDUP]
  ├── THRESHOLD + SORT + SLICE
  └── GRAPH EXPAND ──► RESULTS

Fusion (default / weighted-sum): combined = 0.7 × vector + 0.3 × bm25_norm
Fusion (RRF):                    score = Σ 1 / (60 + rank_i) across ranked lists
Fallback: if the full-text index doesn't exist, vector-only results are returned (graceful degradation, no error).
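
The RRF formula above can be sketched in a few lines. This is an illustration only, not the contents of rrf-fusion.ts: the function name rrfFuse and the 1-based rank convention are assumptions.

```typescript
// Reciprocal Rank Fusion: score(node) = Σ 1 / (k + rank) across ranked lists.
// Each inner array holds nodeIds ordered best-first; ranks are 1-based here.
function rrfFuse(rankedLists: string[][], k: number = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((nodeId, index) => {
      const rank = index + 1;
      scores.set(nodeId, (scores.get(nodeId) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// "b" appears in both lists, so it accumulates two contributions.
const rrfScores = rrfFuse([["a", "b", "c"], ["b", "d"]]);
const rrfOrder = Array.from(rrfScores.entries())
  .sort((x, y) => y[1] - x[1])
  .map(([nodeId]) => nodeId);
```

Because only ranks enter the sum, the raw vector and BM25 scores never need to share a scale.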

Each Task 308 enhancement is independently flagged. All flags default OFF — the unflagged pipeline is identical to the baseline weighted-sum + nodeId-only-dedup behaviour. Tasks 305 (typed-edge backlinks) and 306 (compiledTruth property) have landed, so the boost data is populated; flag activation, soak windows, and per-flag measurement live under Task 337.

Hybrid Search Detail

Vector path: The query is embedded via Ollama (model per EMBED_MODEL env var, default nomic-embed-text). The resulting vector is compared against Neo4j's HNSW cosine indexes — one per indexed label. Dimensions are configured at install time (default 768). The search runs against all discovered indexes (or a subset if the caller specifies label filters). Scores are in [0, 1] (cosine similarity).

BM25 path: The raw query text is escaped for Lucene special characters and run against the entity_search full-text index (earlier platform fixes — universal coverage), which spans every operator-meaningful label written by the platform on the canonical text-property union (~28 properties: name, firstName, lastName, givenName, familyName, title, summary, body, content, description, headline, email, subject, bodyPreview, etc.). Pre-Task-748 the index was named knowledge_fulltext and covered only KnowledgeDocument | Section | Chunk — that gap silently hid Person/Organization/Task/Event/etc. from BM25 regardless of query. Raw BM25 scores are in [0, infinity) — they are normalised to [0, 1] via min-max scaling within the result set before merging. When all scores are equal (or a single result), all normalise to 1.0.

Merge: Results from both paths are collected in a single map keyed by nodeId. A node appearing in both paths accumulates the max vector score and max BM25 score independently. The combined score is 0.7 * vectorScore + 0.3 * bm25Score. Results are sorted descending by combined score, then sliced to the requested limit (default 10).
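
The normalise-then-merge steps described above might look like the following. This is a minimal sketch under stated assumptions: the type and function names (PathHit, normaliseBm25, fuseWeighted) are hypothetical, and a node missing from one path is assumed to score 0 on that path.

```typescript
interface PathHit { nodeId: string; score: number }
interface FusedHit { nodeId: string; vector: number; bm25: number; combined: number }

// Min-max scale raw BM25 scores into [0, 1]. When every score is equal
// (including a single result) they all normalise to 1.0.
function normaliseBm25(hits: PathHit[]): PathHit[] {
  if (hits.length === 0) return [];
  const raw = hits.map((h) => h.score);
  const min = Math.min(...raw);
  const max = Math.max(...raw);
  return hits.map((h) => ({
    nodeId: h.nodeId,
    score: max === min ? 1 : (h.score - min) / (max - min),
  }));
}

function fuseWeighted(vectorHits: PathHit[], bm25Hits: PathHit[], limit: number = 10): FusedHit[] {
  const byId = new Map<string, { vector: number; bm25: number }>();
  const slot = (nodeId: string) => {
    let entry = byId.get(nodeId);
    if (entry === undefined) {
      entry = { vector: 0, bm25: 0 }; // assumption: absent path contributes 0
      byId.set(nodeId, entry);
    }
    return entry;
  };
  // Each path keeps its own maximum per node.
  for (const h of vectorHits) {
    const entry = slot(h.nodeId);
    entry.vector = Math.max(entry.vector, h.score);
  }
  for (const h of normaliseBm25(bm25Hits)) {
    const entry = slot(h.nodeId);
    entry.bm25 = Math.max(entry.bm25, h.score);
  }
  return Array.from(byId.entries())
    .map(([nodeId, s]) => ({ nodeId, vector: s.vector, bm25: s.bm25, combined: 0.7 * s.vector + 0.3 * s.bm25 }))
    .sort((a, b) => b.combined - a.combined)
    .slice(0, limit);
}

const fusedHits = fuseWeighted(
  [{ nodeId: "a", score: 0.9 }, { nodeId: "b", score: 0.6 }],
  [{ nodeId: "b", score: 12 }, { nodeId: "c", score: 4 }],
);
```

In this example node b wins despite a lower vector score, because it is the only node found by both paths. Min-max scaling also gives the weakest BM25 hit in a result set a normalised score of 0.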

Task 308 enhancements (flagged, default off)

Stage | Module | Flag | What it does
--- | --- | --- | ---
Routing | route.ts | MAXY_GS_ROUTE | Picks per-class label filter + fusion weights from the retrievalClass hint produced by Task 304's gateway-classifier. entity → vector-heavy + Person/Company/Concept; temporal → BM25-heavy over Event; event → BM25-only over Event; general → balanced; none → skip the lookup.
Multi-query expansion | query-expansion.ts | MAXY_GS_EXPANSION | Haiku generates 3-5 paraphrases per query; each runs through vector + BM25 in parallel, with results unioned before fusion. Per-call 1-hour cache keyed by (accountId, query, retrievalClass). Graceful degrade on Haiku failure — original query only.
RRF fusion | rrf-fusion.ts | MAXY_GS_RRF | Replaces weighted-sum with Reciprocal Rank Fusion (k=60 by default). Sums 1 / (k + rank) per node across the ranked lists each pass produces. More robust to score-distribution drift between indexes than weighted-sum. Weighted-sum stays as the fallback.
compiledTruth boost | boosts.ts | MAXY_GS_BOOSTS | +15% to the combined score of any hit whose node carries a non-null compiledTruth property (populated by Task 306 on Person/Company/Concept). The property is in the entity_search index so BM25 hits against summary text are also matched.
Backlink boost | boosts.ts | MAXY_GS_BOOSTS | bump = clamp(0.05 + 0.05 × log10(backlinkCount), 0.05, 0.25). 1 backlink → +5%; 10 → +10%; 100 → +15%; 1000+ → +20%; capped at +25%. Reads backlinkCount populated by Task 305's typed-edge hook.
4-layer dedup | dedup.ts | MAXY_GS_DEDUP | Strict superset of nodeId-only dedup. Layers: nodeId, slug, canonicalName (case-insensitive, falls back to name), contentHash (sha256 of `compiledTruth
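
The two boost formulas in the table can be worked through as follows. This is a sketch, not the contents of boosts.ts. Three points are assumptions: a backlink count of zero earns no bump, the two percentages are added before multiplying, and the names backlinkBump and applyBoosts are hypothetical.

```typescript
// bump = clamp(0.05 + 0.05 × log10(backlinkCount), 0.05, 0.25)
function backlinkBump(backlinkCount: number): number {
  if (!(backlinkCount >= 1)) return 0; // assumption: no backlinks, no bump
  const raw = 0.05 + 0.05 * Math.log10(backlinkCount);
  return Math.min(0.25, Math.max(0.05, raw));
}

interface BoostInput {
  combined: number;
  compiledTruth?: string | null;
  backlinkCount?: number;
}

function applyBoosts(hit: BoostInput): number {
  // +15% when the node carries a non-null compiledTruth property.
  const truthBoost = hit.compiledTruth != null ? 0.15 : 0;
  // Assumption: the two percentages are summed, then applied once.
  return hit.combined * (1 + truthBoost + backlinkBump(hit.backlinkCount ?? 0));
}
```

For a hit with combined score 0.5, a compiledTruth value, and 10 backlinks, this sketch yields 0.5 × (1 + 0.15 + 0.10) = 0.625.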

A per-call log line lets the operator see which stages ran with which counts:

[graph-search:hybrid] accountId=<8c> retrievalClass=<c> expansions=<n> vector=<n> bm25=<n> fused=<n> boosted=<n> deduped=<n> final=<n> mode=<hybrid|rrf|bm25> ms=<ms> expand-ms=<ms>

What the hybrid approach catches

Vector search excels at semantic meaning — "how do I contact someone" finds nodes about communication even if the word "contact" doesn't appear. BM25 excels at exact terms — invoice numbers, product codes, proper nouns, technical identifiers. The hybrid combination ensures both modes contribute, with semantic similarity weighted higher (0.7) because most user queries are natural language.


Embedding Infrastructure

Property | Value
--- | ---
Model | Default nomic-embed-text (via Ollama at localhost:11434), configurable at install time via --embed-model
Dimensions | Default 768, configurable at install time (resolved from model lookup table or --embed-dimensions)
Similarity function | Cosine
Index algorithm | HNSW (approximate nearest-neighbor)
Configurable via | EMBED_MODEL and EMBED_DIMENSIONS env vars (set by installer in ~/{configDir}/.env), OLLAMA_URL

Indexed node labels

Every searchable node type has its own vector index. The memory-search tool discovers indexes at runtime via SHOW INDEXES and caches the label-to-index mapping. This means new index definitions in schema.cypher become searchable automatically without code changes.

Indexed labels: Question, DefinedTerm, Review, Service, Person, LocalBusiness, PriceSpecification, Task, CreativeWork, DigitalDocument, KnowledgeDocument (includes email threads via source:'email' since Task 321), Section, Chunk, Conversation, Message, Event, Workflow, Preference (18 labels total).

Full-text index

Index name | Labels | Properties | Purpose
--- | --- | --- | ---
entity_search | All operator-meaningful labels (~40, see schema.cypher) | Canonical text-property union (~28) | Universal BM25 keyword matching across the whole graph

Embedding lifecycle

Embeddings are computed when nodes are created or updated (via memory-write, memory-ingest, or any tool that persists to Neo4j). If Ollama is unavailable at write time, nodes are stored without embeddings. The memory-reindex tool backfills missing embeddings by iterating nodes where embedding IS NULL, calling Ollama's /api/embed endpoint, and storing the resulting vector. Batch embedding is supported for efficiency.
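
Batching for the backfill could be sketched as below. Ollama's /api/embed endpoint accepts a model name and an input that may be an array of strings. Everything else here is an assumption for illustration: the PendingNode shape, the text field chosen for embedding, the batch size of 32, and the function name buildEmbedBatches.

```typescript
interface PendingNode { nodeId: string; text: string }
interface EmbedBatch {
  nodeIds: string[];                        // order matches body.input
  body: { model: string; input: string[] }; // JSON body for POST /api/embed
}

// Group nodes that still lack an embedding into request-sized batches.
function buildEmbedBatches(
  nodes: PendingNode[],
  model: string = "nomic-embed-text",
  batchSize: number = 32, // hypothetical batch size
): EmbedBatch[] {
  const batches: EmbedBatch[] = [];
  for (let i = 0; i < nodes.length; i += batchSize) {
    const slice = nodes.slice(i, i + batchSize);
    batches.push({
      nodeIds: slice.map((n) => n.nodeId),
      body: { model, input: slice.map((n) => n.text) },
    });
  }
  return batches;
}
```

Keeping nodeIds alongside each request body lets the caller pair the returned vectors with their nodes by position.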


Knowledge Document Hierarchy

Large documents are decomposed into a three-level hierarchy for granular retrieval:

KnowledgeDocument
  ├── summary (embedded) — document-level semantic anchor
  ├── Section
  │   ├── summary (embedded) — section-level semantic anchor
  │   └── Chunk
  │       ├── summary (embedded) — chunk-level semantic anchor
  │       └── content (raw text, BM25-indexed) — full content for retrieval
  └── attachmentId — links back to the source file

All three levels are independently vector-indexed and BM25-indexed. A query may match at the document level (broad topic), section level (sub-topic), or chunk level (specific passage). Graph expansion from a matched chunk retrieves its parent section and document for context.

Semantic chunking

Documents are split by a semantic chunker that identifies topic boundaries rather than using fixed character counts. Each chunk gets a summary (used for embedding) and retains the raw content (used for BM25 and for returning to the agent).


Response-side fields projection

memory-search accepts an optional fields: string[] that narrows the properties returned on each row to the caller-named keys. This is a read-side payload trim only — it runs after hybrid() returns, so vector search, BM25, keyword subscriptions, and graph expansion all see the full text. Ranking does not change.

  • fields omitted → today's behaviour (every property except embedding).
  • fields: ["name", "slug"] → only those keys per row.
  • fields: [] → empty properties object — explicit "no properties".
  • Unknown keys are silently skipped. Rows lacking a requested key omit it on that row.
  • related[*].properties is NOT projected (separate concern).

Use this when the caller knows which keys it needs (slug → name, Person → phoneNumber). It is the safe alternative to write-time summarisation, which is lossy: write-time pruning has no way to know which keys a future query will want.
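
The projection rules listed above fit in one small function. This is a sketch rather than the code in memory-search.ts: the name projectFields is hypothetical, and stripping embedding even when a caller names it explicitly is an assumption.

```typescript
type Props = Record<string, unknown>;

function projectFields(properties: Props, fields?: string[]): Props {
  const out: Props = {};
  // fields omitted: every property except embedding.
  const wanted = fields === undefined ? Object.keys(properties) : fields;
  for (const key of wanted) {
    if (key === "embedding") continue; // assumption: never returned
    // Unknown keys are skipped; rows lacking a key simply omit it.
    if (Object.prototype.hasOwnProperty.call(properties, key)) {
      out[key] = properties[key];
    }
  }
  return out;
}
```

Calling projectFields(row, []) returns an empty object, which matches the explicit "no properties" case.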

Observability: when fields is set, memory-search.ts writes [memory-search] accountId=… fields=… returnedProps=N droppedProps=N to stderr. droppedProps=0 across many calls with fields set is a diagnostic signal — either the schema has already been narrowed upstream, or callers are requesting every field and defeating the purpose.

Guard Layers

Every query path — vector search, BM25 search, keyword subscriptions, and graph expansion — applies a consistent set of access control filters. These are Cypher WHERE clauses, not middleware, so they cannot be bypassed by tool parameter manipulation.

Layer 1: Soft-delete filter

WHERE node.deletedAt IS NULL

Unconditional. No parameter controls it. Nodes with a deletedAt timestamp are excluded from all query paths. Soft-deleted KnowledgeDocument nodes cascade the timestamp to all child Section and Chunk nodes. Grace period before hard deletion: 7 days. Re-ingesting a soft-deleted document (same attachmentId) clears deletedAt and replaces the hierarchy.

Layer 2: Scope filter

WHERE (node.scope IS NULL OR node.scope IN $allowedScopes)

When allowedScopes is set (e.g., ["public", "shared"] for public agents), only nodes with a matching scope property — or no scope at all (legacy transitional safety) — are returned. When allowedScopes is omitted (admin agent), no scope filtering is applied. Scope values: public, shared, admin.

Layer 3: Per-agent tag filter

WHERE node.agents IS NOT NULL AND $agentSlug IN node.agents

When agentSlug is set (public agent queries), only nodes explicitly tagged for that agent are returned. The agents property is a string array on each node — a node is visible to an agent only if the agent's slug appears in this array. No implicit "available to all" fallback. This is enforced at the MCP server level via the AGENT_SLUG environment variable — tool parameter overrides are rejected when the env var is set.

Defense in depth: Both scope and agent filters must pass. An admin-scoped node tagged for a public agent is still invisible to that agent because the scope filter rejects it first.

Layer 4: Graph expansion enforcement

Related nodes discovered during hop traversal are independently filtered:

WHERE (related.scope IS NULL OR related.scope IN $allowedScopes)
AND (related.agents IS NULL OR $agentSlug IN related.agents)
AND related.deletedAt IS NULL

This prevents cross-agent content leakage via graph traversal — a public agent cannot reach admin-scoped nodes by following relationships from a public node. Untagged related nodes (no agents property) pass through, allowing shared structural nodes (e.g., PriceSpecification linked to a Service) to be discoverable.
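
The difference between Layer 3 (primary results) and Layer 4 (related nodes) is easy to miss, so the sketch below restates the filters as in-memory predicates. It is only a model of the Cypher clauses, which remain the real enforcement point. The type and function names are hypothetical, and applying the account check to related nodes is an assumption.

```typescript
interface GuardedNode {
  accountId: string;
  deletedAt?: string | null;
  scope?: string | null;
  agents?: string[] | null;
}
interface GuardContext {
  accountId: string;
  allowedScopes?: string[]; // omitted for the admin agent
  agentSlug?: string;       // set for public agent queries
}

function passesCommon(node: GuardedNode, ctx: GuardContext): boolean {
  if (node.accountId !== ctx.accountId) return false; // Layer 5
  if (node.deletedAt != null) return false;           // Layer 1
  if (ctx.allowedScopes !== undefined && node.scope != null && !ctx.allowedScopes.includes(node.scope)) {
    return false;                                     // Layer 2
  }
  return true;
}

// Layer 3: primary results must be explicitly tagged for the agent.
function primaryVisible(node: GuardedNode, ctx: GuardContext): boolean {
  if (!passesCommon(node, ctx)) return false;
  if (ctx.agentSlug === undefined) return true;
  return node.agents != null && node.agents.includes(ctx.agentSlug);
}

// Layer 4: related nodes may be untagged, but a tag list must include the agent.
function relatedVisible(node: GuardedNode, ctx: GuardContext): boolean {
  if (!passesCommon(node, ctx)) return false;
  if (ctx.agentSlug === undefined || node.agents == null) return true;
  return node.agents.includes(ctx.agentSlug);
}
```

An untagged public-scoped node is therefore invisible as a primary result for a public agent, yet reachable as a related node of a result that agent can see.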

Layer 5: Account isolation

WHERE node.accountId = $accountId

Multi-tenancy boundary. Every query is scoped to the requesting account. The ACCOUNT_ID environment variable is set at MCP server startup — it is not a tool parameter and cannot be overridden by the agent.

The read filter alone is not sufficient — it correctly hides alien-account nodes from every UI but does not prevent them existing. A writer that misresolves accountId (literal, undefined, or inferred-from-the-wrong-context) leaks nodes into the graph with no downstream symptom; the read filter then keeps them invisible indefinitely. The write-side doctrine is documented in .docs/neo4j.md "Account isolation invariant" — every writer that stamps n.accountId must verify the value against ${DATA_ROOT}/accounts/<id>/account.json before write. The live floor is writeNodeWithEdges — every doctrine-primitive write is gated by an accountId == process.env.ACCOUNT_ID check (the spawning process validates ACCOUNT_ID at boot against the on-disk account set via the account-enumeration lib), with [graph-write] reject reason=invalid-account-id … as the rejection signal.

Two boot-time surfaces stamp + validate the env (added 2026-05-07). The brand systemd unit emits Environment=ACCOUNT_ID=<uuid> (resolved by the installer from INSTALL_DIR/data/accounts/<uuid>/account.json); the Hono boot path then calls validateAccountIdEnv against the on-disk set and emits [graph-health] account-id-env present=true id=<8> matches-on-disk=true on success or [graph-health] account-id-env FATAL reason=<missing|no-on-disk-account|mismatch> + process.exit(1) on failure. No fallback — a misconfigured Pi cannot silently boot.


Query Classification

Before searching, a Haiku classifier decides whether a query needs knowledge retrieval at all. This prevents meta-queries ("hello", "thanks", "continue") from polluting the system prompt with irrelevant search results.

Property | Admin variant | Public variant
--- | --- | ---
Model | claude-haiku-4-5 | Same
Timeout | 3 seconds | Same
History window | Last 4 messages (2 user + 2 assistant) | Same
Max tokens | 200 | 120
Query rewriting | Yes — resolves references from history into concrete search terms | Same
Topic-change detection | Yes — detects shifts with confidence score | No (removed, earlier platform fixes)
Fallback on failure | search: true (always search with raw message) | Same

Classification output

The classifier returns a JSON object:

  • search (boolean) — whether a knowledge search should run
  • query (string or null) — a search-optimised rewrite of the user's message, or null to use the raw message
  • reason (string) — short explanation of the decision

When search is true and query is non-null, the rewritten query replaces the raw message for the memory-search call. This is important: the classifier resolves pronouns and references from conversation history into concrete terms, improving retrieval precision.
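
Consuming the classifier output, including the fallback to search: true on failure, might be handled as below. This is a hedged sketch: the function names and the "classifier-fallback" reason string are hypothetical, and treating malformed JSON the same way as a timeout is an assumption.

```typescript
interface Classification { search: boolean; query: string | null; reason: string }

function parseClassification(raw: string): Classification {
  // Fallback mirrors the table: always search, using the raw message.
  const fallback: Classification = { search: true, query: null, reason: "classifier-fallback" };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return fallback;
    const obj = parsed as Record<string, unknown>;
    const search = obj["search"];
    const query = obj["query"];
    const reason = obj["reason"];
    if (typeof search !== "boolean") return fallback;
    return {
      search,
      query: typeof query === "string" && query.trim() !== "" ? query : null,
      reason: typeof reason === "string" ? reason : "",
    };
  } catch {
    return fallback;
  }
}

// Returns the text to pass to memory-search, or null when no search should run.
function effectiveQuery(c: Classification, rawMessage: string): string | null {
  if (!c.search) return null;
  return c.query ?? rawMessage;
}
```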

Knowledge retrieval gate

On the public PTY surface the agent itself decides when to call memory-search — there is no server-side classifier interposed between the user message and the agent's first tool call. KNOWLEDGE.md (when present) is assembled into the agent's system prompt at spawn time. Whether memory-search is reachable at message time is controlled by the agent's liveMemory config flag: when true, the per-spawn allowlist includes memory-search and reads run with ALLOWED_SCOPES=public; when false, the agent has no graph access mid-turn.

Observability

Admin: [admin-query-classifier] log line with topicChange, topicChangeConfidence, existingTopic, latencyMs.

Public: [public-query-classifier] log line with search, effectiveQuery, reason, latencyMs. The intentional absence of topic-change fields in the public log is the on-disk evidence that the public path does less work.



Reports — durable workflow output (Task 332)

The :Report label is the platform's durable shape for workflow output the operator may want back later — daily briefings, dream cycle runs, ad-hoc analyses. Three MCP tools own the surface, all on the memory plugin:

  • memory-report-write — append-only writer. Validates body ≤ 10,000 chars, embeds title+body, and CREATEs a :Report node. Idempotent on (accountId, title, occurredAt-within-same-minute) — a second call with the same title in the same minute returns the existing node instead of duplicating. Parented to the active :Conversation via :PRODUCED when SESSION_NODE_ID is set (the chat-driven default); falls back to the account's :AdminUser so the graph-hierarchy doctrine holds even outside a conversation.
  • memory-report-read-latest — fetches the newest :Report (default limit=1) tagged with a given keyword. The expected route for any operator phrasing of "latest X", "last night's X", "show me X report".
  • memory-report-list — metadata-only paginated listing (newest first), with optional keyword and sourceWorkflow filters. Use to scan the catalogue without paying for full bodies.
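
The write-side checks in memory-report-write can be illustrated with a key function. This is a sketch under assumptions: "within the same minute" is read as truncation to the calendar minute, and the names reportIdempotencyKey and isValidReportBody are hypothetical.

```typescript
const MAX_REPORT_BODY_CHARS = 10000;

function isValidReportBody(body: string): boolean {
  return body.length <= MAX_REPORT_BODY_CHARS;
}

// Two writes with the same account, title, and calendar minute share a key,
// so the second can resolve to the existing node instead of duplicating it.
function reportIdempotencyKey(accountId: string, title: string, occurredAt: Date): string {
  const minuteBucket = Math.floor(occurredAt.getTime() / 60000);
  return accountId + "|" + title + "|" + minuteBucket;
}
```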

Every operation emits one log line: [reports] op=<write|read-latest|list> reportId=<short> keywords=<csv> ms=<n> (with idempotent=1 on a write that resolved to an existing node, hits=<n> on reads, total=<n> on list).

Routing is not classifier-side. The admin agent's IDENTITY.md carries the rule under Recalling reports: "latest <X>" / "last night's <X>" / "show me <X> report" → first tool call is memory-report-read-latest. The intent classifier (Task 304's retrievalClass) already differentiates temporal vs entity vs event reads; reports route off the literal phrase, not a new class.

The first caller is the briefing skill (platform/plugins/scheduling/skills/briefing/SKILL.md), which persists each run as a :Report with title: "Daily briefing <YYYY-MM-DD>", keywords: ["daily-briefing", "<YYYY-MM-DD>"], sourceWorkflow: "daily-briefing". Dream-cycle (Task 327) and ad-hoc analyses are expected to follow the same pattern.


Graph Expansion

After the top results are selected (by combined score or by LLM ranking), each result node is expanded by traversing its immediate relationships.

Traversal mechanics

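MATCH (n)-[r]-(related)
WHERE elementId(n) = $nodeId
  AND related.deletedAt IS NULL
  AND (related.scope IS NULL OR related.scope IN $allowedScopes)
  AND (related.agents IS NULL OR $agentSlug IN related.agents)
// direction: 'outgoing' when n is the start node of r, otherwise 'incoming' (one way to derive it)
RETURN type(r) AS relType,
       CASE WHEN startNode(r) = n THEN 'outgoing' ELSE 'incoming' END AS direction,
       labels(related) AS relatedLabels,
       related
LIMIT 20
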
  • Default hop depth: 1 (immediate relationships only)
  • Related nodes cap: 20 per primary result
  • Direction tracking: Each relationship is labelled outgoing or incoming
  • Scope enforcement: All guard layers (soft-delete, scope, agent) apply to related nodes
  • Configurable: expandHops: 0 produces compact output (properties only, no related nodes) — useful for listing/inventory queries

What expansion provides

A Service node matched by vector search will have its PriceSpecification, Review nodes, and parent LocalBusiness attached as related nodes. A Chunk matched by BM25 will have its parent Section and KnowledgeDocument. This context enrichment means the agent receives not just the matched node but its immediate neighbourhood in the graph.


Keyword Subscriptions — Reactive Per-Agent Knowledge

Each public agent can subscribe to up to 5 keywords via knowledgeKeywords in its config.json. These subscriptions make the agent reactive to new graph content matching its topics — content added after the agent was created becomes discoverable without manual tag updates.

Dual search per keyword

For each subscription keyword, two complementary searches run:

  1. BM25 full-text search — queries the universal entity_search index with the keyword as the search term. Catches content that mentions the keyword in its text across every operator-meaningful label.

  2. Property-based search — finds nodes whose keywords array property contains the subscription keyword (case-insensitive). Catches nodes explicitly tagged with that keyword topic. These matches are boosted to maximum BM25 score (1.0) since they are exact tag matches.

Both searches run without the per-agent tag filter (agentSlug) — keyword subscriptions are scope-inclusive by design, meaning an agent's subscriptions can discover content not directly tagged for it. The scope filter (allowedScopes) still applies as defense in depth — admin-only content remains invisible to public agents regardless of keyword matches.

Union semantics

Results from keyword subscription searches are merged into the same scored map as the primary vector+BM25 results. Deduplication by nodeId with Math.max on scores means a node found by both direct search and keyword subscription keeps the highest score from each method.
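
A small sketch of the property-based match and the Math.max union follows. The Scored shape and both function names are hypothetical; only the case-insensitive comparison, the 1.0 score for exact tag matches, and the max-per-method rule come from the text above.

```typescript
interface Scored { vector: number; bm25: number }

// Case-insensitive check against a node's keywords array property.
function hasKeyword(nodeKeywords: string[] | undefined, subscription: string): boolean {
  const wanted = subscription.toLowerCase();
  return (nodeKeywords ?? []).some((k) => k.toLowerCase() === wanted);
}

// Merge a keyword-subscription hit into the shared scored map.
// Exact tag matches would be passed in with bm25 = 1.0.
function mergeKeywordHit(scored: Map<string, Scored>, nodeId: string, bm25: number): void {
  const current = scored.get(nodeId) ?? { vector: 0, bm25: 0 };
  scored.set(nodeId, { vector: current.vector, bm25: Math.max(current.bm25, bm25) });
}
```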

Lifecycle

Keywords are consumed by the update-knowledge admin skill when regenerating KNOWLEDGE.md — the regeneration query broadens the operator-tagged set with keyword matches so newly-added graph content that shares a subscribed topic lands in the next baked snapshot. There is no runtime keyword-injection path on the public PTY surface.


Separate from the knowledge retrieval pipeline, conversation-search provides semantic search over past messages.

  • Index: message_embedding (768-dim cosine HNSW on Message nodes)
  • Scope: When SESSION_ID is set (public agent), results are limited to that conversation. Admin searches all conversations.
  • Output: Messages with role, content, timestamp, and relevance score.

This tool is read-only and available to both public and admin agents.

When conversations are created

:Conversation nodes on webchat (admin login, "New conversation" in the burger, a new public visitor) are created lazily. Opening the chat or logging in does not write anything to the graph — Maxy only records the conversation once the user sends a second message. This keeps conversation-search and the Conversations modal free of one-turn abandoned threads. WhatsApp and Telegram take the opposite posture: every inbound — DM or group, allowed or activation-off, agent-invoked or gated — MERGEs the :Conversation and writes a forensic :Message:WhatsAppMessage row before any access-control decision. The graph is the durable record of every message the device received, not just the ones the agent replied to. See .docs/web-chat.md "Deferred conversation persistence" and .docs/whatsapp.md "Session continuity" for the full contract.

Each row in the Conversations modal exposes a View logs row-action that opens a popover with three links — Stream, Errors, SSE — each of which targets /api/admin/logs?type={stream|error|sse}&sessionId={full-id} in a new tab. The row's 8-char id chip is click-to-copy; hover reveals the full sessionId as a tooltip. See .docs/web-chat.md "In-chat retrieval" for the route contract and console.debug observability.

Static publish surface — /sites/*

Maxy hosts a generic per-account static-tree publish surface at https://public.<brand>/sites/<...>/<file>. The route serves files from <accountDir>/sites/<...> with URL=disk mirroring — operator drops the tree on disk, no upload API. Extended MIME covers HTML/CSS/JS/woff2/fonts on top of images. Path traversal (.., encoded .., segments failing SAFE_SEG_RE) returns 403; symlinks escaping the sites root are rejected via a realpathSync re-check. .html responses carry Content-Security-Policy: default-src 'self' https: data:; script-src 'none' and Cache-Control: no-cache; assets are cached for an hour; every response carries X-Content-Type-Options: nosniff. Per-account isolation comes from resolveAccount — every brand's install sees only its own tree.

Directory canonicalisation. A request whose disk target is a directory is 301'd to the trailing-slash form (query string preserved) before any body is served. RFC 3986 §5.2 relative resolution (path merging in §5.2.3) drops the last path segment of a base URL that lacks a trailing slash, so the slash is required for relative refs in the served HTML to resolve under the directory, not its parent. After the redirect the route serves <dir>/index.html if it exists on disk; otherwise 404. There is no implicit-index.html invention for missing paths — the publisher owns canonical URLs. A brochure shipped without index.html is reached at /sites/<slug>/<file>.html, and the admin skill publish-site is the sanctioned surface that moves the extracted tree under <accountDir>/sites/<slug>/ and emits the canonical path slug. Operator-side: drop a brochure at <accountDir>/sites/properties/<id>/brochure/output/ and it serves at <public-host>/sites/properties/<id>/brochure/output/brochure.html (or <public-host>/sites/properties/<id>/brochure/output/ if that directory contains an index.html). See .docs/web-chat.md /sites/* route entry for the wire contract and [sites] log lines (serve|redirect-trailing-slash|not-found|path-traversal-rejected|symlink-escape-rejected|no-account).
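
The request handling order described for /sites/* (traversal check, then directory redirect, then index lookup) can be modelled as a pure decision function. This is a sketch, not the route's code. SAFE_SEGMENT is a hypothetical stand-in for SAFE_SEG_RE, whose actual pattern is not given here, and the disk state is passed in as parameters instead of being read from the filesystem. The symlink re-check is out of scope for this sketch.

```typescript
const SAFE_SEGMENT = /^[A-Za-z0-9._-]+$/; // hypothetical stand-in for SAFE_SEG_RE

function isSafePath(urlPath: string): boolean {
  return urlPath.split("/").filter((s) => s !== "").every((segment) => {
    let decoded: string;
    try {
      decoded = decodeURIComponent(segment);
    } catch {
      return false; // malformed percent-encoding
    }
    return decoded !== "." && decoded !== ".." && SAFE_SEGMENT.test(decoded);
  });
}

type SitesDecision =
  | { status: 403 }
  | { status: 404 }
  | { status: 301; location: string }
  | { status: 200; file: string };

function decideSitesResponse(
  urlPath: string,
  query: string,
  target: "file" | "dir" | "missing",
  dirHasIndex: boolean,
): SitesDecision {
  if (!isSafePath(urlPath)) return { status: 403 };
  if (target === "missing") return { status: 404 };
  if (target === "file") return { status: 200, file: urlPath };
  if (!urlPath.endsWith("/")) {
    // Redirect to the trailing-slash form, preserving the query string.
    return { status: 301, location: urlPath + "/" + (query !== "" ? "?" + query : "") };
  }
  return dirHasIndex ? { status: 200, file: urlPath + "index.html" } : { status: 404 };
}
```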

Deterministic public-hostname surface. The <public-host> half of the URL the operator pastes is resolved by the mcp__plugin_admin_admin__public-hostname MCP tool. It reads <configDir>/cloudflared/config.yml (ingress list) then falls back to <configDir>/alias-domains.json — the same two files cloudflared and platform/ui/server/index.ts's isPublicHost() already trust to route. Returns {hostname, isApex, source} on hit (source is "cloudflared-config.yml" or "alias-domains.json"), or {hostname:null, source:null, reason:"no-tunnel"} on miss. Tiebreak: apex wins over subdomain (single-label, or www.<apex> stripped). publish-site step 6 calls it after the move and emits the full URL (https://<hostname><path-slug>) in the same turn. Graph queries are no longer involved — any earlier graph-backed resolver returned (none) on accounts bootstrapped without cloudflare-task-tracker.ts writes (laptop Real Agent, manual cloudflared setup), the llm-framing-deterministic recurrence class. The graph-mcp shim additionally runs a sequential envelope-warning probe on every read response — when Neo4j emits gql_status codes matching ^0[12]N5\d$ (e.g. 01N52 "property does not exist"), the shim stitches them into a prefix content block on the response so property-name misses surface to the agent inline instead of returning silent []. Probe failure is best-effort: the upstream response forwards unchanged with [mcp:graph] probe-error.

Cross-tab session rotation

When you click "New conversation" in the chat tab, Maxy mints a fresh admin session key on the server and clears the old one. Sibling admin tabs (/graph, /data) opened in the same browser keep working without re-login: the chat tab broadcasts the new key on a same-origin channel so each sibling tab updates its captured key instantly, and any in-flight admin request that 401s with the rotation-orphan code retries once after re-reading the latest key from per-tab storage. If neither path recovers (browser locked down, second 401 after retry, session expired), the tab shows a single banner — "Your admin session was renewed in another tab. Click to reload." — and one click sends you back through login. No silent 401s; no re-clicking through the same trash icon hoping it sticks. See .docs/web-chat.md "Cross-tab rotation contract" for the wire-level code taxonomy and observability surfaces.


Context Assembly — How Retrieved Knowledge Reaches the Agent

The final step in the retrieval pipeline is injecting retrieved content into the agent's system prompt. The path depends on agent configuration.

Public agent paths

Public agents run on the same native Claude Code PTY surface as the admin, dispatched through the channel PTY-bridge with role: 'public'. The agent's directory files (IDENTITY.md, SOUL.md, KNOWLEDGE.md, KNOWLEDGE-SUMMARY.md when present) are assembled into the system prompt at spawn time. There is no per-turn server-side knowledge injection.

Two paths, selected by the agent's liveMemory config flag:

  • liveMemory: false → memory-search is excluded from the per-spawn --allowed-tools allowlist. The agent has no graph access mid-conversation; KNOWLEDGE.md is the ceiling of factual knowledge.
  • liveMemory: true → memory-search is in the allowlist. The agent decides at message time whether to call it; reads run against the graph with ALLOWED_SCOPES=public so only public-scoped nodes return. KNOWLEDGE.md and the live memory-search surface are complementary — the baked file covers evergreen facts; the live tool covers the long-tail public-scoped lookups.

KNOWLEDGE.md staleness guard

When both KNOWLEDGE.md and KNOWLEDGE-SUMMARY.md exist, the server compares modification times. If KNOWLEDGE.md is newer than the summary (summary is stale), the full KNOWLEDGE.md is used. Otherwise, the summary is preferred (smaller token footprint).
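
The staleness rule reduces to a comparison of two modification times. In the sketch below the function name is hypothetical, and the behaviour when only one file exists (use whichever is present) is an assumption, since the text only covers the case where both exist.

```typescript
type KnowledgeFile = "KNOWLEDGE.md" | "KNOWLEDGE-SUMMARY.md";

// Arguments are modification times in milliseconds, or null when the file is absent.
function pickKnowledgeFile(knowledgeMtimeMs: number | null, summaryMtimeMs: number | null): KnowledgeFile | null {
  if (knowledgeMtimeMs === null) return summaryMtimeMs === null ? null : "KNOWLEDGE-SUMMARY.md";
  if (summaryMtimeMs === null) return "KNOWLEDGE.md";
  // A summary older than the full file is stale, so the full file wins.
  return knowledgeMtimeMs > summaryMtimeMs ? "KNOWLEDGE.md" : "KNOWLEDGE-SUMMARY.md";
}
```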

Admin agent path

The admin agent runs via Claude Code CLI, which manages its own system prompt assembly. Knowledge reaches the admin agent through MCP tools — memory-search is the read-path entry point (server-side LLM ranking was removed by Task 424; the agent ranks in-turn against any criterion). The admin agent also receives session context via loadSessionContext, which injects:

  • Recent review digest (last public chat or review digest CreativeWork)
  • Open tasks (priority-ordered, capped)
  • Active review alerts (unsuppressed, last 24 hours, capped at 5)

This is assembled as a <previous-context> block in the system prompt on each admin turn.

fetchMemoryContext — the MCP bridge

For public agents, the server calls the memory MCP server via JSON-RPC over stdin/stdout:

  1. Spawn the memory MCP server as a subprocess with environment variables: ACCOUNT_ID, ALLOWED_SCOPES=public,shared, AGENT_SLUG, KNOWLEDGE_KEYWORDS, SESSION_ID
  2. Send initialize + tools/call (name: memory-search, arguments: {query, account_id})
  3. Read the tool result text
  4. Timeout: 8 seconds. On any failure, returns null — the agent proceeds without memory context.

This subprocess model means each public agent query gets an isolated, short-lived memory server instance with the correct scope constraints baked into its environment.
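
Step 2 above amounts to writing two JSON-RPC 2.0 requests to the subprocess's stdin, one per line. The sketch below only builds those frames. The function name, the request ids, the clientInfo values, and the protocolVersion string are assumptions for illustration, and the MCP initialized notification that normally follows the handshake is left out to mirror the two steps as listed.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

function buildMemorySearchFrames(query: string, accountId: string): string[] {
  const initialize: JsonRpcRequest = {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05", // assumed version string
      capabilities: {},
      clientInfo: { name: "fetch-memory-context", version: "0.0.0" }, // hypothetical
    },
  };
  const call: JsonRpcRequest = {
    jsonrpc: "2.0",
    id: 2,
    method: "tools/call",
    params: { name: "memory-search", arguments: { query, account_id: accountId } },
  };
  // The stdio transport is newline-delimited JSON: one message per line.
  return [initialize, call].map((frame) => JSON.stringify(frame) + "\n");
}
```

The caller would write both frames to stdin, wait up to the 8-second timeout for the response whose id matches the tools/call request, and return null on any failure.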


Output Formatting and Budget

The memory-search tool formats results as structured text with labels, properties, scores, and related nodes. An output character budget of 80,000 characters prevents results from exceeding Claude Code's tool result token limit (~100K chars). When results exceed the budget, related nodes are progressively dropped (compact mode) to fit within the limit.

Each result is formatted as:

[Label1, Label2] (id: nodeId) (score: 0.XXX)
  property1: value
  property2: value
  Related:
    --[RELATIONSHIP]--> [RelatedLabel] {prop1: val, prop2: val}
    <--[RELATIONSHIP]-- [RelatedLabel] {prop1: val, prop2: val}

Results are separated by --- dividers. The embedding and accountId properties are stripped from output (internal fields, not useful to the agent).
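
The formatting and budget behaviour can be sketched together. The ResultBlock shape and function names below are hypothetical, and dropping related nodes from the lowest-ranked result first is an assumption about what "progressively" means.

```typescript
interface ResultBlock { header: string; properties: string[]; related: string[] }

function renderResult(r: ResultBlock): string {
  const lines = [r.header].concat(r.properties.map((p) => "  " + p));
  if (r.related.length > 0) {
    lines.push("  Related:");
    for (const rel of r.related) lines.push("    " + rel);
  }
  return lines.join("\n");
}

function formatWithinBudget(results: ResultBlock[], budget: number = 80000): string {
  // Work on copies so the caller's results are left untouched.
  const working = results.map((r) => ({
    header: r.header,
    properties: r.properties,
    related: r.related.slice(),
  }));
  const render = () => working.map((r) => renderResult(r)).join("\n---\n");
  // Assumption: strip related nodes starting from the lowest-ranked result.
  for (let i = working.length - 1; i >= 0; i--) {
    if (render().length <= budget) break;
    const item = working[i];
    if (item !== undefined) item.related = [];
  }
  return render();
}
```

If the output still exceeds the budget once every related list is gone, this sketch returns it as-is. How the real tool handles that case is not covered here.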


Index Discovery and Schema Evolution

The memory MCP server does not hardcode index names. On first query, it runs SHOW INDEXES YIELD name, labelsOrTypes, type WHERE type = 'VECTOR' and builds a label-to-index-name map. This map is cached for the lifetime of the process.

This means:

  • Adding a new vector index in schema.cypher makes a new label searchable without code changes
  • The memory-reindex tool can backfill embeddings for newly indexed labels
  • Index renames are transparent — the server discovers the current index names on its first query after process start, or on the first query after the cache is cleared

The cache is cleared via clearIndexCache after schema changes (e.g., after memory-reindex detects new indexes).
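
The discovery-and-cache behaviour might be structured as below. This is a sketch: the real lookup runs SHOW INDEXES against Neo4j asynchronously, whereas here the rows come from an injected synchronous loader so the caching logic stays visible. The names IndexRow, buildLabelIndexMap, and getIndexMap are hypothetical. clearIndexCache is named in the text above, but its body here is an assumption.

```typescript
interface IndexRow { name: string; labelsOrTypes: string[] | null; type: string }

let cachedIndexMap: Map<string, string> | null = null;

// Build the label-to-index-name map from SHOW INDEXES rows.
function buildLabelIndexMap(rows: IndexRow[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const row of rows) {
    if (row.type !== "VECTOR") continue;
    for (const label of row.labelsOrTypes ?? []) map.set(label, row.name);
  }
  return map;
}

// Load once, then serve from the cache for the lifetime of the process.
function getIndexMap(loadRows: () => IndexRow[]): Map<string, string> {
  if (cachedIndexMap === null) cachedIndexMap = buildLabelIndexMap(loadRows());
  return cachedIndexMap;
}

function clearIndexCache(): void {
  cachedIndexMap = null;
}
```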


Inbound Message Gateway

Every inbound message — regardless of channel (web admin, web public, WhatsApp DM, WhatsApp group) — passes through a centralised screening and classification step before reaching the agent. One Haiku call per message produces:

  • Content screening — CLEAN / SUSPICIOUS / DISCARD verdict plus a prompt injection flag. DISCARD verdicts on public channels return a polite refusal without invoking the agent. Admin messages receive advisory screening only — flagged in the log but never blocked or modified.
  • Query rewriting — retrieval-optimised reformulation of the message for memory-search (public channels only; admin text is unchanged).
  • Intent classification — question / instruction / complaint / greeting / follow-up.
  • Language — ISO 639-1 code.
  • Complexity — simple / complex.

Short messages (under 5 words) skip the Haiku call but still get local pattern matching against the shared prompt injection vocabulary — this prevents short injection payloads from bypassing screening.

On Haiku timeout, API error, or missing API key, the raw message passes through unmodified (graceful degradation). The gateway never blocks the user from reaching the agent due to its own failure.

Gateway results are injected into the agent's system prompt as structured metadata, giving the agent context about the message before it begins processing.
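
The routing rules above (short-message skip, advisory-only admin screening, public DISCARD refusal, and pass-through on gateway failure) can be sketched as a pure decision function. The type and function names are hypothetical, and a null result stands in for a Haiku timeout, API error, or missing key. The local injection pattern matching for short messages is not shown.

```typescript
type Verdict = "CLEAN" | "SUSPICIOUS" | "DISCARD";

// Hypothetical shape for the single Haiku call's output.
interface GatewayResult {
  verdict: Verdict;
  promptInjection: boolean;
  rewrittenQuery: string;
  intent: "question" | "instruction" | "complaint" | "greeting" | "follow-up";
  language: string; // ISO 639-1 code
  complexity: "simple" | "complex";
}

type Decision =
  | { action: "refuse" }
  | { action: "forward"; searchQuery: string; metadata: GatewayResult | null };

const SHORT_MESSAGE_WORDS = 5;

// Messages under 5 words skip the Haiku call.
function skipsHaiku(message: string): boolean {
  const words = message.trim().split(/\s+/).filter((w) => w.length > 0);
  return words.length < SHORT_MESSAGE_WORDS;
}

function decide(
  message: string,
  isAdmin: boolean,
  result: GatewayResult | null,
): Decision {
  // Gateway failure: the raw message passes through unmodified.
  if (result === null) {
    return { action: "forward", searchQuery: message, metadata: null };
  }
  // Admin screening is advisory: never blocked, text never rewritten.
  if (isAdmin) {
    return { action: "forward", searchQuery: message, metadata: result };
  }
  // Public DISCARD: polite refusal, the agent is not invoked.
  if (result.verdict === "DISCARD") return { action: "refuse" };
  // Public otherwise: memory-search uses the rewritten query.
  return { action: "forward", searchQuery: result.rewrittenQuery, metadata: result };
}
```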

Diagnostics

Every gateway invocation logs to server.log with the [inbound-gateway] tag, including channel, verdict, intent, language, complexity, latency, and fallthrough status. Non-clean verdicts get an additional warning log.

To check recent screening activity:

grep -F '[inbound-gateway]' server.log | tail -20

Tool Eagerness — eager-load vs deferred

The Claude Code SDK marks every MCP tool as deferred by default. The model cannot invoke a deferred tool until it has first paid a ToolSearch round-trip to load the schema — one extra turn per unique schema. Built-in SDK tools (Read, Write, Edit, Bash, Glob, Grep, Agent, WebSearch, WebFetch) stay eager. There is no count threshold; the gate is per-tool.

The SDK's per-tool override is _meta["anthropic/alwaysLoad"]: true on each MCP tool's tools/list entry. Two surfaces apply it:

  1. In-process plugins. Every admin-eager tool is registered via eagerTool(server, name, description, inputSchema, handler) from platform/lib/mcp-eager/ instead of server.tool(...). The helper calls server.registerTool with the _meta flag set.
  2. Upstream graph proxy. The upstream Python mcp-neo4j-cypher server has no _meta channel, so platform/lib/graph-mcp/src/index.ts intercepts every tools/list response on the wire and injects _meta["anthropic/alwaysLoad"]: true into each tool entry. The [graph-mcp] tools/list eager-flagged count=<N> stderr line confirms the injection fired.
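
The wire-level injection in the second surface amounts to a small transform over the tools/list result. The sketch below assumes simplified ToolEntry and ToolsListResult shapes and a hypothetical function name; only the _meta key comes from this document, and this is not the code in platform/lib/graph-mcp/src/index.ts.

```typescript
// Simplified shapes (assumption); real MCP tool entries carry more fields.
interface ToolEntry {
  name: string;
  _meta?: Record<string, unknown>;
  [key: string]: unknown;
}

interface ToolsListResult {
  tools: ToolEntry[];
  [key: string]: unknown;
}

const ALWAYS_LOAD_KEY = "anthropic/alwaysLoad";

// Returns a copy of the result with every tool flagged eager, plus the
// count that a log line such as "eager-flagged count=<N>" would report.
function flagAllEager(result: ToolsListResult): {
  result: ToolsListResult;
  count: number;
} {
  const tools = result.tools.map((tool) => ({
    ...tool,
    // Preserve any existing _meta keys and add the override.
    _meta: { ...(tool._meta ?? {}), [ALWAYS_LOAD_KEY]: true },
  }));
  return { result: { ...result, tools }, count: tools.length };
}
```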

Curation rule. Every MCP tool the admin agent calls routinely should be eager — registered via eagerTool (or arriving through the graph-mcp interceptor). Whether a tool is eager is decided at its registration site in the plugin's MCP index.ts (eagerTool vs server.tool); there is no separate allow-list constant. Admin-skill / specialist / public-agent tools that stay on server.tool() pay the ToolSearch tax only when their caller invokes them. The admin tool surface (toolSurface.admin, the adminAllowlist: true set) is the intended eager set; a routinely-called admin tool left on server.tool() is a gap to fix at the registration site.

Observability. Spawn-time emit: [tool-surface] session=<convId> permission_allowed=N eager_intent=E eager_set_size=T. Turn-end emit: [admin-agent] turn-end ... toolsearch=N toolsearch_unique=U. A non-zero toolsearch on a fresh turn for an eager-intended tool means a plugin reverted to server.tool() — fix at the plugin's MCP registration site, not the allow-list.

Spawn-time MCP and subagent registration

Each claude PTY spawn registers every callable MCP server and every dispatchable subagent before the operator's first turn. Platform MCP servers come from one channel — installed plugins — for admin and specialist spawns (Task 502). Claude Code's plugin system serves every plugin MCP tool under the long prefix mcp__plugin_<plugin>_<server>__<tool> (for platform plugins plugin == server == directory), which is the canonical name the admin --allowed-tools argv and every specialist tools: frontmatter bind to. Admin spawns no longer write a per-spawn .mcp.json or pass --mcp-config; the per-account env (ACCOUNT_ID, USER_ID, NEO4J_URI, NEO4J_PASSWORD, PLATFORM_ROOT, CLAUDE_CONFIG_DIR) rides the PTY env block.

Public agents are the one exception. A public-facing web agent should boot only the handful of servers it may use, not every installed plugin, so public spawns retain the per-spawn mcp-config.json (--mcp-config <path>) that restricts the server set to plugins exposing at least one publicAllowlist: true tool (minus memory when liveMemory: false). Per-spawn descriptors keep the SHORT prefix mcp__<plugin>__<tool>, which is why the public allowlist (toolSurface.public) stays short while admin/all are long. The descriptor's command routes through lib/mcp-spawn-tee/dist/index.js so each child server's stderr lands in ${LOG_DIR}/mcp-<name>-stderr-<date>.log even on synchronous module-load throws (Task 743 wrapper). --strict-mcp-config only ever guarded auto-discovery of a project .mcp.json; it is retained on the public per-spawn path and dropped from admin spawns that no longer pass --mcp-config.
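
The two naming schemes can be written as small helpers. The function names are hypothetical, and pairing the memory plugin with the memory-search tool in the usage note is an assumption for illustration.

```typescript
// Long canonical prefix served by Claude Code's plugin system
// (admin and specialist spawns). For platform plugins, plugin == server.
function longToolName(plugin: string, tool: string, server: string = plugin): string {
  return `mcp__plugin_${plugin}_${server}__${tool}`;
}

// Short prefix used by per-spawn descriptors (public spawns).
function shortToolName(plugin: string, tool: string): string {
  return `mcp__${plugin}__${tool}`;
}
```

For example, longToolName("memory", "memory-search") yields mcp__plugin_memory_memory__memory-search, while shortToolName("memory", "memory-search") yields mcp__memory__memory-search.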

For subagents, the same spawn pushes --add-dir for every bundled plugin agents directory (platform/plugins/*/agents/, premium-plugins/*/agents/) — both roles — plus the per-account specialists directory <accountDir>/specialists/agents/ (admin only). Claude Code's subagent_type dispatch reads the agent file off disk via the added directories; without --add-dir the dispatcher returns "no matching agent."

A boot gate refuses to start the manager when any admin-allowlisted tool mcp__<plugin>__* lacks a registered server. The signal is boot-failed reason=mcp-allowlist-without-server plugin=<p> tool=<t> followed by process.exit(1). The remediation is a one-line edit to the named PLUGIN.md: add the mcp: block. The complementary observability emit mcp-config-allowlist-coverage admin-tools=A admin-registered=R (where A === R) confirms the invariant per boot.

A second boot gate walks every specialist .md under platform/templates/specialists/agents/, every bundled <plugin>/agents/ directory, and the per-account <accountDir>/specialists/agents/ directory, parses each file's tools: frontmatter line (canonical long-prefix names since Task 502), and classifies every tool name as one of:

  • CC-native (Read, Bash, …)
  • a tool the loaded PLUGIN.md set actually serves (matched as the long canonical name in toolSurface.all)
  • a third-party MCP bridge (a mcp__plugin_* name whose plugin segment is NOT a maxy platform plugin: Playwright etc., upstream-owned, passes unconditionally)
  • unknown-tool-in-plugin (maxy plugin namespace served but tool name absent)
  • unknown-plugin-namespace (namespace served by nothing)
  • brand-excluded-plugin (namespace served by nothing on this brand, but the brand's brand.json#plugins.excluded list names it)
  • malformed-name (not CC-native and not mcp__-shaped)

The first three pass. The next two (unknown-tool-in-plugin and unknown-plugin-namespace) refuse boot with one boot-failed reason=specialist-tool-drift specialist=<name> tool=<t> drift=<class> path=<…> line per defect, then process.exit(1). A maxy-plugin mcp__plugin_* name is validated against toolSurface.all, so a typo or stale long-prefix tool name still refuses boot rather than passing as a bridge; the build-time check-canonical-tool-names.mjs gate catches the same drift in instruction files before publish.

brand-excluded-plugin is a structural pass: it lands in a per-specialist strip-list, the manager continues to boot, and at spawn time pty-spawner removes those tool names from the --agent <name> spawn's --allowed-tools argv. The complementary observability emit specialist-tool-strip specialist=<name> plugin=<p> tools=<csv> reason=brand-excluded fires one line per stripped (specialist, plugin) pair so an operator who reads server.log sees the brand filter doing work without cross-referencing brand.json against the template.

The startup-self-test line startup-self-test specialist-tool-drift=ok inspected=<N> stripped-specialists=<M> confirms the gate ran and how many specialists carry strip-lists.
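
The spawn-time strip step for brand-excluded tools can be sketched as below. It assumes the strip-list is already grouped by plugin and uses hypothetical function and type names; the log line format is the one quoted above. This is an illustration, not the pty-spawner code.

```typescript
interface StripResult {
  allowedTools: string[]; // what remains for the --allowed-tools argv
  logLines: string[]; // one line per stripped (specialist, plugin) pair
}

// stripByPlugin (assumed shape): plugin name -> brand-excluded tool names.
function applyBrandStrip(
  specialist: string,
  tools: string[],
  stripByPlugin: Record<string, string[]>,
): StripResult {
  const stripped = new Set<string>();
  const logLines: string[] = [];
  for (const [plugin, names] of Object.entries(stripByPlugin)) {
    // Only tools this specialist actually declares count as stripped.
    const hit = names.filter((n) => tools.includes(n));
    if (hit.length === 0) continue;
    hit.forEach((n) => stripped.add(n));
    logLines.push(
      `specialist-tool-strip specialist=${specialist} plugin=${plugin} ` +
        `tools=${hit.join(",")} reason=brand-excluded`,
    );
  }
  return { allowedTools: tools.filter((t) => !stripped.has(t)), logLines };
}
```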

This gate was Task 173. The brand-excluded branch closes the recurring crash-restart loop on brands that ship without a plugin the shared personal-assistant.md template references (e.g. realagent-code excludes telegram while the template hard-codes mcp__telegram__*). The brand-agnostic template stays a single file; the brand-aware filter expresses what the specialist may do on this install while the template expresses what it can do across brands. Tool typos and renamed plugins still refuse to boot — only namespaces explicitly named in plugins.excluded are demoted to strip-and-warn.

Brand-foreign premium bundles (Task 343 / Task 344). Task 344 closes the loop one layer up: the installer bundler at packages/create-maxy-code/scripts/bundle.js now applies the same brand.json#shipsPremiumBundles gate at payload assembly time, so foreign bundles never reach disk on the device. The gate is shared with the test suite via scripts/premium-bundle-gate.mjs and accepts only two shapes — undefined / missing → ships nothing; string[] → ships only the named bundles. The legacy boolean true form is rejected: bundle.js hard-fails with FATAL: brand.shipsPremiumBundles must be a string[] (boolean 'true' no longer accepted; enumerate bundles in <brand.json>). An allowlist entry naming a bundle directory that is absent on disk is also FATAL — silent over-shipping is the failure mode this gate exists to prevent. Each build emits one [bundler] premium-bundle-gate brand=<n> mode=<m> shipped=[…] skipped=[…] line. The runtime gate walkPremiumBundles at plugin-manifest.ts keeps the same shape and stays as defence-in-depth — on a correctly bundled payload, it walks only allowlisted bundles because foreign ones are not present. The drift-gate's agents-dir-skipped reason=brand-foreign-bundle line therefore fires only when something has staged a foreign bundle out-of-band.
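
The accepted shapes of brand.json#shipsPremiumBundles can be sketched as a validation function. The function name is hypothetical and this is not the code in bundle.js or premium-bundle-gate.mjs. The first FATAL message is the one quoted above; the wording of the absent-directory error is an assumption, since this document does not give it.

```typescript
// bundlesOnDisk: names of premium bundle directories present at build time.
function resolveShippedBundles(
  shipsPremiumBundles: unknown,
  bundlesOnDisk: string[],
): string[] {
  // undefined / missing: ship nothing.
  if (shipsPremiumBundles === undefined) return [];

  const isStringArray =
    Array.isArray(shipsPremiumBundles) &&
    shipsPremiumBundles.every((b) => typeof b === "string");
  if (!isStringArray) {
    // Covers the legacy boolean true form.
    throw new Error(
      "FATAL: brand.shipsPremiumBundles must be a string[] " +
        "(boolean 'true' no longer accepted; enumerate bundles in <brand.json>)",
    );
  }

  const allowlist = shipsPremiumBundles as string[];
  const missing = allowlist.filter((b) => !bundlesOnDisk.includes(b));
  if (missing.length > 0) {
    // Hypothetical wording; only the FATAL behaviour is documented.
    throw new Error(
      `FATAL: allowlisted premium bundle(s) absent on disk: ${missing.join(", ")}`,
    );
  }
  return allowlist;
}
```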

Structured journald mirror for boot-failed (Task 343). Every boot-failed reason=specialist-tool-drift … line is mirrored to journald via systemd-cat -t maxy-csm -p err with the fields specialist=, tool=, drift_reason=, agent_path= so journalctl --user -u <brand>-claude-session-manager.service -t maxy-csm can filter by any of them without grep on server.log. The stdout line stays unchanged so the existing diagnostic one-liners keep working. systemd-cat absence (e.g. macOS dev box) is swallowed — the stdout line is the primary surface; the structured emit is auxiliary.

Per-spawn signals (server.log). Every spawn emits pty-spawn-mcp-config servers=<N> tools=<M> bytes=<B> path=<…> once, plus one pty-spawn-agents-dir role=<admin|public> path=<…> per added directory. Specialist spawns additionally emit pty-spawn-allowlist specialist=<name> count=<N> stripped=<S> sourced-from=agent-frontmatter where stripped is the count of brand-excluded tool names removed before argv emission. The diagnostic one-liner is grep -E 'pty-spawn-mcp-config|pty-spawn-agents-dir|pty-spawn-allowlist|mcp-config-allowlist-coverage|specialist-tool-strip|boot-failed reason=' ~/.<brand>/logs/server.log | tail -50.

Brand-process start counter (Task 173). platform/ui/server-init.cjs increments a persistent counter at /tmp/server-init-<accountId>-restart.count on every fresh start and emits [server-init] start count=<N> account=<accountId> counter-path=<…> to server.log. /tmp clears on reboot, so a clean reboot starts the count fresh; any value >1 between operator-observed reboots means the brand process (driven by its Requires=<brand>-claude-session-manager.service clause) is restarting. The diagnostic one-liner is grep '\[server-init\] start' ~/.<brand>/logs/server.log | tail -5 — the trailing count= value is the loop depth without counting SIGTERMs.

Programmatic spawn entry point. Every admin PTY spawn that needs a first user prompt — UI click, turn-recorder hook, future automation — routes through the single wrapper at platform/ui/server/routes/admin/claude-sessions.ts. The wrapper owns the per-spawn enrichment (owner profile, dormant/active plugins, specialist domains, tunnel URL) and the senderId resolution; it forwards a single POST /spawn to the session manager on 127.0.0.1, with initialMessage inlined on that body. The manager appends initialMessage as the trailing positional argv to claude, so the CLI processes it as the session's first user turn at PTY startup — no separate POST /<sessionId>/input call, no bracketed-paste. (Task 153.) See admin-session.md "Spawn-with-initialMessage wrapper" for the body schema and caller list.

Recorder auto-archive (lifecycle, not user-initiated). The session manager's attachRecorderAutoArchive (platform/services/claude-session-manager/src/http-server.ts:178) wires every spawn whose senderId === 'turn-recorder' to a JSONL watcher: as soon as the recorder's JSONL contains "stop_reason":"end_turn", the manager calls stopSession, the PTY exits, the PID file is removed, and fs-watcher.ts:275-297 demotes the row to state: 'archived'. This is the lifecycle archive path — the row stays in place, the JSONL stays on disk, no directory move. It is structurally distinct from the user-initiated POST /api/admin/claude-sessions/:id/archive route, which actually mvs the JSONL between <slugDir> and <slugDir>/archive/; that path is the operator pruning their visible session list, not the recorder's per-turn cleanup.

Tool Call Audit Trail

Every tool invocation by the admin agent produces a durable ToolCall node in the knowledge graph, linked to the Conversation that triggered it. This covers all admin agent tool calls — the full history of what the agent did, when, and in what context.

Each ToolCall record contains:

Field                    Description
toolName                 The MCP tool that was invoked (e.g. memory-search, workflow-execute)
pluginName               The plugin that owns the tool
input                    Truncated JSON of the tool's input arguments
output                   Truncated response text
isError                  Whether the tool call resulted in an error
startedAt / completedAt  Timestamps for the invocation
sessionId                Links back to the originating conversation

Records persist indefinitely and are queryable by the admin agent. Ask Maxy "what tools ran in the last session?" or "show me all tool calls from today" to review the audit trail.

Workflow-dispatched tool calls are tracked separately via StepResult nodes (part of the workflow execution system) and are not duplicated as ToolCall nodes.

Diagnostics

Tool call persistence logs to server.log with the [persist] tag:

grep -F '[persist] tool-call persisted' server.log | tail -10

Each log entry includes the tool name and a truncated conversation ID for correlation.

Process provenance — durable actions emit Tasks

Every durable action (cloudflare tunnel-login, brand publish, future deterministic flows) emits a :Task {kind:"<flow>"} node carrying the action's lifecycle and a :PRODUCED edge to every entity the action created. This makes the graph traversable from the originating Conversation to every entity created during it via (c)<-[:RAISED_DURING]-(t:Task)-[:PRODUCED]->(e), answering "what did this turn produce" with a single two-hop Cypher pattern.

The doctrine is observed at the storage primitive: writes to :Person, :UserProfile, :AdminUser, :Organization, :LocalBusiness, :CloudflareTunnel, or :CloudflareHostname should carry an inbound :PRODUCED edge whose source is one of :Task, :Conversation, or :Message. Subtype labels like :AdminConversation, :UserMessage, :AssistantMessage, :AdminMessage qualify because the gate checks the full labels() array. Bootstrap writes (PIN-setup, schema migrations, lazy first-session UserProfile creation) are exempt via createdBy.agent === 'system'. When no qualifying edge resolves, the primitive emits a [graph-write] warn reason=missing-provenance labels=<csv> agent=<agentLabel> line and the write proceeds (Task 580 relaxed this from a hard reject — the composer-spawned admin path inherits a bare per-account env that never receives the SESSION_NODE_ID stamp, so the throw was failing every direct admin contact-create / memory-write for a gated label).
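
The warn-and-proceed check described above can be sketched as a pure function. The WriteRequest and InboundEdge shapes are hypothetical, and filling labels= from the write's labels and agent= from createdBy.agent is an assumption about the log fields. This is not the @maxy/graph-write implementation.

```typescript
const GATED_LABELS = [
  "Person", "UserProfile", "AdminUser", "Organization",
  "LocalBusiness", "CloudflareTunnel", "CloudflareHostname",
];
const PROVENANCE_SOURCES = ["Task", "Conversation", "Message"];

// Hypothetical shapes for illustration.
interface InboundEdge {
  type: string;
  sourceLabels: string[]; // full labels() array of the source node
}

interface WriteRequest {
  labels: string[];
  createdByAgent: string; // stands in for createdBy.agent
  inbound: InboundEdge[];
}

// Returns the warn line to emit, or null when the write has provenance,
// is exempt, or does not touch a gated label. The write proceeds either way.
function provenanceWarning(w: WriteRequest): string | null {
  if (!w.labels.some((l) => GATED_LABELS.includes(l))) return null;
  if (w.createdByAgent === "system") return null; // bootstrap exemption
  const hasProvenance = w.inbound.some(
    (e) =>
      e.type === "PRODUCED" &&
      // Subtype labels qualify because the whole label array is checked.
      e.sourceLabels.some((l) => PROVENANCE_SOURCES.includes(l)),
  );
  if (hasProvenance) return null;
  return (
    `[graph-write] warn reason=missing-provenance ` +
    `labels=${w.labels.join(",")} agent=${w.createdByAgent}`
  );
}
```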

Two surfaces emit the lifecycle: agent-driven actions call work-create/work-update/work-complete over MCP (work-create accepts kind, the canonical inputsProvided call-shape record, inputs + inputSchema for the operator-meaningful form payload, and raisedDuringConversationKey to resolve the RAISED_DURING edge). Shell-driven actions wrap their script invocation in platform/ui/app/lib/cloudflare-task-tracker.ts (cloudflare is the first; installer / brand-publish / OAuth-login deferred). Both surfaces emit the same [task] action-start|step|done log lines so operators can grep one channel uniformly. Both also call the central redactSecrets primitive (platform/lib/task-secrets/) to strip schema-tagged secret keys before persisting inputs.<field> props on the Task — see .docs/neo4j.md § Audit Task input contract for the contract that replaces per-kind allow-lists.

Two surfaces feed the gate:

  1. Workflow path. memory-write accepts an optional producedByTaskId parameter. When set, an inbound :PRODUCED edge from that Task is composed into the write's relationships before the gate runs. The typical agent-side pattern is to call work-create at the start of an autonomous flow, capture taskId, and pass it as producedByTaskId on every subsequent memory-write for a gated label. The gate verifies that the Task and the write share the same accountId; a mismatch is rejected loudly.
  2. Direct-ask path. The admin server resolves the active :AdminConversation's sessionId UUID and stamps it as SESSION_NODE_ID in the spawn env at PTY-spawn time. The same stamp propagates onto specialist subagent spawns the admin dispatches (Task 382) so listing-curator, content-producer, database-operator etc. inherit the same conversation anchor. The contact-create and memory-write wrappers call injectConversationProvenance (exported from @maxy/graph-write), which MATCHes (c:Conversation {sessionId, accountId}) and prepends the synthetic :PRODUCED edge (composed by Neo4j elementId, which the helper reads off the MATCH). Account isolation is part of the natural key, not a separate gate. No agent-visible schema field changes.

memory-write uses the env-stamp only as a fallback when producedByTaskId is unset; contact-create has no producedByTaskId parameter today and relies on the env-stamp alone. Autonomous (cron-driven) specialists with no parent conversation legitimately have no env-stamp; those must thread producedByTaskId.
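
The precedence between producedByTaskId and the SESSION_NODE_ID env-stamp can be summarised in a short sketch. The type and function names are hypothetical; only the ordering comes from the text above.

```typescript
type ProvenanceAnchor =
  | { kind: "task"; taskId: string }
  | { kind: "conversation"; sessionId: string }
  | { kind: "none" };

// memory-write: an explicit Task wins; the env-stamp is only a fallback.
function anchorForMemoryWrite(
  producedByTaskId?: string,
  sessionNodeId?: string,
): ProvenanceAnchor {
  if (producedByTaskId) return { kind: "task", taskId: producedByTaskId };
  if (sessionNodeId) return { kind: "conversation", sessionId: sessionNodeId };
  return { kind: "none" };
}

// contact-create: no producedByTaskId parameter, env-stamp only.
function anchorForContactCreate(sessionNodeId?: string): ProvenanceAnchor {
  return sessionNodeId
    ? { kind: "conversation", sessionId: sessionNodeId }
    : { kind: "none" };
}
```

A cron-driven specialist with neither value resolves to "none" in this sketch, which corresponds to the missing-provenance case for gated labels.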

Operator audit cyphers:

  • "What entities did this conversation's actions produce?" — MATCH (c:AdminConversation {sessionId:$id})<-[:RAISED_DURING]-(t:Task)-[:PRODUCED]->(e) RETURN labels(e), e.name, t.kind, t.status
  • "What cloudflare resources did this tunnel-login produce?" — MATCH (t:Task {kind:'cloudflare-tunnel-login', status:'completed'})-[:PRODUCED]->(r) RETURN t.taskId, r.tunnelId, r.hostnameValue ORDER BY t.completedAt DESC

See .docs/neo4j.md § Process provenance doctrine for the full enforcement contract, observability surface, and out-of-scope deferrals.

Context compaction

When an admin turn crosses 75% of the model's context window, Maxy runs a silent compaction turn that asks the agent to call the session-compact MCP tool with a structured briefing (what you asked for, what was done, decisions made, work-in-progress, things you've shared about yourself). The briefing is written to Neo4j; the next admin turn injects it back into the system prompt, so continuity survives across the compaction boundary without re-sending the full transcript.
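
A minimal sketch of the trigger and the briefing shape, under stated assumptions: the field names of CompactionBriefing are hypothetical labels for the five briefing parts listed above, and treating "crosses 75%" as greater-than-or-equal is a guess at the exact comparison.

```typescript
const COMPACTION_THRESHOLD = 0.75; // fraction of the context window

// Assumed comparison: compaction triggers at or above the threshold.
function shouldCompact(usedTokens: number, contextWindow: number): boolean {
  if (contextWindow <= 0) return false;
  return usedTokens / contextWindow >= COMPACTION_THRESHOLD;
}

// Hypothetical field names for the structured briefing.
interface CompactionBriefing {
  requested: string; // what you asked for
  done: string; // what was done
  decisions: string; // decisions made
  workInProgress: string; // work-in-progress
  aboutOperator: string; // things you've shared about yourself
}
```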

The compaction runs against a transient one-shot pool entry separate from the long-lived admin Query. Operator-visible side effects:

  • Compaction logs land in claude-agent-compaction-stream-YYYY-MM-DD.log alongside the main stream log. Look for [compaction-start], [compaction-summary-captured], [compaction-failed], [compaction-timeout], [compaction-crashed], or [compaction-spawn-error] to triage. Subprocess stderr is captured inline as [subproc-stderr] <line> — there is no longer a separate claude-agent-compaction-stderr-…log file.
  • The one-shot pool entry's lifecycle is greppable as [client-cold-create] reason=compaction-one-shot … paired with [client-evict] reason=compaction-one-shot …, distinguishable from the regular admin pool's lifecycle tags.