Troubleshooting

Stream-log file for a fresh session is absent or empty

Symptom: Operator opens a new admin session, sends one turn, sees the agent reply, then logs-read sessionKey=<…> returns file-not-found or zero bytes.

Invariant: For every new session, the stream-log file exists on disk if and only if at least one token byte has been emitted, and it contains the token bytes from the moment the first token reaches the operator.

The single-writer mandate (2026-05-14) enforces both halves of this contract mechanically:

The single writer module, platform/ui/app/lib/claude-agent/stream-log-writer.ts, opens the file lazily on streamLog.writeToken (the SDK first-byte site at stream-parser.ts:296).
The build gate platform/ui/scripts/check-stream-log-writer.mjs rejects, at CI time, every external appendFileSync/createWriteStream against the claude-agent-stream-* pattern.

The first-token invariant is pinned by platform/scripts/__tests__/first-token-creates-stream-log.test.sh: after one operator turn producing one token, claude-agent-stream-<sessionKey>.log must exist and contain the token bytes. The test passes iff the file is present and the bytes are present.

The hourly adherence runner, platform/scripts/log-adherence-check.sh, extends the device-side check with a duplicate-basename diagnostic (dup-basenames=N in the [log-tee] adherence-check line). dup-basenames>0 is a P0 page: it means the writer collapse has regressed.

Diagnose if it ever recurs: run bash platform/scripts/__tests__/first-token-creates-stream-log.test.sh from the install. A pass means the invariant holds. Any other exit means the writer-side existence contract is broken; the operator-visible signal (P0) is one [log-tee] missing-on-resolve sessionKey=<8> surface=<…> line in server.log. For the duplicate-file class specifically (the 2026-05-14 recurrence trigger), bash platform/scripts/log-adherence-check.sh returns non-zero whenever any sessionKey has more than one claude-agent-stream-<sk>.log across account dirs.
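To look at the duplicate-file class by hand, the duplicate-basename count can be approximated with standard tools. This is a minimal sketch, not the adherence runner's implementation: the function name count_dup_basenames is hypothetical, and it assumes stream logs sit at <accounts-root>/<account-id>/logs/claude-agent-stream-<sk>.log, matching the data/accounts/<id>/logs/ layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the platform): print how many stream-log
# basenames appear in more than one account directory.
# Assumed layout: <root>/<account-id>/logs/claude-agent-stream-<sk>.log
count_dup_basenames() {
  local root="$1"
  find "$root" -mindepth 3 -maxdepth 3 -type f \
      -path '*/logs/claude-agent-stream-*.log' \
    | sed 's#.*/##' \
    | sort \
    | uniq -d \
    | wc -l \
    | tr -d ' '
}
```

For example, a non-zero result from count_dup_basenames ~/maxy/data/accounts corresponds to the duplicate-file condition that log-adherence-check.sh reports.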

Retrieving evidence from an rc-spawn session

rc-spawn sessions (those started via the sidebar or the claude rc --spawn daemon) do not write a per-account stream log under data/accounts/<id>/logs/. Their evidence is the Claude Code JSONL transcript in the configDir:

<CLAUDE_CONFIG_DIR>/projects/<slug>/<uuid>.jsonl                      # parent session
<…>/projects/<slug>/<uuid>.meta.json                                  # bridgeIds persistent map
<…>/projects/<slug>/<uuid>/subagents/agent-<hex>.jsonl               # each subagent
<…>/projects/<slug>/<uuid>/subagents/agent-<hex>.meta.json           # {"agentType",…}

Retrieve a session's merged timeline: logs-read.sh <key> with a bare key (no second argument) maps the key to the local <uuid> and prints one timestamp-ordered timeline that merges the parent transcript with every subagent transcript. The key is resolved by trying, in order:

a matching <uuid>.jsonl on disk;
a sessions/<pid>.json whose bridgeSessionId matches;
a <uuid>.meta.json whose bridgeIds carries the suffix (this map is persistent, so it survives PID-file cleanup on clean exit);
a content scan of the top-level transcripts, as a last resort.

Any accepted key form works: the claude.ai session_<id>, its bare suffix, or the <uuid> (or a unique uuid prefix).

Every subagent is_error tool_result is flagged inline as ‼ SUBAGENT ERROR with the agent type, the failing tool, and the error text. The parent session's own tool errors appear as ‼ tool error. The two are never conflated.
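The merge step can be pictured with a short sketch that interleaves records from several transcripts by timestamp. It is not logs-read.sh's implementation: merge_timeline is a hypothetical name, and it assumes each JSONL record is compact JSON carrying an ISO-8601 UTC "timestamp" string, so lexical order equals chronological order. It does not do the key resolution or the error flagging described here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not logs-read.sh itself): interleave records from a
# parent transcript and its subagent transcripts by their "timestamp" field.
# Assumes compact JSON with ISO-8601 UTC timestamps, which sort lexically.
merge_timeline() {
  local f line ts src
  for f in "$@"; do
    src=$(basename "$f" .jsonl)            # e.g. <uuid> or agent-<hex>
    while IFS= read -r line; do
      ts=$(printf '%s\n' "$line" | sed -n 's/.*"timestamp":"\([^"]*\)".*/\1/p')
      [ -n "$ts" ] || continue             # skip records with no timestamp
      printf '%s\t%s\t%s\n' "$ts" "$src" "$line"
    done < "$f"
  done | sort -t "$(printf '\t')" -k1,1
}
```

Each output line is the timestamp, the source file stem (the parent <uuid> or agent-<hex>), and the raw record, tab-separated.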

Audit all silently-failed subagents: logs-read.sh --scan-subagent-errors [N] walks every subagents/agent-*.jsonl under the configDir and lists each one carrying an is_error result — agent type, parent session, failing tool, error text. Optional N limits the scan to the N most-recently-modified transcripts. Use this when a delivery failure was reported but no reproduction is available.

Quick recipes:

# A session's merged parent+subagent timeline (subagent errors flagged inline)
~/maxy-code/platform/scripts/logs-read.sh session_<id>

# Standing audit: every subagent transcript that failed silently
~/maxy-code/platform/scripts/logs-read.sh --scan-subagent-errors

# Limit audit to the 50 most-recent transcripts
~/maxy-code/platform/scripts/logs-read.sh --scan-subagent-errors 50

Note: passing an explicit second argument (e.g. logs-read.sh <key> agent-stream) still reads the legacy per-account stream log — the bare-key JSONL path is the default when no type is given.
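Where logs-read.sh is not at hand, a rough equivalent of --scan-subagent-errors can be sketched with ls and grep. This is a hypothetical helper, not the script's implementation; it assumes the subagent layout shown above and compact JSON in which a failed tool result contains the literal text "is_error":true. It lists matching files only, without the agent type or error text.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not logs-read.sh itself): list subagent transcripts
# that contain an is_error tool result, newest first, optionally limited to N.
# Assumes compact JSON, so a failed result contains the text "is_error":true.
scan_subagent_errors() {
  local config_dir="$1" limit="${2:-}" f
  # Layout: <configDir>/projects/<slug>/<uuid>/subagents/agent-<hex>.jsonl
  ls -t "$config_dir"/projects/*/*/subagents/agent-*.jsonl 2>/dev/null \
    | if [ -n "$limit" ]; then head -n "$limit"; else cat; fi \
    | while IFS= read -r f; do
        if grep -q '"is_error":true' "$f"; then
          printf '%s\n' "$f"
        fi
      done
}
```

For example, scan_subagent_errors "$CLAUDE_CONFIG_DIR" 50 checks only the 50 most recently modified subagent transcripts.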

A JavaScript-rendered page comes back empty from WebFetch or `url-get`

Symptom: A page that needs JavaScript to show its content comes back empty, or as a bare shell document, from WebFetch (summary) or url-get (verbatim, server-rendered).

Resolution: Use the browser core plugin's browser-render tool. It renders the page in the device's per-brand Chromium over the Chrome DevTools Protocol (the same browser the VNC viewer shows) and returns the rendered HTML plus visible text. It attaches to the already-running Chromium on 127.0.0.1:${CDP_PORT} — nothing is downloaded or installed mid-session.

Diagnose if it ever recurs: grep the per-conversation stream log for [browser-render]. rendered=true domBytes=<n> is the healthy signal. rendered=false outcome=cdp-unreachable means no Chromium is listening on the brand's CDP port — confirm with curl 127.0.0.1:<cdpPort>/json/version. Other outcomes (navigate-failed, load-timeout, evaluate-failed) name the failed CDP step.
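Picking the verdict out of a long stream log can be scripted. The sketch below is a hypothetical helper that keys only on the rendered= and outcome= fields named above; it is an illustration, not part of the browser plugin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a stream log on stdin and report the verdict of the
# most recent [browser-render] line, using the fields described above.
last_browser_render_verdict() {
  local line
  line=$(grep -F '[browser-render]' | tail -n 1) || true
  case "$line" in
    '')              echo "no-browser-render-lines" ;;
    *rendered=true*) echo "healthy" ;;
    *outcome=*)      printf '%s\n' "$line" | sed -n 's/.*outcome=\([^ ]*\).*/\1/p' ;;
    *)               echo "unrecognised" ;;
  esac
}
```

Usage: last_browser_render_verdict < claude-agent-stream-<sessionKey>.log prints healthy, cdp-unreachable, or the name of the failed CDP step.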

First user-domain write rejected by `[graph-write-gate] reject reason=no-admin-user`

Symptom: Admin chat reports "couldn't save that — set up your business profile first" or [graph-write-gate] reject reason=no-admin-user appears in server.log on the operator's first non-bootstrap write (a website, service, opening hours, etc.). Reproduces on Minimal-onboarded installs from before the seed-stamping fix shipped.

Diagnose: Tail the gate reject and self-heal lines together:

grep -E "adminuser-self-heal|graph-write-gate.*reject" <server.log>

[adminuser-self-heal] healed=1 … followed by no [graph-write-gate] reject lines on subsequent writes: the heal fired and the gate now passes. The operator can retry.
[adminuser-self-heal] healed=0 … + [graph-write-gate] reject … subReason=admin-user-no-accountid — heal couldn't reach the broken node. Most likely cause: the env-side ACCOUNT_ID doesn't match any :AdminUser.userId. Cross-check users.json[0].userId against MATCH (au:AdminUser) RETURN au.userId, au.accountId — if the userId mismatches, the post-Task-904 [admin-invariant] line in the same log will show direction=users-without-account and the repair is to align the stores per .docs/agents.md § "Three-store admin auth invariant", not to retry the heal.
[graph-write-gate] reject … subReason=no-admin-user-node — the graph has no :AdminUser at all. Re-run the seed (platform/scripts/seed-neo4j.sh) under the install's env vars; the boot self-heal won't help because there's nothing to heal.
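The three branches above can be encoded as a small decision helper over the same grep. This is a hypothetical sketch that looks only at the last matching line; it assumes the subReason= values appear verbatim as shown, and it does not run the users.json / :AdminUser cross-check itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the latest self-heal / gate-reject line in a
# server.log to the next step from the list above. Pattern mirrors the grep.
gate_next_step() {
  local last
  last=$(grep -E 'adminuser-self-heal|graph-write-gate.*reject' "$1" | tail -n 1) || true
  case "$last" in
    '')                                  echo "no gate activity" ;;
    *subReason=no-admin-user-node*)      echo "re-run platform/scripts/seed-neo4j.sh" ;;
    *subReason=admin-user-no-accountid*) echo "cross-check users.json[0].userId against :AdminUser" ;;
    *'[adminuser-self-heal] healed=1'*)  echo "heal fired; retry the write" ;;
    *)                                   echo "inspect manually" ;;
  esac
}
```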

The subReason=admin-user-no-accountid path should be impossible on any install whose admin server has booted at least once after the boot self-heal shipped — if it fires, the diagnostic recipe is the cross-check above, not "rerun the heal."

Fresh install opens to "Set your remote password" on the LAN URL

Symptom: On a brand-new device, the LAN URL printed by create-maxy (e.g. http://maxy.local:19200) opens to a remote-password setup page instead of admin onboarding. This was a Task-647-era regression and should not occur on any install built after the fix.

Diagnose: On the Pi, grep the UI server log for the gate's disambiguation fields:

tail -200 ~/.maxy/logs/maxy-ui.log | rg '\[remote-auth\].*resolvedKind='

resolvedKind=lan on a "login required" or "not configured" line means the classifier sees the request as local. If the browser is still on the remote-auth page, something cached the older page before the fix shipped; hard-refresh the tab.
resolvedKind=external means the request chain presents as remote (routable IP in the first x-forwarded-for hop). On a LAN-only browser this points to a proxy or VPN rewriting headers between the browser and the Pi.
resolvedKind=unknown is a defect — the classifier could not identify the TCP peer. Capture the log line and file it; do not work around it.

Fix: If all three fields confirm the LAN shape and the gate still refuses, upgrade the platform (Software Update from admin chat) to pick up the Task-679 classifier.
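The resolvedKind branches above can be checked mechanically. In this hypothetical sketch, explain_resolved_kind reads log lines on stdin and reports the most recent resolvedKind= value on a [remote-auth] line; the wording of its output is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read maxy-ui.log lines on stdin and explain the most
# recent resolvedKind= value emitted on a [remote-auth] line.
explain_resolved_kind() {
  local kind
  kind=$(grep -F '[remote-auth]' | sed -n 's/.*resolvedKind=\([a-z]*\).*/\1/p' | tail -n 1) || true
  case "$kind" in
    lan)      echo "lan: classifier sees a local request; hard-refresh a stale tab" ;;
    external) echo "external: a proxy or VPN may be rewriting x-forwarded-for" ;;
    unknown)  echo "unknown: classifier defect; capture the line and file it" ;;
    *)        echo "no resolvedKind found" ;;
  esac
}
```

Usage: tail -200 ~/.maxy/logs/maxy-ui.log | explain_resolved_kind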

Remote sign-in is rejected with "Remote access requires TLS"

Symptom: Posting the remote-auth password returns a plain-text 400 "Remote access requires TLS" response instead of completing sign-in.

What this means: The login endpoint will only issue a session cookie when the request arrived over HTTPS (via the Cloudflare tunnel). Browsers silently drop Set-Cookie: Secure on plain-HTTP responses, so minting a cookie there would produce a dead-end redirect. An earlier fix replaced that silent failure with this loud one.

Fix: Reach the admin surface through the tunnel hostname (e.g. https://admin.<your-domain>), not an IP or plain-HTTP URL. If you need LAN access, use the LAN URL (http://<hostname>.local:<port>) — LAN never hits the remote-auth endpoint.

Agent Not Responding

Symptom: You send a message and nothing comes back, or the response never arrives.

Check:

Ask Maxy: "Check system status" — the system-status tool will report whether all services are running
Check the platform logs: ask Maxy "Show me the recent logs"
If the admin agent itself won't start: restart the platform (see below)

Common causes:

Claude API connectivity issue — check your Claude OAuth connection is still valid
Platform process has stopped — restart it
Network issue if accessing remotely — check your Cloudflare tunnel is running

If the chat shows a single "[agent-loop-stop] same error twice — aborting" line and stops, Maxy hit the same structured tool failure twice in a row within one turn (for example, a permission gate refused the same write twice, or two Read calls hit the same missing file). The runtime aborts the turn after the second occurrence to save tokens, rather than running until the SDK turn budget is exhausted. The blocker text names the tool and the first line of the error.

Resolve the underlying cause (re-run the named skill, fix the missing prerequisite, etc.) and tap "Continue". The next turn resumes the prior SDK session through the synthetic-tool-result contract, so Maxy picks up where it aborted instead of cold-querying its own session list. To see the diagnostic, ask Maxy: "Show me the most recent stall-recovery log line."

Greppable post-deploy invariants, in order:

[agent-loop-stop] reason=identical-tool-failure tool=<name> errorSignature=<sha8> toolInputDigest=<sha8>
[stall-recovery] kind=agent_loop_stop … handoff=resume-first
on the next turn, [stall-resume] consumed kind=agent_loop_stop toolUseId=<8> priorSessionId=<8>

The fallback path (used when the SDK session id was lost) emits handoff=metadata-only plus [recovery-handoff] generated/consumed reason=agent-loop-stop, and the chat button reads "Start over" instead of "Continue". A [recovery-handoff] WARN missing-on-cold-create line means the fallback briefing was not persisted; surface it to support.
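Whether each abort was followed by a stall-recovery handoff can be checked with a short awk pass. This is a hypothetical helper, not a platform script; it assumes the tags appear verbatim as listed and only checks that a [stall-recovery] kind=agent_loop_stop line follows each stop, not whether the handoff was resume-first or metadata-only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count [agent-loop-stop] identical-tool-failure lines that
# are not followed by a [stall-recovery] kind=agent_loop_stop line before the
# next stop (or end of input). A healthy log prints 0. Reads files or stdin.
unpaired_loop_stops() {
  awk '
    /\[agent-loop-stop\] reason=identical-tool-failure/ { if (open) bad++; open = 1; next }
    /\[stall-recovery\] kind=agent_loop_stop/           { open = 0 }
    END { if (open) bad++; print bad + 0 }
  ' "$@"
}
```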

If a background task goes silent and the chat shows "A background task went silent — K of M completed", Maxy's subagent stopped emitting progress for more than 2 minutes. Tap "Continue": the next turn resumes the prior session and reads a synthetic tool_result describing what completed before the pause, so the agent re-plans without losing the work it had already done. Most stalls come from upstream API latency rather than a failing subagent approach; the resume-first path handles both correctly.

Greppable post-deploy invariants: [stall-recovery] kind=subagent_stalled … completed=<K>/? handoff=resume-first, followed on the next turn by [stall-resume] consumed kind=subagent_stalled toolUseId=<8>. If the button reads "Start over" instead, the parent's pending tool_use_id was not captured and the fallback path took over; the prior conversation is preserved as a <recovery-context> block in the cold-started session.

Agent searches the filesystem after uploading a zip. If you uploaded a zip and the agent burns several turns running find / Glob instead of unzipping, that is the symptom of the recovery-retry attachment-context regression (now closed by the recovery context preservation contract in .docs/agents.md). Greppable confirmation is the [context-overflow-recovery] retry … attachmentsCarried=<n> line in the conversation stream log. If you see [context-overflow-recovery] WARN attachment-context-lost, the regression has returned — surface to support.

Turn budget exhausted, with a horizontal rule separating two assistant turns. When Maxy reaches its turn budget and the doubled retry also runs out, the chat shows a one-paragraph assistant message that opens with error_max_turns turns=A→B (initial budget → final budget), followed by the recovery copy: "I reached my turn budget of N before I could finish this request. Try sending a smaller or more focused request, or ask me to use higher effort." That message is persisted to the graph, so it survives a page refresh.

The thin horizontal rule labelled "Session restored after timeout." above your next turn signals that the prior turn forced a cold SDK-session restart inside the same conversation (pool eviction). The agent's response below the rule comes from a fresh SDK session even though the conversation thread is unchanged.

Greppable post-deploy invariants: within the same sessionId window, the number of [context-overflow-recovery] exhausted cause=max-turns-interrupted lines equals the number of [admin-persist] writer=persistMessageExhaust outcome=ok lines, and one [session-store] storeAgentSessionId line marks the cold restart that produced the on-screen rule.
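The count-equality invariant lends itself to a two-grep check. The helper below is hypothetical; it assumes you pass it a log already scoped to one sessionId window, and it allows other fields (such as convId=) between the tag and writer= on [admin-persist] lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the two line counts the invariant says must be
# equal. Prints both counts; returns 0 when they match, 1 otherwise.
exhaust_counts_match() {
  local log="$1" exhausted persisted
  # grep -c prints 0 (and exits 1) when nothing matches, so tolerate that.
  exhausted=$(grep -c '\[context-overflow-recovery\] exhausted.*cause=max-turns-interrupted' "$log") || true
  persisted=$(grep -c '\[admin-persist\].*writer=persistMessageExhaust.*outcome=ok' "$log") || true
  echo "exhausted=$exhausted persisted=$persisted"
  [ "$exhausted" -eq "$persisted" ]
}
```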

A turn rendered in chat is missing after a page refresh. Before the 2026-05-07 mandate this was a class of silent failure: Neo4j persists were wrapped in a no-op error catch, so a write that threw left the artefact "rendered then disappeared on resume". The 2026-05-07 mandate makes JSONL canonical. The resume route reads the SDK transcript file at ~/.claude/projects/<project-key>/<sessionId>.jsonl first, supplements it from Neo4j, and triggers async heal-on-resume writes for any turn the JSONL has but Neo4j does not. A refreshed conversation therefore always renders what the SDK saw, regardless of write outcome. If a heal write fails, the chat shows a banner at the top of the conversation naming the count; if every heal succeeds, the resume is silent and the missing rows are quietly restored to Neo4j.

Greppable post-deploy invariants in the per-session stream log (logs/claude-agent-stream-<sessionKey>.log):

[admin-resume] reason=<…> source=<jsonl|jsonl-missing|neo4j-only> (one per resume)
[admin-persist] convId=<8> writer=<…> outcome=<ok|fail|skip> (one per persist site)
[admin-persist-heal] convId=<8> turnIndex=<n> outcome=<ok|fail> (one per heal write)

To force-audit a specific conversation against its Neo4j projection without re-executing it, run tsx platform/scripts/admin-persist-audit.ts --conversation-id=<uuid> --account-id=<uuid> --session-id=<uuid>. A non-zero exit, together with one [admin-persist-audit] expected=<message|component> missing reason=neo4j-row-absent line per divergence, names what would have been silently lost before the mandate.

Wrong Claude account answering on a multi-brand device. On a host running both Maxy and Real Agent, each brand's admin agent reads its own ~/${brand.configDir}/.claude/.credentials.json; there is no longer a shared ~/.claude/ for the brands to thrash against one another. If a brand reports auth failures or appears to be operating against the wrong subscription, check three things:

grep "\[claude-auth\] init" ~/.${brand}/logs/server.log | tail -1 — the resolved path must end with ~/.${brand}/.claude/.credentials.json. If a [claude-auth] WARN cross-brand-path-detected line is present, the runtime is still pointing at ~/.claude/; the brand main service did not pick up the Environment=CLAUDE_CONFIG_DIR= setting (re-run the brand installer to refresh the unit file).
diff <(jq .claudeAiOauth.accessToken ~/.maxy/.claude/.credentials.json) <(jq .claudeAiOauth.accessToken ~/.realagent/.claude/.credentials.json) — must be non-empty after each brand's operator has run claude /login against distinct Anthropic accounts; if it's empty, both brands are still logged in to the same account (operator action, not a code bug).
grep "\[install\] claude-creds pickup" ~/.${brand}/logs/install-*.log — fires once on the first post-Task-923 install of any brand and moves the legacy ~/.claude/.credentials.json into that brand's path. Subsequent brands install with no credentials and require a fresh claude /login inside that brand's chat (which writes to the brand-scoped path because the systemd unit env is in scope).

All sessions on the brand stopped responding after a token expiry. Symptom on the operator side: every spawn dies at pid-file-timeout and the dashboard health probe reports auth dead. Diagnose the OAuth refresh path before anything else:

tail -n 300 ~/.${brand}/logs/server.log | grep -E 'auth-refresh|auth-health|invalid_grant' — op=lock-acquired proves the cross-process lock is in play (Task 576). op=skipped-fresh means a sibling process (the admin server or a claude binary) already rotated the tokens during the lock wait — expected, healthy. op=renewed expiresAt=… is the only line that means a network refresh actually ran.
outcome=fail-token or invalid_grant lines mean Anthropic rejected the refresh token itself (revoked, or expired beyond the rotation window), and the brand needs a fresh claude /login. Before Task 576, the most common cause was the admin server and a spawned claude racing to rotate the same single-use refresh token. That race is now serialised by the file lock at ~/.${brand}/.claude/.credentials.json.lock, and a re-read after the lock is acquired skips redundant refreshes.
grep '\[auth-health\]' ~/.${brand}/logs/server.log | tail -n 5 — the heartbeat fires every five minutes. status=dead expiresIn=... means the refresh token is gone; only a re-login fixes it. status=ok heartbeats with no spawns in between mean the credentials file is healthy and the failure lives elsewhere.
The spawn-failure surface now carries reason=auth-refresh-failed (with authStatus in the JSON body) instead of generic pid-file-timeout whenever the credentials file is in dead or expired state at the moment of failure — visible in grep '\[spawn-failed\]' on server.log.
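The refresh outcomes above can be reduced to a single verdict. This is a hypothetical sketch that classifies only the last auth-refresh or invalid_grant line; it deliberately leaves out [auth-health] heartbeats, which would otherwise be the last matching line most of the time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the most recent auth-refresh / invalid_grant
# line in a server.log, using the field values listed above.
last_auth_refresh_verdict() {
  local line
  line=$(grep -E 'auth-refresh|invalid_grant' "$1" | tail -n 1) || true
  case "$line" in
    '')                                   echo "no-refresh-activity" ;;
    *invalid_grant*|*outcome=fail-token*) echo "relogin-required" ;;
    *op=renewed*)                         echo "network-refresh-ran" ;;
    *op=skipped-fresh*)                   echo "sibling-already-rotated" ;;
    *op=lock-acquired*)                   echo "lock-held-refresh-pending" ;;
    *)                                    echo "unrecognised" ;;
  esac
}
```

Usage: last_auth_refresh_verdict ~/.${brand}/logs/server.log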

Memory Not Working

Symptom: Maxy doesn't remember things you've told it, or search returns nothing.

Check:

Ask Maxy: "Check the Neo4j connection"
Ask Maxy: "Search memory for [something you know was stored]"

Common causes:

Neo4j service stopped — restart the platform, which restarts Neo4j
Memory index is stale — ask Maxy: "Reindex memory"

Telegram Bot Not Receiving Messages

Symptom: You send a message to the bot and nothing happens.

Check:

Confirm the bot token is correct: ask Maxy "What Telegram bot token is configured?"
Verify the bot is running: send /start to the bot in Telegram
Check the MCP server logs: ask Maxy "Show Telegram plugin logs"

Common causes:

Bot token changed (if you regenerated it in BotFather) — update it by telling Maxy "Update my Telegram bot token"
Webhook not connected — restart the platform

Plugin Errors

Symptom: A tool fails with an error, or a plugin says it can't connect.

Check:

Ask Maxy: "Show me recent errors"
Ask Maxy: "Restart the [plugin name] plugin"

Common causes:

Missing environment variable (API key, token) — the error message will name it; ask Maxy to help configure it
MCP server crashed — restarting the platform restarts all MCP servers

Cannot Mount the SMB Share

Symptom: Mounting smb://<hostname>.local (or \\<hostname>.local\<brand>) fails with a "logon failure", or the share does not appear in your network browser.

Check:

Confirm you have set a PIN in the admin UI at least once. On a fresh Pi or Hetzner box the smbpasswd entry does not exist until the first set-pin runs — mounts before that point always fail.
Use the install owner as the username (admin on a Pi or Hetzner box; the Linux user that ran the installer on a self-hosted laptop) and the current Maxy PIN as the password. The SMB password is not stored separately — it is the PIN.
If <hostname>.local does not resolve from your client, mount by LAN IP instead (smb://192.168.1.50 on macOS, \\192.168.1.50\<brand> on Windows).
Rotate the PIN in the admin UI. That re-triggers the smbpasswd sync on the device. If the resync log line reads [set-pin] smbpasswd sync failed owner=<unknown> rc=-1 reason=install-owner-file-missing, restore ~/.<brand>/.install-owner from the installer log.

See Samba Share for the full credential model and per-OS mount syntax.

Restarting the Platform

From the admin interface, ask Maxy: "Restart the platform."

If Maxy itself isn't responding (the page loads but the agent won't connect), try refreshing the browser. If the page itself won't load, the platform process may have stopped — power-cycle the Raspberry Pi by unplugging and reconnecting power, then wait a minute for services to restart automatically.

Checking Logs

Ask Maxy: "Show me the logs" or "Show errors from the last hour."

For specific plugin logs: "Show Telegram logs" or "Show contacts plugin logs."

Maxy has access to all platform logs and can filter them for you.

Cloudflare Tunnel Down (Remote Access Broken)

Symptom: You can reach Maxy on your local network but not via your public domain.

Check: Ask Maxy "Check the Cloudflare tunnel status."

Fix: Ask Maxy "Restart the Cloudflare tunnel."

If the tunnel won't reconnect, re-run the Cloudflare setup: ask Maxy "Reconnect Cloudflare."

If the initial Cloudflare login fails during setup, Maxy will fall back to asking you for a connection key. You can create one in the Cloudflare dashboard (Maxy will guide you through this in the browser).

If you switched Cloudflare accounts or are stuck on the wrong one: ask Maxy "Reset my Cloudflare login and start over." This is a clean reset — Maxy clears every stored credential, then opens a fresh browser sign-in. The next sign-in binds to whichever Cloudflare account you choose, with no risk of the previous account's stored credentials silently coming back.

"Bad Gateway" or holding page during an upgrade

maxy-edge.service (always-on front door) classifies upstream errors and serves a brand-aware response. There are two distinct user-visible shapes; the right one depends on what failed.

Branded holding page (brand logo + "Starting") for ~10 s during an upgrade: this is expected and self-healing. The edge process binds the public port immediately, but maxy.service (the upstream UI) takes ~10 s after restart to apply the neo4j schema and mount its 11 routes. Any browser navigation that lands in that window gets a self-contained HTML holding page that polls /api/health and reloads automatically once the upstream binds.

The page renders the brand logo (inlined as a base64 data URI at edge boot from <install>/server/public/brand/<assets.logo>) and the brand display/body fonts (loaded from fonts.googleapis.com). Both paths bypass the unavailable upstream, so the page never makes a same-origin asset fetch. When brand.logoContainsName is true, the logo replaces the productName text; otherwise the page falls back to "Maxy is starting". No operator action is required.

The diagnostic line in ~/.maxy/logs/edge.log is [edge] upstream http error path=… err=connect ECONNREFUSED 127.0.0.1:<UPSTREAM_PORT> err-class=econnrefused-coldstart upstream=…, and it stops appearing as soon as the upstream binds. Boot-time confirmation that the logo resolved is [edge] brand=<name> holding-logo=inlined assets-dir=<path>. holding-logo=missing means the logo file was not found at assets-dir, and the page degrades to text-only.

Branded plain-text 502 ("Bad Gateway (Maxy unavailable)"): a real upstream failure, not cold-start. Any error class other than ECONNREFUSED (timeouts, resets, host-unreachable) takes the existing 502 path, and the diagnostic line carries err-class=other. Read the log with tail -200 ~/.maxy/logs/edge.log | rg 'err-class=other' and check ~/.maxy/logs/server.log for upstream stack traces; the upstream itself is the source.

Continuous err-class=econnrefused-coldstart lines for more than 30 s after the last [edge] listening line indicate that the upstream never binds: the upgrade or boot has stalled. Inspect the service with systemctl --user status maxy.service and review the upgrade output as described in the next section. Escalating the holding page to an error after N seconds of permanent failure is intentionally deferred.
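A tally of err-class values gives a quick first read on whether edge.log shows cold-start noise or real upstream failures. This hypothetical helper counts classes only; it does not implement the 30-second timing check, because that depends on the edge.log timestamp format, which is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally err-class= values on edge.log upstream-error lines.
# Prints "<class> <count>" per class, sorted by class name.
edge_err_class_counts() {
  sed -n 's/.*\[edge\] upstream http error.*err-class=\([a-z-]*\).*/\1/p' "$@" \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

Usage: edge_err_class_counts ~/.maxy/logs/edge.log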

The literal string maxy-ui should never appear in edge.log or in any user-visible 502 body, regardless of brand. If it does, the edge is running stale code — re-bundle and re-publish.

Verifying the holding page locally: curl -sS -H 'Accept: text/html' http://127.0.0.1:<EDGE_PORT>/ while maxy.service is stopped should return HTML containing the brand productName. The Accept: text/html header is required — non-html clients (default curl, fetch, XHR) get the branded plain-text 502 instead, so the holding page's own /api/health polls don't break themselves during cold-start.

Software update and Cloudflare setup

Both flows run on the native Claude Code PTY surface in admin chat (Task 287). The retired action-runner / terminal-modal troubleshooting sections that lived here have been removed because those surfaces no longer exist; failures now manifest as plain stderr from the agent-invoked Bash command, visible in chat.

Software update. Re-run npx -y @rubytech/create-<brand>@latest from a shell; if the installer fails, its stdout is the diagnostic record. The HeaderMenu turns sage once the installed version equals the latest.
Cloudflare setup. The agent invokes cloudflared directly via Bash, following the cloudflare plugin's plugins/cloudflare/references/manual-setup.md. Failures surface as cloudflared's literal stderr plus a non-zero exit. Recovery paths live in plugins/cloudflare/references/reset-guide.md and plugins/cloudflare/references/manual-setup.md.

Orphan Account Directory Archived to `.trash/`

What happened: During upgrade, the installer detected multiple account directories under ~/maxy/data/accounts/ and identified one as live (its admins list matches the device's users.json). Non-matching siblings are archived — not deleted — under ~/maxy/data/accounts/.trash/<uuid>-<ISO8601-ts>/.

Installer signal: Look for these lines in the installer log or admin terminal output:

==> [seed] identity-match: kept=<uuid-short> via userId=<first-8>
==> [seed] swept orphan: <uuid-short> → .trash/<uuid-short>-<ts>
==> [seed] orphan sweep: moved N → ~/maxy/data/accounts/.trash/

Rollback (if the wrong account was kept): The archive is preserved verbatim. Stop the platform, move the desired directory back, restart:

systemctl --user stop maxy-ui
mv ~/maxy/data/accounts/<live-uuid> ~/maxy/data/accounts/.trash/<live-uuid>-$(date -u +%Y%m%dT%H%M%SZ)
mv ~/maxy/data/accounts/.trash/<archived-uuid>-<ts> ~/maxy/data/accounts/<archived-uuid>
systemctl --user start maxy-ui
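The two mv steps can be wrapped with guards so that a typo cannot overwrite a directory. This is a hypothetical helper, not an installer command; it takes the archived UUID as an explicit argument rather than parsing it out of the <uuid>-<ts> name, and it leaves stopping and starting the service to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the two mv steps of the rollback, with guards
# so it refuses to overwrite anything. Stop the platform before calling it.
#   $1 accounts dir   (e.g. ~/maxy/data/accounts)
#   $2 live uuid      (directory to archive)
#   $3 archive entry  (<archived-uuid>-<ts> under .trash/)
#   $4 archived uuid  (name to restore it as)
restore_archived_account() {
  local accounts="$1" live="$2" entry="$3" archived="$4" ts
  ts=$(date -u +%Y%m%dT%H%M%SZ)
  [ -d "$accounts/$live" ] || { echo "missing live dir: $live" >&2; return 1; }
  [ -d "$accounts/.trash/$entry" ] || { echo "missing archive: $entry" >&2; return 1; }
  [ ! -e "$accounts/$archived" ] || { echo "refusing to overwrite: $archived" >&2; return 1; }
  mv "$accounts/$live" "$accounts/.trash/$live-$ts"
  mv "$accounts/.trash/$entry" "$accounts/$archived"
}
```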

.trash/ retention: Archived directories are kept indefinitely. The platform never auto-empties .trash/. When you're confident the archived orphans are truly obsolete, remove the directory manually: rm -rf ~/maxy/data/accounts/.trash/<uuid>-<ts>/.

Installer aborted with "identity-match FAILED": On multi-account installs where no sibling matches users.json[0].userId, the installer aborts loudly; it refuses both to pick one and to sweep. Resolution: inspect account.json in each candidate dir (listed in the abort output), identify the correct owner, move the other(s) aside manually, then re-run the installer.

A chat turn looks broken — assistant bubble never rendered: Open claude-agent-stream-<sessionKey>.log and grep for [sse-client]. The five phases (connected, event_received, render_complete, error, close) tell the story in order. Missing connected = the chat fetch never returned 200; missing event_received = the server emitted nothing or the client lost the stream before the first frame; missing render_complete = the reducer never committed the assistant bubble (persist_ack never arrived).
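The phase check can be scripted. This hypothetical sketch assumes each phase name appears as a whole word on an [sse-client] line (the exact field layout is not documented here) and reports the first of the three diagnostic phases that is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first of connected / event_received /
# render_complete that never appears on an [sse-client] line in the given
# stream log, or "none" if all three are present.
first_missing_sse_phase() {
  local log="$1" lines phase
  lines=$(grep -F '[sse-client]' "$log") || true
  for phase in connected event_received render_complete; do
    # Here-string avoids a pipe, so grep -q exiting early cannot cause SIGPIPE.
    if ! grep -qw -- "$phase" <<< "$lines"; then
      echo "$phase"
      return 0
    fi
  done
  echo "none"
}
```

Usage: first_missing_sse_phase claude-agent-stream-<sessionKey>.log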

Admin DevTools console floods with `onboarding-banner-mount` or `sessions-poll` lines

Regression symptom. Open DevTools on the admin shell at / with onboardingComplete=false, leave the page idle for a minute, then scroll back through the console. Thousands of [admin-ui] onboarding-banner-mount onboardingComplete=false lines (one per AdminShell render, ~40/min driven by the 3s sessions poll) with no per-tick poll telemetry indicates the banner-mount log has regressed back into the render body.

Steady-state invariants at /:

grep -c '\[admin-ui\] onboarding-banner-mount' ~/.maxy/logs/admin-ui-console.log equals the page-load count plus the onboarding-flip count, not the render count. A sustained climb at idle means the banner-mount log has regressed back into the render body.
grep -c '\[admin-ui\] sessions-poll' ~/.maxy/logs/admin-ui-console.log over a 60-minute idle window equals zero. The hook no longer installs a setInterval; every sessions-poll line is operator-triggered (initial mount, refresh button, post-mutation refetch). One or more lines during operator idle means setInterval was reinstated.
outcome=error lines name a real fetch failure on an operator-triggered refetch, set the error field, and surface in the sidebar.

Reconcile signal:

grep -c '\[admin-ui\] sidebar-meta-pane-reconcile' ~/.maxy/logs/admin-ui-console.log should equal the count of End / Resume / Purge clicks while the metadata pane was open. A to=gone line without a paired Close click means the pane's auto-close logic regressed.

Why this matters. The render-body log was misleading: it read as "the admin agent is checking onboarding state continuously", when in fact onboardingComplete had not changed at all. The fix moved the log into useEffect(…, []) and then dropped the per-tick poll entirely, so a quiet console is now the steady state. With both fixes in place, console output is a faithful record of what the page actually did on each operator click.
