Marlin

Finding Memo

AI agents are simpler than you think. Three concepts, one metaphor, and a live reconstruction.

The basics

The Cast

clownfish

The Model

Text in, text out. Stateless. Smart but alone.

Marlin's brain

Statelessness in practice: every API call starts fresh. The model has zero memory of your previous conversation unless you resend it. "Memory" is an illusion created by the harness replaying history.

Under the hood: models are transformer neural networks. Key parameters: temperature (randomness), top-p (sampling), max tokens (output length). Token economics: input tokens are cheaper than output tokens. Context window size determines how much the model can "see" at once.

What it does: takes a prompt, produces a completion. One call, no memory of previous calls.

Examples: Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google), Llama (Meta), Mistral, Command R (Cohere)

Key insight: the model is a function — f(prompt) = completion. It cannot read files, browse the web, or remember you.
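The statelessness above can be sketched in a few lines. `call_model` is a hypothetical stand-in for any chat-completion API, not a real client; the point is that the only "memory" is the history the harness resends.

```python
# Minimal sketch of statelessness. `call_model` is a hypothetical stand-in
# for any chat-completion API: it can only see what's in `messages`.
def call_model(messages):
    knows_name = any("Nemo" in m["content"] for m in messages)
    return "You're Nemo!" if knows_name else "I don't know your name."

history = [{"role": "user", "content": "My name is Nemo."}]

# A fresh call without the history: the model remembers nothing.
fresh = call_model([{"role": "user", "content": "What's my name?"}])

# The same question with history replayed: the illusion of memory.
replayed = call_model(history + [{"role": "user", "content": "What's my name?"}])
```

Every real harness does exactly this replay, just with a network call in the middle.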

Moorish idol fish — The Agent

The Agent

A model in a loop with tools. Decides, acts, observes, repeats.

Marlin with fins and eyes

The loop core: while run_one_turn(state): pass. The LoopState holds messages, turn count, and transition reason. Each turn: call API, append response, check stop_reason — if "tool_use", execute tools and continue. If not, return.
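The loop core described above can be made concrete. This is a minimal sketch, not the real implementation: `call_api` fakes a model that uses a tool once and then stops, and the `LoopState` fields mirror the ones named in the text.

```python
from dataclasses import dataclass, field

@dataclass
class LoopState:
    messages: list = field(default_factory=list)
    turns: int = 0
    transition: str = ""

def call_api(messages):
    # Hypothetical stand-in for the model: request a tool on the first
    # turn, then stop once a tool result is in the history.
    if not any(m.get("role") == "tool" for m in messages):
        return {"stop_reason": "tool_use", "content": "run `ls`"}
    return {"stop_reason": "end_turn", "content": "done"}

def run_one_turn(state):
    response = call_api(state.messages)
    state.messages.append({"role": "assistant", "content": response["content"]})
    state.turns += 1
    if response["stop_reason"] == "tool_use":
        state.messages.append({"role": "tool", "content": "file_a.py file_b.py"})
        return True          # keep looping
    state.transition = response["stop_reason"]
    return False             # the model chose to stop

state = LoopState(messages=[{"role": "user", "content": "list the files"}])
while run_one_turn(state):
    pass
```

Everything an "agent framework" adds sits inside or around this one function.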

Tool dispatch: a flat dictionary TOOL_HANDLERS = {"bash": handler, "read_file": handler, ...}. Adding a tool means adding one dict entry + one schema. The loop never changes.
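A sketch of that dispatch table, with hypothetical handlers. Note that an unknown tool name returns an error string to the model rather than crashing the loop.

```python
# Flat dispatch table: adding a tool is one dict entry here plus one
# schema entry for the model. The loop itself never changes.
def bash_handler(args):
    return f"ran: {args['command']}"

def read_file_handler(args):
    return f"contents of {args['path']}"

TOOL_HANDLERS = {
    "bash": bash_handler,
    "read_file": read_file_handler,
}

def dispatch(tool_name, args):
    handler = TOOL_HANDLERS.get(tool_name)
    if handler is None:
        return f"unknown tool: {tool_name}"  # surfaced to the model, not raised
    return handler(args)
```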

Subagents: sub_messages = [{"role":"user","content":prompt}] — a fresh list IS the isolation. Only the final text returns. Children don't get the task tool (no recursion). Safety limit: 30 iterations.
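The "fresh list is the isolation" idea, sketched with a hypothetical `run_loop` callback. The child starts from an empty history, runs at most 30 iterations, and only its final text crosses back to the parent.

```python
MAX_SUBAGENT_TURNS = 30  # safety limit from the text

def run_subagent(prompt, run_loop):
    # A fresh message list IS the isolation: the child never sees the
    # parent's history, and the parent only gets the final text back.
    sub_messages = [{"role": "user", "content": prompt}]
    for _ in range(MAX_SUBAGENT_TURNS):
        done, final_text = run_loop(sub_messages)
        if done:
            return final_text
    return "[subagent hit iteration limit]"

# Hypothetical one-turn loop body, for demonstration only.
def toy_loop(messages):
    return True, f"summary of: {messages[0]['content']}"

result = run_subagent("audit the tests", toy_loop)
```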

Worktree isolation: git worktree add -b wt/{name} gives each subagent its own copy of the code on a dedicated wt/{name} branch. Closeout: keep (preserve for review) or remove (force-delete).

Three ingredients: the model (brain), the loop (persistence), the tools (hands).

The model decides everything — what tool to call, when to stop, how to combine results. The "agent framework" is just a while loop.

Examples: Claude Code, Cursor Agent, Devin, Cline, Aider, Open Interpreter, OpenCode

coral

The Context

Everything the model can see. Its working memory. The real differentiator.

Marlin vs Dory -- same brain, different capacity

Four compaction levers (progressive):

1. Persist large output: results >30K chars saved to .task_outputs/, replaced with 2K char preview in <persisted-output> tags

2. Micro-compact: silent per-turn pass — tool results older than 3 turns replaced with [Earlier tool result compacted]

3. Auto-compact: at 50K char threshold, full transcript saved to .transcripts/, LLM summarizes preserving: goal, findings, files, remaining work

4. Manual compact: model calls compact tool with optional focus parameter
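Lever 2 is the simplest to sketch. A minimal pass, assuming each message carries the turn it was produced on; only stale tool results are replaced, everything else passes through untouched.

```python
COMPACT_AFTER_TURNS = 3
PLACEHOLDER = "[Earlier tool result compacted]"

def micro_compact(messages, current_turn):
    # Replace tool results older than COMPACT_AFTER_TURNS turns with a
    # short placeholder; leave all other messages intact.
    out = []
    for m in messages:
        stale = (m["role"] == "tool"
                 and current_turn - m["turn"] > COMPACT_AFTER_TURNS)
        out.append({**m, "content": PLACEHOLDER} if stale else m)
    return out

messages = [
    {"role": "tool", "turn": 1, "content": "10K chars of grep output"},
    {"role": "tool", "turn": 5, "content": "recent result"},
]
compacted = micro_compact(messages, current_turn=6)
```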

Error recovery: continuation on max_tokens (3 retries), auto-compact on prompt-too-long, exponential backoff (base * 2^attempt + jitter, max 30s)
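The backoff formula above, as a function. The base of 1 second is an assumed value; the source only gives the shape (base * 2^attempt + jitter, capped at 30s).

```python
import random

BASE = 1.0        # seconds; assumed value, not given in the source
MAX_DELAY = 30.0  # cap from the text

def backoff_delay(attempt, rng=random.random):
    # base * 2^attempt plus jitter in [0, 1), capped at MAX_DELAY.
    delay = BASE * (2 ** attempt) + rng()
    return min(delay, MAX_DELAY)
```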

Context = prompt + history + tool results. It's all just text stacked in a window.

When full: oldest parts get compacted (summarized or dropped). This is why long sessions degrade.

Sizes: 8K (small), 128K (standard), 200K (large, Claude), 1M (Gemini)

The mechanics

How It Works

The Agent Loop

Prompt → Think → Act → Observe, repeating until done. Pre-hooks and post-hooks fire around each tool call.

Permission Pipeline (Advanced)

Four stages checked in order. First definitive answer wins.

1. Deny rules
Bypass-immune blocklist (shell injection, sudo, rm -rf). Always checked first.
2. Mode check
default=ask everything, plan=block writes, auto=allow reads.
3. Allow rules
Pattern-matched allowlist using fnmatch globs.
4. Ask user
Interactive y/n/always. "Always" writes a permanent allow rule.

Circuit breaker: after 3 consecutive denials, suggests switching to plan mode. Workspace trust: checked via .claude_trusted marker.
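The four-stage pipeline, sketched as one function. The specific patterns and the write-command heuristic in plan mode are illustrative assumptions; only the ordering and first-definitive-answer-wins rule come from the text.

```python
from fnmatch import fnmatch

DENY_PATTERNS = ["sudo *", "rm -rf *"]           # bypass-immune blocklist
ALLOW_PATTERNS = ["git status", "ls*", "cat *"]  # fnmatch globs

def check_permission(command, mode="default", ask=lambda c: False):
    # 1. Deny rules: always checked first, cannot be bypassed.
    if any(fnmatch(command, p) for p in DENY_PATTERNS):
        return "deny"
    # 2. Mode check: plan mode blocks writes (crude prefix heuristic here).
    if mode == "plan" and command.startswith(("rm", "mv", "touch")):
        return "deny"
    # 3. Allow rules: pattern-matched allowlist.
    if any(fnmatch(command, p) for p in ALLOW_PATTERNS):
        return "allow"
    # 4. Nothing definitive: fall through to the user.
    return "allow" if ask(command) else "deny"
```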

Hook System (Advanced)

External shell commands triggered at lifecycle events. Configured in settings.json.

PreToolUse
Runs before each tool. Exit 0=continue, 1=block. Can modify input via JSON response.
PostToolUse
Runs after each tool. Exit 2=inject stderr into conversation. Can add context.
SessionStart
Runs once at session start. Context via env: HOOK_EVENT, HOOK_TOOL_NAME, HOOK_TOOL_INPUT.

Timeout: 30s per hook. Trust gate: requires .claude_trusted or SDK mode.
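A minimal hook runner, assuming a POSIX shell. The env-var names are the ones listed above; the example hook command is hypothetical, blocking any tool that isn't bash.

```python
import json
import os
import subprocess

def run_hook(command, event, tool_name, tool_input, timeout=30):
    # Hooks receive their context through environment variables; their
    # exit code decides what happens next.
    env = {
        **os.environ,
        "HOOK_EVENT": event,
        "HOOK_TOOL_NAME": tool_name,
        "HOOK_TOOL_INPUT": json.dumps(tool_input),
    }
    result = subprocess.run(
        command, shell=True, env=env, timeout=timeout,
        capture_output=True, text=True,
    )
    return result.returncode, result.stderr

# Hypothetical PreToolUse hook: exit nonzero (block) unless the tool is bash.
code, stderr = run_hook(
    'test "$HOOK_TOOL_NAME" = bash',
    event="PreToolUse", tool_name="bash", tool_input={"command": "ls"},
)
blocked = code != 0
```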

Tools

  • Bash: run shell commands
  • Read: read files and images
  • Write: create files
  • Edit: modify files
  • Grep: search content
  • Glob: find files by pattern
  • Task: spawn subagents
  • WebSearch: search the internet

Context Window

Step through an example execution. Watch context grow, fill, and get compacted.

The window fills with distinct segments: system prompt, instructions, conversation, tool results, image data, and compacted history.
The infrastructure

The Harness

Configuration files that shape agent behavior. Your intelligence, crystallized into file structure.

Three layers of config: global (your identity and rules), user (your preferences, skills, and hooks), and project (project-specific instructions and code). The agent reads all three at startup and merges them by priority.

Advanced mechanics

Runtime & Platform (Advanced)

Memory System

Persistent file-based store in .memory/. Four categories:

  • user — preferences, role, expertise
  • feedback — corrections to enforce (what to avoid/repeat)
  • project — non-obvious conventions not in code
  • reference — external resource pointers (URLs, docs)

Each memory is one .md file with YAML frontmatter. Index capped at 200 lines. Never store: code structure, temp state, or secrets.

System Prompt Assembly

SystemPromptBuilder assembles 6 sections in order:

  1. Core instructions (identity, rules)
  2. Tool listings (schemas)
  3. Skill metadata (cheap catalog)
  4. Memory section (loaded at start)
  5. CLAUDE.md chain (user + project + directory)
  6. Dynamic context (per-turn reminders)

DYNAMIC_BOUNDARY marker separates cacheable from volatile sections.
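A sketch of the assembly, assuming the boundary is a literal marker string (the real marker's form isn't given). Everything before it is stable across turns and therefore cacheable; everything after is rebuilt each turn.

```python
DYNAMIC_BOUNDARY = "<!-- DYNAMIC_BOUNDARY -->"  # assumed form of the marker

def build_system_prompt(core, tools, skills, memory, claude_md, dynamic):
    # Sections 1-5 are stable across turns, hence cacheable; the dynamic
    # per-turn context goes after the boundary marker.
    cacheable = "\n\n".join([core, tools, skills, memory, claude_md])
    return f"{cacheable}\n{DYNAMIC_BOUNDARY}\n{dynamic}"

prompt = build_system_prompt(
    core="You are an agent.", tools="Tools: bash, read_file",
    skills="Skills: review (code review)", memory="User prefers Python.",
    claude_md="# Project rules", dynamic="Turn 7: plan mode active",
)
cacheable_part = prompt.split(DYNAMIC_BOUNDARY)[0]
```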

Task System

Persistent JSON files in .tasks/. Fields: id, subject, status (pending/in_progress/completed/deleted), blockedBy[], blocks[]. Dependency resolution: completing a task removes it from all others' blockedBy. Background tasks run in daemon threads with stall detection (45s threshold).
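The dependency-resolution rule can be sketched directly: completing a task removes its id from every other task's blockedBy, and anything left pending with an empty blockedBy is unblocked. Field names follow the text; the function itself is illustrative.

```python
def complete_task(tasks, task_id):
    # Completing a task removes it from all other tasks' blockedBy lists,
    # possibly unblocking them.
    tasks[task_id]["status"] = "completed"
    for other in tasks.values():
        if task_id in other["blockedBy"]:
            other["blockedBy"].remove(task_id)
    return [t["id"] for t in tasks.values()
            if t["status"] == "pending" and not t["blockedBy"]]

tasks = {
    "t1": {"id": "t1", "status": "pending", "blockedBy": []},
    "t2": {"id": "t2", "status": "pending", "blockedBy": ["t1"]},
}
unblocked = complete_task(tasks, "t1")
```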

MCP Plugins

External capability servers via stdio JSON-RPC. Tool namespacing: mcp__{server}__{tool}. Three risk levels: read/write/high. Plugin manifests in .claude-plugin/plugin.json. Native tools take priority on name collisions. Result preview capped at 500 chars.
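The namespacing and collision rules in miniature. The registry shape and helper are hypothetical; only the mcp__{server}__{tool} scheme and native-tools-win rule come from the text.

```python
NATIVE_TOOLS = {"bash", "read_file"}  # illustrative native tool set

def register_mcp_tool(registry, server, tool):
    # Namespace the tool as mcp__{server}__{tool}; native tools win
    # on name collisions, which the prefix makes effectively impossible.
    name = f"mcp__{server}__{tool}"
    if name not in NATIVE_TOOLS:
        registry[name] = (server, tool)
    return name

registry = {}
name = register_mcp_tool(registry, "github", "list_issues")
```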

Skill Loading Architecture

Two-layer pattern: Layer 1 — skill names + one-line descriptions injected into system prompt (~100 tokens per skill, cheap). Layer 2 — full skill body loaded on-demand via load_skill tool call (~2000 tokens). Each skill is a SKILL.md file with YAML frontmatter in a skills/*/ directory. This prevents prompt bloat while preserving capability access.
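The two layers, sketched with in-memory dicts instead of SKILL.md files. The skill names and bodies are made up; the shape is what matters: a cheap catalog always in the prompt, full bodies fetched only on demand.

```python
SKILLS = {
    # Layer 1: names plus one-liners, cheap enough to always inject.
    "code-review": "Review a diff for bugs and style issues.",
    "release-notes": "Draft release notes from merged PRs.",
}

FULL_BODIES = {
    # Layer 2: full bodies, loaded only when the model asks.
    "code-review": "## Code review skill\n(step-by-step instructions...)",
    "release-notes": "## Release notes skill\n(step-by-step instructions...)",
}

def skill_catalog():
    # Goes into the system prompt; roughly ~100 tokens per skill.
    return "\n".join(f"- {name}: {desc}" for name, desc in SKILLS.items())

def load_skill(name):
    # The on-demand tool call; pulls in the expensive body.
    return FULL_BODIES.get(name, f"unknown skill: {name}")

catalog = skill_catalog()
body = load_skill("code-review")
```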

Practical rules

The Mindset

Never prompt in chat

Write in an editor. Read it twice. Dehumanize -- you're configuring a system, not chatting.

Strip "please", "could you", "I was wondering". Replace with: "Do X. Output format: Y. Constraints: Z."

Scripting has evolved

Create skills, agents, reusable context. Write instructions once, reuse forever.

If you've typed the same instruction 3 times, make it a skill file. Skills are two-layer: cheap name in system prompt, full body loaded on demand.

Keep context congruent

Don't teach the model to be smart. Give it the right information. Your job is congruence.

Contradictions in your instruction files = confused output. Review all your .md files as a system — do they tell a consistent story?

Think first, then prompt

Every wasted token is a wasted thought. Plan before you execute.

Use plan mode for non-trivial tasks. Break complex work into tasks. The agent loop is cheap — your context window is the expensive resource.
Live reconstruction

Watch It Happen

How an agent built the Finding Memo presentation. Each step is the loop in action.

View the presentation (14 slides) · Copyright-free version (original illustrations)