AI agents are simpler than you think. Three concepts, one metaphor, and a live reconstruction.
Text in, text out. Stateless. Smart but alone.
Marlin's brain
Statelessness in practice: every API call starts fresh. The model has zero memory of your previous conversation unless you resend it. "Memory" is an illusion created by the harness replaying history.
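A minimal sketch of that illusion, with a stubbed model call standing in for a real completions API (the `Conversation` class and `call_model` names are hypothetical):

```python
# The harness fakes "memory" by replaying the full message history on
# every call. call_model is a stand-in for a stateless completions API:
# it sees only what it is sent, nothing else.
def call_model(messages):
    # A real API would return a completion; this stub reports how much
    # history it was handed, to show the model sees the whole transcript.
    return f"I can see {len(messages)} messages."

class Conversation:
    def __init__(self):
        self.messages = []  # the only "memory" that exists

    def send(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        reply = call_model(self.messages)   # full history resent every time
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Drop the replay and every turn becomes the model's first: the history list is the entire trick.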
Under the hood: models are transformer neural networks. Key parameters: temperature (randomness), top-p (sampling), max tokens (output length). Token economics: input tokens are cheaper than output tokens. Context window size determines how much the model can "see" at once.
What it does: takes a prompt, produces a completion. One call, no memory of previous calls.
Examples: Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google), Llama (Meta), Mistral, Command R (Cohere)
Key insight: the model is a function — f(prompt) = completion. It cannot read files, browse the web, or remember you.
A model in a loop with tools. Decides, acts, observes, repeats.
Marlin with fins and eyes
The loop core: while run_one_turn(state): pass. The LoopState holds messages, turn count, and transition reason. Each turn: call API, append response, check stop_reason — if "tool_use", execute tools and continue. If not, return.
Tool dispatch: a flat dictionary TOOL_HANDLERS = {"bash": handler, "read_file": handler, ...}. Adding a tool means adding one dict entry + one schema. The loop never changes.
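The loop and dispatch table above can be sketched end to end. The API stub, message shapes, and handler are simplified stand-ins for a real client, but the control flow is the one described:

```python
# Sketch of the agent loop: call API, append response, dispatch tools on
# "tool_use", stop otherwise. Adding a tool = one TOOL_HANDLERS entry.
from dataclasses import dataclass, field

TOOL_HANDLERS = {
    "echo": lambda args: args["text"],   # trivial stand-in handler
}

@dataclass
class LoopState:
    messages: list = field(default_factory=list)
    turns: int = 0
    transition: str = ""

def call_api(messages):
    # Stand-in for the model: request one tool call, then finish.
    if not any(m["role"] == "tool" for m in messages):
        return {"stop_reason": "tool_use",
                "tool": {"name": "echo", "args": {"text": "hi"}}}
    return {"stop_reason": "end_turn", "text": "done"}

def run_one_turn(state):
    state.turns += 1
    response = call_api(state.messages)
    state.messages.append({"role": "assistant", "content": response})
    if response["stop_reason"] == "tool_use":
        call = response["tool"]
        result = TOOL_HANDLERS[call["name"]](call["args"])  # flat-dict dispatch
        state.messages.append({"role": "tool", "content": result})
        return True    # keep looping
    state.transition = response["stop_reason"]
    return False       # the model chose to stop

state = LoopState(messages=[{"role": "user", "content": "say hi"}])
while run_one_turn(state):   # the entire "framework"
    pass
```

Note that the loop body never mentions any specific tool; the model's `stop_reason` and the handler dict carry all the variation.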
Subagents: sub_messages = [{"role":"user","content":prompt}] — a fresh list IS the isolation. Only the final text returns. Children don't get the task tool (no recursion). Safety limit: 30 iterations.
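A sketch of that isolation, with a pluggable `run_turn` standing in for the real per-turn function (function and parameter names here are hypothetical):

```python
# Subagent isolation: a fresh message list, the tool set minus "task"
# (so children cannot recurse), and a hard iteration cap.
MAX_SUB_ITERATIONS = 30   # safety limit from the design above

def run_subagent(prompt, tools, run_turn):
    sub_messages = [{"role": "user", "content": prompt}]   # fresh list IS the isolation
    sub_tools = {k: v for k, v in tools.items() if k != "task"}  # no recursion
    for _ in range(MAX_SUB_ITERATIONS):
        if not run_turn(sub_messages, sub_tools):
            break
    return sub_messages[-1]["content"]   # only the final text returns
```

The parent never sees the child's intermediate messages; whatever the child appends last is its entire report.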
Worktree isolation: git worktree add -b wt/{name} gives each subagent its own code copy. Branch naming: wt/{name}. Closeout: keep (preserve for review) or remove (force-delete).
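A sketch of the setup/teardown around git's worktree commands. The `wt/{name}` branch convention and keep/remove closeout are from the text; the helper names and the `.worktrees/` directory are assumptions:

```python
# Per-subagent worktree: each child gets its own checkout on a wt/{name}
# branch; closeout either keeps it for review or force-removes it.
import os
import subprocess

def create_worktree(name, repo="."):
    path = os.path.abspath(os.path.join(repo, ".worktrees", name))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    subprocess.run(["git", "-C", repo, "worktree", "add",
                    "-b", f"wt/{name}", path], check=True)
    return path

def close_worktree(name, keep=False, repo="."):
    if keep:
        return  # "keep": leave branch and files in place for review
    path = os.path.abspath(os.path.join(repo, ".worktrees", name))
    subprocess.run(["git", "-C", repo, "worktree", "remove", "--force", path],
                   check=True)
```

Because each worktree is a full checkout on its own branch, two subagents can edit the same file without ever seeing each other's changes until merge time.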
Three ingredients: the model (brain), the loop (persistence), the tools (hands).
The model decides everything — what tool to call, when to stop, how to combine results. The "agent framework" is just a while loop.
Examples: Claude Code, Cursor Agent, Devin, Cline, Aider, Open Interpreter, OpenCode
Everything the model can see. Its working memory. The real differentiator.
Marlin vs Dory -- same brain, different capacity
4 compaction levers (progressive):
1. Persist large output: results >30K chars saved to .task_outputs/, replaced with 2K char preview in <persisted-output> tags
2. Micro-compact: silent per-turn pass — tool results older than 3 turns replaced with [Earlier tool result compacted]
3. Auto-compact: at 50K char threshold, full transcript saved to .transcripts/, LLM summarizes preserving: goal, findings, files, remaining work
4. Manual compact: model calls compact tool with optional focus parameter
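Lever 2 is the simplest to sketch. This version counts one "turn" per assistant message and stubs out older tool results in place; the turn-counting heuristic and message shape are assumptions:

```python
# Micro-compact: tool results older than a few turns are replaced in place
# with a short stub, silently shrinking the context each turn.
KEEP_RECENT_TURNS = 3
STUB = "[Earlier tool result compacted]"

def micro_compact(messages):
    # Assign each message to a turn (one turn per assistant message).
    turn_of, turn = [], 0
    for m in messages:
        if m["role"] == "assistant":
            turn += 1
        turn_of.append(turn)
    latest = turn
    for m, t in zip(messages, turn_of):
        if m["role"] == "tool" and latest - t >= KEEP_RECENT_TURNS:
            m["content"] = STUB   # the result is gone; only the stub remains
    return messages
```

The model still sees that a tool ran on those turns, so the transcript stays coherent; only the bulky payloads disappear.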
Error recovery: continuation on max_tokens (3 retries), auto-compact on prompt-too-long, exponential backoff (base * 2^attempt + jitter, max 30s)
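The backoff formula above can be sketched directly; the retryable exception type and the retry wrapper's name are stand-ins:

```python
# Retry with exponential backoff: base * 2^attempt + jitter, capped at 30s.
import random
import time

BASE_DELAY = 1.0
MAX_DELAY = 30.0

def backoff_delay(attempt):
    return min(BASE_DELAY * 2 ** attempt + random.uniform(0, 1), MAX_DELAY)

def with_retries(call, retries=3):
    for attempt in range(retries + 1):
        try:
            return call()
        except ConnectionError:       # stand-in for a transient API error
            if attempt == retries:
                raise                 # out of retries: surface the failure
            time.sleep(backoff_delay(attempt))
```

The jitter matters in practice: without it, many clients that failed together retry together and collide again.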
Context = prompt + history + tool results. It's all just text stacked in a window.
When full: oldest parts get compacted (summarized or dropped). This is why long sessions degrade.
Sizes: 8K (small), 128K (standard), 200K (large, e.g. Claude Opus), 1M (e.g. Gemini)
Four stages checked in order. First definitive answer wins.
Circuit breaker: after 3 consecutive denials, suggests switching to plan mode. Workspace trust: checked via .claude_trusted marker.
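The circuit breaker reduces to a streak counter. A minimal sketch, with hypothetical class and method names:

```python
# Circuit breaker: three consecutive denials trigger a nudge toward plan
# mode; any approval resets the streak.
DENIAL_THRESHOLD = 3

class PermissionBreaker:
    def __init__(self):
        self.consecutive_denials = 0

    def record(self, allowed):
        if allowed:
            self.consecutive_denials = 0   # approval resets the streak
            return None
        self.consecutive_denials += 1
        if self.consecutive_denials >= DENIAL_THRESHOLD:
            return "Consider switching to plan mode."
        return None
```

The point is behavioral: repeated denials usually mean the plan is wrong, not that the next attempt deserves another permission prompt.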
External shell commands triggered at lifecycle events. Configured in settings.json.
Each hook receives HOOK_EVENT, HOOK_TOOL_NAME, and HOOK_TOOL_INPUT. Timeout: 30s per hook. Trust gate: requires .claude_trusted or SDK mode.
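A sketch of firing one hook, assuming the event context is passed as environment variables and the command runs through the shell (the variable names are from the text; the rest of the mechanics are assumptions):

```python
# Run one configured hook command with event context in its environment,
# under the 30s timeout.
import json
import os
import subprocess

HOOK_TIMEOUT_S = 30

def run_hook(command, event, tool_name, tool_input):
    env = {**os.environ,
           "HOOK_EVENT": event,
           "HOOK_TOOL_NAME": tool_name,
           "HOOK_TOOL_INPUT": json.dumps(tool_input)}
    return subprocess.run(command, shell=True, env=env,
                          capture_output=True, text=True,
                          timeout=HOOK_TIMEOUT_S)
```

Serializing the tool input as JSON keeps arbitrary argument structures representable in a single environment variable.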
Bash: run shell commands
Read: read files + images
Write: create files
Edit: modify files
Grep: search content
Glob: find files by pattern
Task: spawn subagents
WebSearch: search the internet
Step through an example execution. Watch context grow, fill, and get compacted.
Configuration files that shape agent behavior. Your intelligence, crystallized into file structure.
Three layers of config: global (your identity and rules), user (your preferences, skills, and hooks), and project (project-specific instructions and code). The agent reads all three at startup and merges them by priority.
Persistent file-based store in .memory/. Four categories:
Each memory is one .md file with YAML frontmatter. Index capped at 200 lines. Never store: code structure, temp state, or secrets.
SystemPromptBuilder assembles 6 sections in order:
DYNAMIC_BOUNDARY marker separates cacheable from volatile sections.
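A sketch of a builder that keeps that split explicit. The source specifies only the ordered assembly and the boundary marker; the section contents and marker text here are placeholders:

```python
# SystemPromptBuilder sketch: static sections first (identical across turns,
# so a prompt cache can reuse them), then the boundary, then volatile ones.
DYNAMIC_BOUNDARY = "<!-- DYNAMIC_BOUNDARY -->"

class SystemPromptBuilder:
    def __init__(self):
        self.static_sections = []    # identical across turns -> cacheable
        self.dynamic_sections = []   # changes every turn -> never cached

    def add(self, text, dynamic=False):
        (self.dynamic_sections if dynamic else self.static_sections).append(text)
        return self

    def build(self):
        return "\n\n".join(self.static_sections
                           + [DYNAMIC_BOUNDARY]
                           + self.dynamic_sections)
```

Keeping every volatile fragment below the marker is what makes the cacheable prefix byte-stable from turn to turn.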
Persistent JSON files in .tasks/. Fields: id, subject, status (pending/in_progress/completed/deleted), blockedBy[], blocks[]. Dependency resolution: completing a task removes it from all others' blockedBy. Background tasks run in daemon threads with stall detection (45s threshold).
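The dependency-resolution rule is mechanical enough to sketch directly, using the task fields listed above (the helper name is hypothetical):

```python
# Completing a task marks it completed and removes its id from every other
# task's blockedBy list, then reports which tasks are now unblocked.
def complete_task(tasks, task_id):
    for t in tasks:
        if t["id"] == task_id:
            t["status"] = "completed"
        if task_id in t["blockedBy"]:
            t["blockedBy"].remove(task_id)   # unblock dependents
    return [t for t in tasks
            if t["status"] == "pending" and not t["blockedBy"]]  # now runnable
```

Storing both `blockedBy` and `blocks` is redundant on purpose: the forward edge makes "what does finishing this unlock?" answerable without scanning every task.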
External capability servers via stdio JSON-RPC. Tool namespacing: mcp__{server}__{tool}. Three risk levels: read/write/high. Plugin manifests in .claude-plugin/plugin.json. Native tools take priority on name collisions. Result preview capped at 500 chars.
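A sketch of the registration side of that scheme. The `mcp__{server}__{tool}` naming and 500-char preview cap are from the text; the registry shape is an assumption:

```python
# MCP tool registration: server tools get namespaced keys, so they can
# never shadow a native tool name; previews of results are capped.
PREVIEW_LIMIT = 500

def register_tools(native, mcp_servers):
    registry = dict(native)   # native tools registered first, under bare names
    for server, tools in mcp_servers.items():
        for name, handler in tools.items():
            # The mcp__ prefix guarantees the key differs from any native
            # name, so native tools win collisions by construction.
            registry[f"mcp__{server}__{name}"] = handler
    return registry

def preview(result):
    return result[:PREVIEW_LIMIT]   # cap what the model sees of each result
```

The namespace also tells the model (and the permission layer) at a glance which server a tool call will leave the process to reach.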
Two-layer pattern: Layer 1 — skill names + one-line descriptions injected into system prompt (~100 tokens per skill, cheap). Layer 2 — full skill body loaded on-demand via load_skill tool call (~2000 tokens). Each skill is a SKILL.md file with YAML frontmatter in a skills/*/ directory. This prevents prompt bloat while preserving capability access.
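The two layers can be sketched over that file layout. The frontmatter parsing here is deliberately minimal and assumes simple `key: value` lines; function names are hypothetical:

```python
# Layer 1: one cheap line per skill for the system prompt.
# Layer 2: the full SKILL.md body, loaded only when requested.
from pathlib import Path

def skill_index(skills_dir="skills"):
    lines = []
    for skill_md in sorted(Path(skills_dir).glob("*/SKILL.md")):
        front = skill_md.read_text().split("---")[1]   # YAML frontmatter block
        meta = dict(line.split(": ", 1) for line in front.strip().splitlines())
        lines.append(f"- {meta['name']}: {meta['description']}")
    return "\n".join(lines)

def load_skill(name, skills_dir="skills"):
    # Paid for (~2000 tokens) only when the model calls load_skill.
    return (Path(skills_dir) / name / "SKILL.md").read_text()
```

With 50 skills, layer 1 costs roughly one screen of prompt; loading all bodies up front would cost fifty times that for skills mostly never used.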
Write in an editor. Read it twice. Dehumanize -- you're configuring a system, not chatting.
Create skills, agents, reusable context. Write instructions once, reuse forever.
Don't teach the model to be smart. Give it the right information. Your job is congruence.
Every wasted token is a wasted thought. Plan before you execute.
How an agent built the Finding Memo presentation. Each step is the loop in action.