From cd6e87570793303dc23d4bcefe4669b9330c60da Mon Sep 17 00:00:00 2001
From: katie-lpd <katie@lightpanda.io>
Date: Tue, 9 Jun 2026 13:29:32 +0200
Subject: [PATCH 1/2] Update agent.md

- **New opening.** Direct framing: "lets you drive a headless browser by talking to it."
- **New "How to think about it" section.** Explains the three-layer stack and the one-sentence pitch ("pay for the LLM once, then replay deterministically").
- **New "The REPL is where you build scripts" section.** Positions the REPL as a sandbox with a full sandbox-to-saved-script example.
- **New tips section.** Practical guidance for prompts that produce useful saved scripts (be specific about extracted data, name the page, check the file).
- **Updated Quick start.**
---
 docs/agent.md | 685 ++++++++++++++++++++++++++------------------------
 1 file changed, 350 insertions(+), 335 deletions(-)
diff --git a/docs/agent.md b/docs/agent.md
index 0493b964..a9ed3e64 100644
--- a/docs/agent.md
+++ b/docs/agent.md
@@ -1,150 +1,224 @@
 # Agent mode
 
-`lightpanda agent` turns Lightpanda's headless engine into a browsing agent you
-can talk to in plain English, script deterministically, or drive from your own
-LLM. There's no rendering and no images — it reasons over pages as text, which
-makes browsing fast and cheap to automate.
+`lightpanda agent` lets you drive a headless browser by talking to it.
 
-**New here?** The [tutorial](agent-tutorial.md) walks you from a fresh build to a
-recorded, replayable Hacker News scraper in a few minutes. This page is the
-reference: every flag, slash command, and browser tool. For the JavaScript
-script format, see [agent-script.md](agent-script.md).
+You tell it where to go and what to extract, in plain English or with slash
+commands, and it controls a real browser to do the work. Think of it as a
+robot you're directing to use the web, not a chatbot you're having a
+conversation with. 
 
-`lightpanda agent` can act as:
+Every session starts by navigating to a page, either by
+saying so ("go to news.ycombinator.com") or by typing `/goto <url>`. There's
+no window to look at; the browser runs headlessly and you see its output
+(extracted data, the agent's answer) in your terminal.
 
-- an **LLM agent** that drives the browser with tool calls (`--provider`),
-- a **scripted runner** that runs a recorded `.js` script deterministically,
-- a **basic REPL** for hand-driven slash commands with no LLM at all,
-- a **one-shot task runner** that prints a single answer to stdout (`--task`).
+**New here?** The [tutorial](agent-tutorial.md) walks you from a fresh build
+to a recorded, replayable Hacker News script in a few minutes. For the JavaScript script format, see
+[agent-script.md](agent-script.md).
 
-All four modes share the same browser tools (`goto`, `click`, `fill`, `tree`,
-`markdown`, `search`, ...). The same set is exposed over MCP via `lightpanda
-mcp`, so an agent script and an MCP client see the same surface — that is
-also the way to drive Lightpanda from an external LLM agent (Claude Code,
-etc.) without giving Lightpanda its own API key.
+## How to think about it
+
+The agent stacks three layers:
+
+1. **The browser.** Loads pages, runs JavaScript, tracks cookies, follows
+   redirects. The same engine that powers `lightpanda serve` and
+   `lightpanda fetch`.
+2. **A set of tools.** Things the browser knows how to do: `goto`, `click`,
+   `fill`, `extract`, `evaluate`, `search`, and more. Each is available as a
+   slash command (`/goto`, `/click`, ...).
+3. **An LLM.** Reads your plain-English request and decides which tools to
+   call. Optional. The agent can also run without it.
+
+You can talk to any of the three layers. Type `/goto example.com` and you
+call the browser tool directly. Type "find me the cheapest flight from NYC
+to Tokyo" and the LLM picks the tools to use. Save what worked to a `.js`
+file and you can replay the whole flow later without ever calling the LLM
+again.
 
 ## Quick start
 
-```console
-# Interactive REPL — auto-detects an API key from your environment
-./lightpanda agent
-
-# Force a specific provider
-./lightpanda agent --provider anthropic
-
-# Basic REPL (no LLM, slash commands only)
-./lightpanda agent --no-llm
-
-# Run a saved script, then exit
-./lightpanda agent session.js
-
-# One-shot: ask a question, capture the answer on stdout
-./lightpanda agent --task "what is on the front page of hn?"
-
-# See which models the resolved provider offers
-./lightpanda agent --list-models
+Set an API key for your preferred provider:
+ 
+```bash
+export ANTHROPIC_API_KEY=sk-ant-...
 ```
+ 
+(Or `OPENAI_API_KEY` / `GOOGLE_API_KEY`. Without a key the agent runs in a
+slash-commands-only mode without natural language.)
+ 
+Launch the REPL:
+ 
+```bash
+./lightpanda agent
+```
+ 
+Tell it what you want:
+ 
+```
+❯ go to news.ycombinator.com and get me the top story title and points
+```
+ 
+The agent navigates, extracts, prints the answer. When you're happy with
+the result, save it:
+ 
+```
+❯ /save hn-top-story.js
+❯ /quit
+```
+ 
+You now have a `.js` file that does the same job deterministically:
+ 
+```bash
+./lightpanda agent hn-top-story.js
+```
+ 
+No LLM call, no API key needed at replay time, sub-second to run.
+ 
+### Other ways to launch
+ 
+```bash
+# Force a specific provider, ignoring auto-detection
+./lightpanda agent --provider anthropic
+ 
+# Slash-commands-only REPL, no LLM at all
+./lightpanda agent --no-llm
+ 
+# One-shot: ask a question, print the answer to stdout, exit
+./lightpanda agent --task "what is on the front page of hn?"
+```
+ 
+## Tips for getting useful saved scripts
+ 
+- **Ask for the data you want, not the conversation about it.** "Get me the
+  top 5 HN story titles and points, print them to stdout" beats "show me
+  what's on Hacker News today." The synthesizer captures the data you asked
+  for as an `extract()` call; vague prompts produce vague recordings.
+- **Be specific about the page.** "Go to news.ycombinator.com" is much
+  better than "find me what's on Hacker News".
+- **Check the file with `cat your-script.js` before running it.** If the
+  extracted data isn't in the script, the recording missed something. Try
+  rewording your prompt.
+- **When the page changes and a saved script breaks**, re-run with the LLM,
+  get an updated answer, save again. You only pay the LLM when something
+  genuinely changed.
 
 ## Providers and API keys
-
-| Provider    | Flag                   | API key env                          |
-|-------------|------------------------|--------------------------------------|
-| Anthropic   | `--provider anthropic` | `ANTHROPIC_API_KEY`                  |
-| OpenAI      | `--provider openai`    | `OPENAI_API_KEY`                     |
-| Gemini      | `--provider gemini`    | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
-| Ollama      | `--provider ollama`    | none (local)                         |
-
-Defaults: `--model` falls back to a sensible per-provider default; in the REPL,
-`/provider <name>` and `/model <name>` change the current selection (Tab
-completes the candidates). `--base-url` overrides the API endpoint (Ollama
-defaults to `http://localhost:11434/v1`). Run `--list-models` to see exactly
-what the resolved provider offers, `--system-prompt` to swap in your own
-system prompt, and `--verbosity <low|medium|high>` to tune how much progress
-detail goes to stderr (`--task` defaults to `low`, or `high` when stderr is
-piped/redirected so harnesses capture the full `[tool/result]` trace).
-
-`--effort <none|minimal|low|medium|high|xhigh>` sets the per-turn reasoning
-budget for thinking models (it maps to each provider's native thinking /
-reasoning-effort knob and is ignored by non-thinking models). The interactive
-REPL defaults to `low` so turns stay snappy; `--task` and script runs default
-to `medium`, where answer quality matters more than per-turn latency. Higher
-effort can reduce the number of tool calls by planning better, so it's a real
-tradeoff rather than a pure slowdown. Change it live with `/effort`; the
-selection is remembered in `.lp-agent.zon`.
-
-`--model` is validated against the provider's catalog up front: an unknown name
-fails fast with a pointer to `--list-models` rather than erroring mid-task. For
-Ollama, the default model is checked against what's actually pulled — if it's
-missing, the agent falls back to the first installed model (an explicit
-`--model` that isn't installed errors instead, with an `ollama pull` hint).
-
-### Provider auto-detection
-
-When `--provider` is omitted, lightpanda picks one in this order. The REPL shows
-the resolved model and effort level in its status bar; the multi-key picker and any
-fallback notices (e.g. an Ollama default that isn't installed) print to stderr:
-
-1. **Remembered** → the provider/model you last selected with `/provider` or
-   `/model` (plus the `/effort` level), persisted per-directory in
-   `.lp-agent.zon`, as long as its key is still set.
-2. **Auto-detected** → otherwise the first key found in priority order
-   (`ANTHROPIC_API_KEY` → `GOOGLE_API_KEY`/`GEMINI_API_KEY` → `OPENAI_API_KEY`).
-   If several keys are set and you're in an interactive REPL, the agent prompts
-   you to choose; non-interactive runs (`--task`, pipes, `--list-models`) take
-   the first. Switch any time with `/provider`, or override with `--provider`.
-3. **Local Ollama** → if no cloud key is set, the agent probes a local Ollama
-   server (`http://localhost:11434/v1`, or `--base-url`) and uses it when it
-   answers with at least one pulled model.
-4. **No provider at all** → falls back to the basic REPL (slash commands only).
-   Natural language and the LLM-driven commands (`/login`, `/logout`,
+ 
+The agent needs an LLM to interpret natural language. Set the relevant API
+key as an environment variable, or pass `--provider` explicitly.
+ 
+| Provider  | Flag                   | API key env                          |
+|-----------|------------------------|--------------------------------------|
+| Anthropic | `--provider anthropic` | `ANTHROPIC_API_KEY`                  |
+| OpenAI    | `--provider openai`    | `OPENAI_API_KEY`                     |
+| Gemini    | `--provider gemini`    | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
+| Ollama    | `--provider ollama`    | none (local)                         |
+ 
+`--model` falls back to a sensible per-provider default. `--base-url`
+overrides the API endpoint (Ollama defaults to `http://localhost:11434/v1`).
+`--list-models` prints what the resolved provider offers and exits.
+ 
+Without `--provider`, the agent picks one in this order:
+ 
+1. **Remembered** - whatever you last selected with `/provider` or `/model`,
+   persisted per-directory in `.lp-agent.zon`, as long as its key is still
+   set.
+2. **Auto-detected** - the first key found in priority order
+   (`ANTHROPIC_API_KEY` → `GOOGLE_API_KEY`/`GEMINI_API_KEY` →
+   `OPENAI_API_KEY`). With several keys on a TTY, you'll be prompted to
+   pick.
+3. **Local Ollama** - if no cloud key is set, the agent probes
+   `http://localhost:11434/v1` and uses it if there's at least one model
+   pulled.
+4. **No provider at all** - falls back to the basic REPL (slash commands
+   only). Natural language and LLM-driven commands (`/login`, `/logout`,
    `/acceptCookies`) will reject.
-
-`--no-llm` is the explicit bypass: it forces the basic REPL even when an
-API key is present or `--provider` is set. Use it to test slash commands
-without burning tokens, or to disable the LLM in a saved command without
-editing the existing flags. `--no-llm` wins over `--provider`.
-
-## REPL Slash Commands
-
-The REPL uses a tiny slash-command language for browser actions. Each command is
-`/<tool> [args]`, a `#` comment, or blank. There is no other syntax in basic
-REPL mode: anything that doesn't match those three forms is a parse error.
-
-Slash commands accept any of:
-
-- a single positional value, when the tool has exactly one required field —
-  `/goto 'https://example.com'`, `/extract '{"karma":"#karma"}'`;
-- `key=value` pairs — values may be bare or quoted; strings with whitespace
-  must be quoted (`/fill selector='#email' value='user@x.com'`);
-- a raw `{json}` blob — handed straight to the tool (`/findElement
-  {"role":"button"}`).
-
-Tools whose selector is optional (e.g. `/click`, `/hover`, `/findElement`)
-have zero required fields, so they don't take a positional and must be
-written as `key=value`: `/click selector='a.login'`, not `/click 'a.login'`.
-
+`--no-llm` is the explicit override. It forces the basic REPL even when an
+API key is present. Useful for testing slash commands without burning
+tokens.
+ 
+### Reasoning effort
+ 
+`--effort <none|minimal|low|medium|high|xhigh>` sets the per-turn reasoning
+budget for thinking models. It maps to each provider's native
+reasoning-effort knob and is ignored by non-thinking models. The REPL
+defaults to `low` so turns stay snappy. `--task` and script runs default to
+`medium` where answer quality matters more than per-turn latency. Higher
+effort can mean fewer tool calls per task (the model plans better), so it's
+a real tradeoff rather than a pure slowdown. Change it live in the REPL
+with `/effort`; selection persists in `.lp-agent.zon`.
+ 
+## Slash commands
+ 
+The REPL uses a small slash-command language for browser actions. Each line
+is either a slash command, a `#` comment, a blank line, or (when an LLM is
+configured) a natural-language prompt.
+ 
+Slash commands accept:
+ 
+- A single positional value, when the tool has exactly one required field.
+  `/goto 'https://example.com'`, `/extract '{"karma":"#karma"}'`.
+- `key=value` pairs. Values may be bare or quoted; strings with whitespace
+  must be quoted. `/fill selector='#email' value='user@x.com'`.
+- A raw `{json}` blob, handed straight to the tool.
+  `/findElement {"role":"button"}`.
+Tools whose selector is optional (`/click`, `/hover`, `/findElement`) take
+no positional and must use `key=value` form: `/click selector='a.login'`.
+ 
 Quoting is content-aware: `'…'`, `"…"`, and triple-quoted `'''…'''` /
-`"""…"""` for values that mix both quote styles or span multiple lines.
-Recorded JavaScript scripts use the equivalent function-call form instead of
-slash lines.
-
-Two slash commands have no underlying tool — they trigger an LLM turn that
-the agent translates into actual tool calls:
-
-| Command          | Notes                                                |
-|------------------|------------------------------------------------------|
-| `/login`         | LLM-driven: fills credentials from `$LP_*` env vars. |
-| `/logout`        | LLM-driven: find the logout control and sign out.    |
-| `/acceptCookies` | LLM-driven: dismiss the consent banner.              |
-
+`"""…"""` for values that mix quote styles or span multiple lines.
+ 
+### LLM-driven helpers
+ 
+Three slash commands trigger an LLM turn rather than a direct tool call:
+ 
+| Command          | What it does                                          |
+|------------------|-------------------------------------------------------|
+| `/login`         | Fills credentials from `$LP_*` env vars.              |
+| `/logout`        | Finds the logout control and signs out.               |
+| `/acceptCookies` | Dismisses the consent banner.                         |
+ 
 All three require an LLM. `--no-llm` rejects them.
-
-In the REPL (and only the REPL), a line that isn't a slash command and
-doesn't start with `#` is sent to the LLM as a natural-language prompt. To
-leave the REPL, use the `/quit` meta command.
-
-### Example script
-
+ 
+### Meta commands
+ 
+These don't drive the browser, they control the REPL itself:
+ 
+| Command                  | What it does                                       |
+|--------------------------|----------------------------------------------------|
+| `/help`                  | Lists tools. `/help <tool>` prints the JSON schema.|
+| `/provider [name]`       | Lists or switches provider.                        |
+| `/model [name]`          | Lists or switches model for the active provider.   |
+| `/effort <level>`        | Sets reasoning budget. Saved to `.lp-agent.zon`.   |
+| `/verbosity <level>`     | Tunes the log level. Levels: low, medium, high.    |
+| `/usage`                 | Prints cumulative token usage and cache hit rate.  |
+| `/save [file.js]`        | Writes the session to a script.                    |
+| `/load <path>`           | Runs a script from disk against the current session.|
+| `/quit`                  | Exits the REPL.                                    |
+ 
+Meta commands are never recorded.
+ 
+## REPL features
+ 
+- **Status bar.** A line under the prompt shows the active model and quick
+  hints. In `--no-llm` it reads "basic REPL — slash commands only." It drops
+  the least-important segments first when the terminal is narrow.
+- **JS mode (`!`).** Type `!` on an empty prompt to toggle a scratchpad
+  where the whole line runs as page-side JavaScript, same context as
+  `/evaluate` so `document` and `window` are in scope. Handy for poking at
+  a page without wrapping every line in `/evaluate`. `$LP_*` refs are still
+  resolved at execution, console output is echoed back, and `Esc` exits.
+  JS-mode lines are not recorded.
+- **Tab completion** (case-insensitive). Cycles through `/<tool>` and meta
+  slash commands. The dim grey suffix shown after the cursor is the first
+  match.
+- **Persistent history.** Stored in `.lp-history` in the working directory.
+- **Stdout vs stderr.** The final assistant answer and data-producing slash
+  commands (`/extract`, `/evaluate`, `/markdown`, `/tree`, ...) write to
+  stdout. Tool calls, progress, and errors go to stderr. So
+  `lightpanda agent --task ... > out.txt` captures a clean answer.
+## Example slash-command session
+ 
 ```console
 # Log into the demo and grab the dashboard title and visible cards.
 # Site-scoped vars (LP_<SITE>_<FIELD>) avoid collisions when you have
@@ -157,42 +231,31 @@ leave the REPL, use the `/quit` meta command.
 /waitForSelector '.dashboard'
 /extract '{"title": ".dashboard h1", "cards": [".dashboard .card .name"]}'
 ```
-
-`/extract` takes a JSON schema object — each value tells the extractor
-what to lift off the page, and the whole result is printed to stdout
-as a single JSON object. Supported value forms:
-
-- `"<sel>"` — `textContent.trim()` of the first match (string or `null`).
+ 
+`/extract` takes a JSON schema where each value tells the extractor what to
+lift off the page. The result is printed to stdout as a single JSON object.
+Supported value forms:
+ 
+- `"<sel>"` — `textContent.trim()` of the first match.
 - `""` — the matched element's own text (only inside a `fields` block).
-- `["<sel>"]` — text of every match (string array). Sugar for
-  `[{"selector": "<sel>"}]`.
+- `["<sel>"]` — text of every match. Sugar for `[{"selector": "<sel>"}]`.
 - `{"selector": "<sel>", "attr": "<name>"}` — attribute of the first match.
 - `[{"selector": "<sel>", "fields": {…}}]` — array of records, each
   `fields` value resolved relative to the matched element.
-- Add `"limit": N` inside any array's object spec to cap matches at N
-  (works for text, attribute, and `fields` shapes — e.g.
-  `[{"selector": ".story .title", "limit": 5}]` for top 5 titles).
-
-Use `/extract '''…'''` (or `"""…"""`) to spread a schema across multiple
-lines. The schema is parsed in Zig before the page-side walker runs, so a
-malformed schema is rejected up front with a plain `Error: InvalidParams`
-rather than a V8 stack trace. See [agent-tutorial.md](agent-tutorial.md)
-section 3 for a worked example against Hacker News.
-
-### Cross-call state with `lp.*`
-
-`/extract` and `/evaluate` each return one value per call, but real scrapes
-often need to carry data forward — capture a list on one page, then walk
-it across navigations. Two primitives keep that simple.
-
-**`save=<name>`** on `/extract` or `/evaluate` stashes the result in a
-Session-scoped store keyed by `<name>` instead of dumping it to stdout.
-The stored value is then exposed to every subsequent `/evaluate` as
-`globalThis.lp.<name>`:
-
+- Add `"limit": N` inside any array's object spec to cap matches.
+The schema is parsed in Zig before the page-side walker runs, so malformed
+schemas are rejected up front with a plain `Error: InvalidParams` rather
+than a V8 stack trace.
+ 
+## Cross-call state with `lp.*`
+ 
+`/extract` and `/evaluate` each return one value per call. To carry data
+between calls (capture a list on one page, walk it across navigations) use
+the `save=` modifier:
+ 
 ```console
 /goto 'https://news.ycombinator.com/'
-
+ 
 /extract save=front '''
 {
   "stories": [{
@@ -205,145 +268,101 @@ The stored value is then exposed to every subsequent `/evaluate` as
   }]
 }
 '''
-
+ 
 /evaluate '''
 console.log(lp.front.stories[0].title);
 '''
 ```
-
-`save=`d commands print nothing on success so scripts pipe cleanly.
-
-**Auto-sync.** Any mutation of `lp.*` inside an `/evaluate` is persisted at
-the end of the call. Adding a key (`lp.x = …`), updating a nested value
-(`lp.front.stories[0].comments = […]`), or removing a key
-(`delete lp.x`) all propagate to the store. The next `/evaluate` sees the
-update — even after a navigation, because the store lives Session-side,
-not on the page.
-
-**List → detail.** A common scrape captures a list, then visits each row
-for more data. Capture the list with `/extract save=<name>`, then loop in
-`/evaluate`: read `lp.<name>`, `goto` each row's URL, and extract the
-detail — `/evaluate`'s top-level `await` and full JS make the round-trip
-explicit.
-
-**Async evaluate.** When a scrape needs logic `/extract` can't express, `/evaluate`
-is the escape hatch: top-level `await` works directly — the body runs as
-an async function, so use `return` to produce a value. `runEval` pumps
-the event loop until it settles, then surfaces the resolved value (or the
-rejection as an error). A body with no explicit `return` resolves to
-`undefined`, which evaluate treats as silent. Returned objects and arrays
-are serialized to JSON automatically, so no `JSON.stringify` is needed.
-
-The store is **script-run scoped**: it's bound to the Session that runs
-the script, and goes away when that Session does. There is no
-cross-session persistence; if you need that, use `localStorage` (which
-is origin-scoped and persists across navigations within a session).
-
-### Saving and loading
-
-From the REPL, `/save [file.js]` writes the session back to a `.js` file
-and `/load <path>` runs a script from disk against the current session.
-
-`/save` works one of two ways. **With `--no-llm`** it transcribes the session
-deterministically: state-mutating commands (`/goto`, `/click`, `/fill`,
-`/scroll`, `/hover`, `/selectOption`, `/setChecked`, `/waitForSelector`,
-`/waitForScript`, `/waitForState`, `/press`, `/evaluate`, `/extract`) become JavaScript calls,
-read-only commands (`/tree`, `/markdown`, `/links`, `/findElement`, …) are
-dropped, and each natural-language prompt that produced recorded actions is
-written as a `// <prompt>` comment above those calls so the script stays
-readable. **With an LLM** it instead synthesizes an idiomatic script from the
-whole session — the synthesis prompt asks for JavaScript only ("no
-commentary"), so the result generally has no such comments: the model folds
-intent into the code and drops dead-ends.
-
-### JavaScript Script Running
-
-`./lightpanda agent script.js` runs without making any LLM call. Agent scripts
+ 
+`save=<name>` stashes the result keyed by `<name>` in a session-scoped
+store instead of printing it. Every subsequent `/evaluate` sees it as
+`globalThis.lp.<name>`.
+ 
+Any mutation of `lp.*` inside `/evaluate` is persisted at the end of the
+call. Adding (`lp.x = …`), updating
+(`lp.front.stories[0].comments = […]`), or deleting (`delete lp.x`) all
+propagate. The next `/evaluate` sees the update, even after a navigation,
+because the store lives session-side, not on the page.
+ 
+The store is **script-run scoped**: bound to the session that runs the
+script, gone when that session ends. There's no cross-session persistence;
+if you need that, use `localStorage` (origin-scoped, persists within a
+session).
+ 
+## JavaScript scripts
+ 
+`./lightpanda agent script.js` runs a script without any LLM call. Scripts
 are plain synchronous JavaScript plus the installed Lightpanda primitives:
-
+ 
 ```js
 goto("https://example.com");
 click({ selector: "a.login" });
 evaluate("document.title");
 ```
-
-The script runs in an agent-only V8 context. It has no `window`, `document`, or
-DOM APIs. Browser interaction happens only through the installed primitives
-(`goto`, `click`, `fill`, `evaluate`, `extract`, and the other recorded browser
-actions). The primitives are **synchronous and blocking** — each returns its
-result directly, so write `const data = extract(…)`, not `await extract(…)`.
-There is no `async`/`await`/Promise contract around them. (`evaluate(...)` can
-run async JS *inside* the page, but the `evaluate(...)` call itself still returns
-synchronously.) It is not Node.js either: there is no `require`, `process`, `fs`,
-npm package loading, or Node standard library. The `evaluate(...)` primitive
-executes its string in the current page context; page scripts cannot see agent
-variables or agent primitives.
-
+ 
+The script runs in an agent-only V8 context. No `window`, no `document`, no
+DOM APIs at the top level — browser interaction happens through the
+primitives only. The primitives are **synchronous and blocking**, each
+returns its result directly, so write `const data = extract(…)`, not
+`await extract(…)`. There's no Promise contract around them.
+(`evaluate(...)` can run async JS inside the page, but the `evaluate(...)`
+call itself still returns synchronously.)
+ 
+It's not Node.js. There's no `require`, `process`, `fs`, npm package
+loading, or Node standard library. The `evaluate(...)` primitive runs its
+string in the current page context; page scripts can't see agent variables
+or agent primitives.
+ 
+The last expression in the script is printed automatically, so a script
+that ends with `extract({...})` will print the extraction result to stdout.
 Tool errors throw JavaScript exceptions and stop execution.
+ 
 See [agent-script.md](agent-script.md) for the full script format reference.
-
-## REPL features
-
-- **Status bar**: a line under the prompt shows the active model and quick
-  hints (`! JS`, `Tab completes`, `/help`); in `--no-llm` it reads `basic REPL —
-  slash commands only`. It drops the least-important segments first when the
-  terminal is narrow.
-- **JS mode** (`!`): type `!` on an empty prompt to toggle a scratchpad where the
-  whole line runs as page-side JavaScript — the same context as `evaluate`, so
-  `document` and `window` are in scope. Handy for poking at a page without
-  wrapping every line in `/evaluate`. `$LP_*` refs are still resolved at
-  execution, console output is echoed back, and `Esc` exits. JS-mode lines are
-  not recorded.
-- **Tab completion** (case-insensitive): cycles through `/<tool>` and meta
-  slash commands. The dim grey suffix shown after the cursor is the first
-  match.
-- **Persistent history**: stored in `.lp-history` in the working directory.
-- **Meta slash commands**: `/help` lists tools (`/help <tool>` prints the
-  JSON schema), `/provider [name]` and `/model [name]` change the active
-  provider/model — Tab after the space completes from detected providers and
-  the provider's fetched model list, and bare `/provider`/`/model` print the
-  current selection — `/save [file.js]` writes the session to a script and
-  `/load <path>` runs one from disk (Tab completes file paths), `/quit` exits
-  the REPL, `/verbosity <low|medium|high>` tunes the log level,
-  `/effort <none|minimal|low|medium|high|xhigh>` sets the per-turn reasoning
-  budget (saved to `.lp-agent.zon`), and `/usage` prints cumulative token usage
-  and the cache hit rate for the session — `/clear` forgets the conversation
-  (history, usage, recorded actions, node IDs) while keeping the loaded page and
-  cookies, and `/reset` goes further by also starting a fresh browser session,
-  dropping the page, cookies, storage, and history. These are REPL-only and never recorded.
-  ```
-  > /goto https://example.com
-  > /findElement role=button name=Submit
-  > /evaluate {"script": "document.title"}
-  > /quit
-  ```
-- **Stdout vs stderr**: the final assistant answer and data-producing slash
-  commands (`/extract`, `/evaluate`, `/markdown`, `/tree`, …) write to stdout.
-  Tool calls, progress, and errors go to stderr, so `lightpanda agent --task
-  ... > out.txt` captures a clean answer.
-
+ 
+### Saving and loading
+ 
+`/save [file.js]` writes the current session to a `.js` file. `/load <path>`
+runs a script from disk against the current session.
+ 
+`/save` works one of two ways:
+ 
+- **With `--no-llm`** it transcribes the session deterministically.
+  State-mutating commands (`/goto`, `/click`, `/fill`, `/scroll`, `/hover`,
+  `/selectOption`, `/setChecked`, `/waitForSelector`, `/waitForScript`,
+  `/press`, `/evaluate`, `/extract`) become JavaScript calls. Read-only
+  commands (`/tree`, `/markdown`, `/links`, `/findElement`, ...) are
+  dropped. Each natural-language prompt that produced recorded actions is
+  written as a `// <prompt>` comment above its calls so the script stays
+  readable.
+- **With an LLM** it synthesizes an idiomatic script from the whole
+  session. The synthesis prompt asks for JavaScript only ("no commentary"),
+  so the result generally has no such comments: the model folds intent
+  into the code and drops dead-ends. Returned data is the last expression,
+  which prints automatically on replay.
 ## One-shot mode (`--task`)
-
+ 
 ```console
 ./lightpanda agent --provider gemini \
   --task "what is the top story on news.ycombinator.com?"
 ```
-
-`--task` runs a single user turn, prints the final answer on stdout, and
-exits. Combine with `-a <path>` / `--attach <path>` (repeatable) to feed local
-files to providers that accept attachments. Text files are inlined into
-the prompt (max 512 KiB each); binary files (`image/*`, `audio/*`, `pdf`)
-are base64-encoded inline (max 20 MiB each). Unsupported MIME types
-error out before any browser work runs.
-
+ 
+`--task` runs a single user turn, prints the final answer to stdout, and
+exits. Combine with `-a <path>` / `--attach <path>` (repeatable) to feed
+local files to providers that accept attachments. Text files are inlined
+into the prompt (max 512 KiB each); binary files (image, audio, pdf) are
+base64-encoded inline (max 20 MiB each). Unsupported MIME types fail before
+any browser work runs.
+ 
+`--task` conflicts with the positional script argument.
+ 
 ## Driving Lightpanda from an external LLM agent
-
+ 
 When the calling agent already has its own LLM (e.g. Claude Code), use
-`lightpanda mcp` rather than `lightpanda agent`. The MCP server exposes
-the same browser tools (`goto`, `click`, `fill`, ...) listed below, so
-the external agent does the planning while Lightpanda only drives the
-browser. No `--provider` or API key is required on the Lightpanda side.
-
+`lightpanda mcp` rather than `lightpanda agent`. The MCP server exposes the
+same browser tools listed below, so the external agent does the planning
+while Lightpanda only drives the browser. No `--provider` or API key is
+required on the Lightpanda side.
+ 
 ```json
 {
   "mcpServers": {
@@ -354,52 +373,50 @@ browser. No `--provider` or API key is required on the Lightpanda side.
   }
 }
 ```
-
-Tool names are camelCase and case-sensitive — there are no aliases. MCP
-clients must call the canonical tags (`goto`, `evaluate`, `tree`, `save`, …).
-
-For sub-task delegation in the other direction — calling Lightpanda's
-own LLM-driven agent in a one-shot fashion — use `--task` on stdin
-instead.
-
+ 
+Tool names are camelCase and case-sensitive. MCP clients must call the
+canonical tags (`goto`, `evaluate`, `tree`, `save`, ...).
+ 
+For sub-task delegation in the other direction (calling Lightpanda's own
+LLM-driven agent in a one-shot fashion), use `--task` on stdin.
+ 
 ### Saving a script over MCP
-
-`lightpanda mcp` exposes a `save` tool so an external agent can persist
-the session as a `.js` script for later deterministic replay. Unlike the
-standalone agent's `/save`, the MCP server has no LLM of its own — the
-calling client holds the conversation, so it synthesizes the script and
-passes it in:
-
-| Tool   | Args                               | Effect                                                                                                              |
-|--------|------------------------------------|-------------------------------------------------------------------------------------------------------------------|
+ 
+`lightpanda mcp` exposes a `save` tool so an external agent can persist the
+session as a `.js` script for later deterministic replay. Unlike the
+standalone agent's `/save`, the MCP server has no LLM of its own, so the
+calling client holds the conversation and synthesizes the script itself.
+ 
+| Tool   | Args                               | Effect                                                                                                                |
+|--------|------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
 | `save` | `{ path: string, script: string }` | Write `script` to `path` (relative, no `..`; created or overwritten) and return the absolute location and line count. |
-
+ 
 The tool's description carries the same synthesis guidance the agent's
-`/save` gives its LLM: prefer the builtins you called as tools (`goto`,
-`click`, `fill`, `extract`, …) as JavaScript calls, drop dead-ends, and
-keep `$LP_*` placeholders. Any literal `LP_*` value is scrubbed back to
-its placeholder before the file is written. The result runs without an
-LLM via `./lightpanda agent session.js`.
-
+`/save` gives its LLM: prefer the builtins (`goto`, `click`, `fill`,
+`extract`, ...) as JavaScript calls, drop dead-ends, keep `$LP_*`
+placeholders. Any literal `LP_*` value is scrubbed back to its placeholder
+before the file is written. The result runs without an LLM via
+`./lightpanda agent session.js`.
+ 
 ## Browser tools
-
-The agent and MCP server share the tool set defined in `src/browser/tools.zig`.
-Highlights:
-
-- `goto`, `search` (Tavily when `TAVILY_API_KEY` is set, DuckDuckGo otherwise)
-- `tree`, `markdown`, `html`, `links`, `interactiveElements`, `structuredData`,
-  `detectForms`, `nodeDetails`, `findElement`
+ 
+The agent and MCP server share the tool set defined in
+`src/browser/tools.zig`. Highlights:
+ 
+- `goto`, `search` (Tavily when `TAVILY_API_KEY` is set, DuckDuckGo
+  otherwise)
+- `tree`, `markdown`, `html`, `links`, `interactiveElements`,
+  `structuredData`, `detectForms`, `nodeDetails`, `findElement`
 - `click`, `fill`, `hover`, `press`, `scroll`, `selectOption`, `setChecked`,
-  `waitForSelector`, `waitForScript`, `waitForState`
-- `extract` (the schema-driven data tool), `evaluate`, `consoleLogs`, `getUrl`,
-  `getCookies`, `getEnv`
-
-Selectors prefer CSS over `backendNodeId` for the click-family tools, since
-node IDs are invalidated by any DOM mutation. The system prompt enforces this
-for the LLM.
-
+  `waitForSelector`, `waitForScript`
+- `extract` (the schema-driven data tool), `evaluate`, `consoleLogs`,
+  `getUrl`, `getCookies`, `getEnv`
+Selectors prefer CSS over `backendNodeId` for the click-family tools since
+node IDs are invalidated by any DOM mutation. The system prompt enforces
+this for the LLM.
+ 
 ## Security notes
-
+ 
 - The agent treats page content as untrusted data, not instructions. URLs
   surfaced by a page are not followed unless they match the user's task.
 - `$LP_*` environment variable references in `/fill` values are resolved
@@ -409,19 +426,17 @@ for the LLM.
   unprefixed `LP_USERNAME` / `LP_PASSWORD` form is the generic fallback.
 - The `getEnv` tool only reads variables whose name starts with `LP_`.
   Everything else (provider API keys, system env, third-party secrets)
-  reports "not set" so the model can't probe for it. The user controls
-  what lives under `LP_*`. Note that `getEnv` returns the *value* to the
-  model — fine for non-secret config like base URLs, but never call it
-  on credentials (use `$LP_*` placeholders in fill values instead).
+  reports "not set" so the model can't probe for it. `getEnv` returns the
+  *value* to the model: fine for non-secret config like base URLs, never
+  call it on credentials (use `$LP_*` placeholders in fill values instead).
 - `--obey-robots`, `--http-proxy`, `--user-agent`, and the rest of the
   browser-level CLI flags apply to `agent` the same way they apply to
   `serve`, `fetch`, and `mcp`.
-- REPL prompts are persisted to `.lp-history` in the current working
-  directory in plaintext (no encryption). Anything you type at the prompt
-  — including natural-language context that accompanies a `/login` —
-  lands in that file. Delete it or move out of sensitive directories if
-  you don't want it retained.
-- `save` rejects empty, absolute, and `..` paths, but does **not**
-  follow up on symlinks. On a shared filesystem, a pre-existing symlink
-  at the target would be written through to whatever it points at.
-  Prefer a fresh directory you own when saving in untrusted environments.
+- REPL prompts are persisted to `.lp-history` in plaintext (no encryption).
+  Anything you type at the prompt, including natural-language context that
+  accompanies a `/login`, lands in that file. Delete it or move out of
+  sensitive directories if you don't want it retained.
+- `save` rejects empty, absolute, and `..` paths, but does **not** follow
+  up on symlinks. On a shared filesystem, a pre-existing symlink at the
+  target would be written through to whatever it points at. Prefer a fresh
+  directory you own when saving in untrusted environments.

From 2601ba5d9ca554726802c73f0ff9d66ba4a0a96e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adri=C3=A0=20Arrufat?= <adria.arrufat@gmail.com>
Date: Tue, 9 Jun 2026 14:31:05 +0200
Subject: [PATCH 2/2] docs(agent): restore dropped features and fix markdown
 nits

The agent.md rewrite dropped documentation for several features that still
exist in the code. Restore them and clean up the markdown:

- Re-add /clear and /reset to the meta-commands table
- Re-add waitForState to the browser-tools list and the /save
  state-mutating command list
- Re-add --system-prompt and --verbosity (with the --task stderr-pipe
  escalation) to the providers section
- Strip trailing whitespace on blank lines; add blank lines before
  headings and between lists and following paragraphs
---
 docs/agent.md | 175 +++++++++++++++++++++++++++-----------------------
 1 file changed, 94 insertions(+), 81 deletions(-)

diff --git a/docs/agent.md b/docs/agent.md
index a9ed3e64..36221951 100644
--- a/docs/agent.md
+++ b/docs/agent.md
@@ -5,7 +5,7 @@
 You tell it where to go and what to extract, in plain English or with slash
 commands, and it controls a real browser to do the work. Think of it as a
 robot you're directing to use the web, not a chatbot you're having a
-conversation with. 
+conversation with.
 
 Every session starts by navigating to a page, either by
 saying so ("go to news.ycombinator.com") or by typing `/goto <url>`. There's
@@ -38,57 +38,57 @@ again.
 ## Quick start
 
 Set an API key for your preferred provider:
- 
+
 ```bash
 export ANTHROPIC_API_KEY=sk-ant-...
 ```
- 
+
 (Or `OPENAI_API_KEY` / `GOOGLE_API_KEY`. Without a key the agent runs in a
 slash-commands-only mode without natural language.)
- 
+
 Launch the REPL:
- 
+
 ```bash
 ./lightpanda agent
 ```
- 
+
 Tell it what you want:
- 
+
 ```
 ❯ go to news.ycombinator.com and get me the top story title and points
 ```
- 
+
 The agent navigates, extracts, prints the answer. When you're happy with
 the result, save it:
- 
+
 ```
 ❯ /save hn-top-story.js
 ❯ /quit
 ```
- 
+
 You now have a `.js` file that does the same job deterministically:
- 
+
 ```bash
 ./lightpanda agent hn-top-story.js
 ```
- 
+
 No LLM call, no API key needed at replay time, sub-second to run.
- 
+
 ### Other ways to launch
- 
+
 ```bash
 # Force a specific provider, ignoring auto-detection
 ./lightpanda agent --provider anthropic
- 
+
 # Slash-commands-only REPL, no LLM at all
 ./lightpanda agent --no-llm
- 
+
 # One-shot: ask a question, print the answer to stdout, exit
 ./lightpanda agent --task "what is on the front page of hn?"
 ```
- 
+
 ## Tips for getting useful saved scripts
- 
+
 - **Ask for the data you want, not the conversation about it.** "Get me the
   top 5 HN story titles and points, print them to stdout" beats "show me
   what's on Hacker News today." The synthesizer captures the data you asked
@@ -103,23 +103,27 @@ No LLM call, no API key needed at replay time, sub-second to run.
   genuinely changed.
 
 ## Providers and API keys
- 
+
 The agent needs an LLM to interpret natural language. Set the relevant API
 key as an environment variable, or pass `--provider` explicitly.
- 
+
 | Provider  | Flag                   | API key env                          |
 |-----------|------------------------|--------------------------------------|
 | Anthropic | `--provider anthropic` | `ANTHROPIC_API_KEY`                  |
 | OpenAI    | `--provider openai`    | `OPENAI_API_KEY`                     |
 | Gemini    | `--provider gemini`    | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
 | Ollama    | `--provider ollama`    | none (local)                         |
- 
+
 `--model` falls back to a sensible per-provider default. `--base-url`
 overrides the API endpoint (Ollama defaults to `http://localhost:11434/v1`).
 `--list-models` prints what the resolved provider offers and exits.
- 
+`--system-prompt` swaps in your own system prompt. `--verbosity
+<low|medium|high>` tunes how much progress detail goes to stderr (`--task`
+defaults to `low`, escalating to `high` when stderr is piped or redirected
+so harnesses capture the full `[tool/result]` trace).
+
 Without `--provider`, the agent picks one in this order:
- 
+
 1. **Remembered** - whatever you last selected with `/provider` or `/model`,
    persisted per-directory in `.lp-agent.zon`, as long as its key is still
    set.
@@ -133,12 +137,13 @@ Without `--provider`, the agent picks one in this order:
 4. **No provider at all** - falls back to the basic REPL (slash commands
    only). Natural language and LLM-driven commands (`/login`, `/logout`,
    `/acceptCookies`) will reject.
+
 `--no-llm` is the explicit override. It forces the basic REPL even when an
 API key is present. Useful for testing slash commands without burning
 tokens.
- 
+
 ### Reasoning effort
- 
+
 `--effort <none|minimal|low|medium|high|xhigh>` sets the per-turn reasoning
 budget for thinking models. It maps to each provider's native
 reasoning-effort knob and is ignored by non-thinking models. The REPL
@@ -147,43 +152,44 @@ defaults to `low` so turns stay snappy. `--task` and script runs default to
 effort can mean fewer tool calls per task (the model plans better), so it's
 a real tradeoff rather than a pure slowdown. Change it live in the REPL
 with `/effort`; selection persists in `.lp-agent.zon`.
- 
+
 ## Slash commands
- 
+
 The REPL uses a small slash-command language for browser actions. Each line
 is either a slash command, a `#` comment, a blank line, or (when an LLM is
 configured) a natural-language prompt.
- 
+
 Slash commands accept:
- 
+
 - A single positional value, when the tool has exactly one required field.
   `/goto 'https://example.com'`, `/extract '{"karma":"#karma"}'`.
 - `key=value` pairs. Values may be bare or quoted; strings with whitespace
   must be quoted. `/fill selector='#email' value='user@x.com'`.
 - A raw `{json}` blob, handed straight to the tool.
   `/findElement {"role":"button"}`.
+
 Tools whose selector is optional (`/click`, `/hover`, `/findElement`) take
 no positional and must use `key=value` form: `/click selector='a.login'`.
- 
+
 Quoting is content-aware: `'…'`, `"…"`, and triple-quoted `'''…'''` /
 `"""…"""` for values that mix quote styles or span multiple lines.
- 
+
 ### LLM-driven helpers
- 
+
 Three slash commands trigger an LLM turn rather than a direct tool call:
- 
+
 | Command          | What it does                                          |
 |------------------|-------------------------------------------------------|
 | `/login`         | Fills credentials from `$LP_*` env vars.              |
 | `/logout`        | Finds the logout control and signs out.               |
 | `/acceptCookies` | Dismisses the consent banner.                         |
- 
+
 All three require an LLM. `--no-llm` rejects them.
- 
+
 ### Meta commands
- 
+
 These don't drive the browser, they control the REPL itself:
- 
+
 | Command                  | What it does                                       |
 |--------------------------|----------------------------------------------------|
 | `/help`                  | Lists tools. `/help <tool>` prints the JSON schema.|
@@ -194,12 +200,14 @@ These don't drive the browser, they control the REPL itself:
 | `/usage`                 | Prints cumulative token usage and cache hit rate.  |
 | `/save [file.js]`        | Writes the session to a script.                    |
 | `/load <path>`           | Runs a script from disk against the current session.|
+| `/clear`                 | Forgets the conversation (history, usage, recorded actions, node IDs); keeps the page and cookies.|
+| `/reset`                 | Full reset: everything `/clear` does, plus a fresh browser session, dropping the page, cookies, and storage.|
 | `/quit`                  | Exits the REPL.                                    |
- 
+
 Meta commands are never recorded.
- 
+
 ## REPL features
- 
+
 - **Status bar.** A line under the prompt shows the active model and quick
   hints. In `--no-llm` it reads "basic REPL — slash commands only." It drops
   the least-important segments first when the terminal is narrow.
@@ -217,8 +225,9 @@ Meta commands are never recorded.
   commands (`/extract`, `/evaluate`, `/markdown`, `/tree`, ...) write to
   stdout. Tool calls, progress, and errors go to stderr. So
   `lightpanda agent --task ... > out.txt` captures a clean answer.
+
 ## Example slash-command session
- 
+
 ```console
 # Log into the demo and grab the dashboard title and visible cards.
 # Site-scoped vars (LP_<SITE>_<FIELD>) avoid collisions when you have
@@ -231,11 +240,11 @@ Meta commands are never recorded.
 /waitForSelector '.dashboard'
 /extract '{"title": ".dashboard h1", "cards": [".dashboard .card .name"]}'
 ```
- 
+
 `/extract` takes a JSON schema where each value tells the extractor what to
 lift off the page. The result is printed to stdout as a single JSON object.
 Supported value forms:
- 
+
 - `"<sel>"` — `textContent.trim()` of the first match.
 - `""` — the matched element's own text (only inside a `fields` block).
 - `["<sel>"]` — text of every match. Sugar for `[{"selector": "<sel>"}]`.
@@ -243,19 +252,20 @@ Supported value forms:
 - `[{"selector": "<sel>", "fields": {…}}]` — array of records, each
   `fields` value resolved relative to the matched element.
 - Add `"limit": N` inside any array's object spec to cap matches.
+
 The schema is parsed in Zig before the page-side walker runs, so malformed
 schemas are rejected up front with a plain `Error: InvalidParams` rather
 than a V8 stack trace.
- 
+
 ## Cross-call state with `lp.*`
- 
+
 `/extract` and `/evaluate` each return one value per call. To carry data
 between calls (capture a list on one page, walk it across navigations) use
 the `save=` modifier:
- 
+
 ```console
 /goto 'https://news.ycombinator.com/'
- 
+
 /extract save=front '''
 {
   "stories": [{
@@ -268,38 +278,38 @@ the `save=` modifier:
   }]
 }
 '''
- 
+
 /evaluate '''
 console.log(lp.front.stories[0].title);
 '''
 ```
- 
+
 `save=<name>` stashes the result keyed by `<name>` in a session-scoped
 store instead of printing it. Every subsequent `/evaluate` sees it as
 `globalThis.lp.<name>`.
- 
+
 Any mutation of `lp.*` inside `/evaluate` is persisted at the end of the
 call. Adding (`lp.x = …`), updating
 (`lp.front.stories[0].comments = […]`), or deleting (`delete lp.x`) all
 propagate. The next `/evaluate` sees the update, even after a navigation,
 because the store lives session-side, not on the page.
- 
+
 The store is **script-run scoped**: bound to the session that runs the
 script, gone when that session ends. There's no cross-session persistence;
 if you need that, use `localStorage` (origin-scoped, persists within a
 session).
- 
+
 ## JavaScript scripts
- 
+
 `./lightpanda agent script.js` runs a script without any LLM call. Scripts
 are plain synchronous JavaScript plus the installed Lightpanda primitives:
- 
+
 ```js
 goto("https://example.com");
 click({ selector: "a.login" });
 evaluate("document.title");
 ```
- 
+
 The script runs in an agent-only V8 context. No `window`, no `document`, no
 DOM APIs at the top level — browser interaction happens through the
 primitives only. The primitives are **synchronous and blocking**, each
@@ -307,29 +317,30 @@ returns its result directly, so write `const data = extract(…)`, not
 `await extract(…)`. There's no Promise contract around them.
 (`evaluate(...)` can run async JS inside the page, but the `evaluate(...)`
 call itself still returns synchronously.)
- 
+
 It's not Node.js. There's no `require`, `process`, `fs`, npm package
 loading, or Node standard library. The `evaluate(...)` primitive runs its
 string in the current page context; page scripts can't see agent variables
 or agent primitives.
- 
+
 The last expression in the script is printed automatically, so a script
 that ends with `extract({...})` will print the extraction result to stdout.
 Tool errors throw JavaScript exceptions and stop execution.
- 
+
 See [agent-script.md](agent-script.md) for the full script format reference.
- 
+
 ### Saving and loading
- 
+
 `/save [file.js]` writes the current session to a `.js` file. `/load <path>`
 runs a script from disk against the current session.
- 
+
 `/save` works one of two ways:
- 
+
 - **With `--no-llm`** it transcribes the session deterministically.
   State-mutating commands (`/goto`, `/click`, `/fill`, `/scroll`, `/hover`,
   `/selectOption`, `/setChecked`, `/waitForSelector`, `/waitForScript`,
-  `/press`, `/evaluate`, `/extract`) become JavaScript calls. Read-only
+  `/waitForState`, `/press`, `/evaluate`, `/extract`) become JavaScript
+  calls. Read-only
   commands (`/tree`, `/markdown`, `/links`, `/findElement`, ...) are
   dropped. Each natural-language prompt that produced recorded actions is
   written as a `// <prompt>` comment above its calls so the script stays
@@ -339,30 +350,31 @@ runs a script from disk against the current session.
   so the result generally has no such comments: the model folds intent
   into the code and drops dead-ends. Returned data is the last expression,
   which prints automatically on replay.
+
 ## One-shot mode (`--task`)
- 
+
 ```console
 ./lightpanda agent --provider gemini \
   --task "what is the top story on news.ycombinator.com?"
 ```
- 
+
 `--task` runs a single user turn, prints the final answer to stdout, and
 exits. Combine with `-a <path>` / `--attach <path>` (repeatable) to feed
 local files to providers that accept attachments. Text files are inlined
 into the prompt (max 512 KiB each); binary files (image, audio, pdf) are
 base64-encoded inline (max 20 MiB each). Unsupported MIME types fail before
 any browser work runs.
- 
+
 `--task` conflicts with the positional script argument.
- 
+
 ## Driving Lightpanda from an external LLM agent
- 
+
 When the calling agent already has its own LLM (e.g. Claude Code), use
 `lightpanda mcp` rather than `lightpanda agent`. The MCP server exposes the
 same browser tools listed below, so the external agent does the planning
 while Lightpanda only drives the browser. No `--provider` or API key is
 required on the Lightpanda side.
- 
+
 ```json
 {
   "mcpServers": {
@@ -373,50 +385,51 @@ required on the Lightpanda side.
   }
 }
 ```
- 
+
 Tool names are camelCase and case-sensitive. MCP clients must call the
 canonical tags (`goto`, `evaluate`, `tree`, `save`, ...).
- 
+
 For sub-task delegation in the other direction (calling Lightpanda's own
 LLM-driven agent in a one-shot fashion), use `--task` on stdin.
- 
+
 ### Saving a script over MCP
- 
+
 `lightpanda mcp` exposes a `save` tool so an external agent can persist the
 session as a `.js` script for later deterministic replay. Unlike the
 standalone agent's `/save`, the MCP server has no LLM of its own, so the
 calling client holds the conversation and synthesizes the script itself.
- 
+
 | Tool   | Args                               | Effect                                                                                                                |
 |--------|------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
 | `save` | `{ path: string, script: string }` | Write `script` to `path` (relative, no `..`; created or overwritten) and return the absolute location and line count. |
- 
+
 The tool's description carries the same synthesis guidance the agent's
 `/save` gives its LLM: prefer the builtins (`goto`, `click`, `fill`,
 `extract`, ...) as JavaScript calls, drop dead-ends, keep `$LP_*`
 placeholders. Any literal `LP_*` value is scrubbed back to its placeholder
 before the file is written. The result runs without an LLM via
 `./lightpanda agent session.js`.
- 
+
 ## Browser tools
- 
+
 The agent and MCP server share the tool set defined in
 `src/browser/tools.zig`. Highlights:
- 
+
 - `goto`, `search` (Tavily when `TAVILY_API_KEY` is set, DuckDuckGo
   otherwise)
 - `tree`, `markdown`, `html`, `links`, `interactiveElements`,
   `structuredData`, `detectForms`, `nodeDetails`, `findElement`
 - `click`, `fill`, `hover`, `press`, `scroll`, `selectOption`, `setChecked`,
-  `waitForSelector`, `waitForScript`
+  `waitForSelector`, `waitForScript`, `waitForState`
 - `extract` (the schema-driven data tool), `evaluate`, `consoleLogs`,
   `getUrl`, `getCookies`, `getEnv`
+
 Selectors prefer CSS over `backendNodeId` for the click-family tools since
 node IDs are invalidated by any DOM mutation. The system prompt enforces
 this for the LLM.
- 
+
 ## Security notes
- 
+
 - The agent treats page content as untrusted data, not instructions. URLs
   surfaced by a page are not followed unless they match the user's task.
 - `$LP_*` environment variable references in `/fill` values are resolved