Files
browser/docs/agent-tutorial.md
Adrià Arrufat ac82551e66 agent: replace -i flag with /save and /load commands
Removes the `-i`/`--interactive` CLI flag and live file-based
recording. Instead, the REPL now supports a `/load <path>` command
to run scripts from disk, and `/save` to export the in-memory
session recording.

The `Recorder` is simplified to be purely in-memory, and the script
runtime is moved to `src/script/Runtime.zig`.

BREAKING CHANGE: The `-i`/`--interactive` flag has been removed. Use
the `/save` and `/load` commands within the REPL instead.
2026-06-03 16:21:10 +02:00

462 lines
15 KiB
Markdown

# Agent tutorial — Hacker News, end-to-end
This walks you from "I just built `./lightpanda`" to a recorded,
replayable JavaScript browser script, then captures the same flow from
an external MCP client. Every section ends with a command you can run;
nothing references later sections.
For the flag/command/tool tables, see [agent.md](agent.md). This
document is the tutorial; that one is the reference. For the JavaScript
runtime contract, see [agent-script.md](agent-script.md).
## What you'll build
One session against Hacker News:
1. Log in with your account.
2. Confirm the login by reading the username out of the header.
3. Save the whole flow to a `.js` file.
4. Run it offline, with no LLM.
5. Add local JavaScript logic around `extract(...)` results.
6. Save the same flow as a script from an external agent over MCP.
## Prerequisites
- `./lightpanda` on your PATH (build with `zig build`).
- A Hacker News account.
- One LLM API key for sections that use natural language — Anthropic,
OpenAI, Gemini, or a local Ollama. Recorded `.js` scripts run with no
key at all.
Export your HN credentials as `LP_*` env vars. The convention is
`LP_<SITE>_<FIELD>` — a short site identifier (`HN` for Hacker News,
`GH` for GitHub, …) lets you keep credentials for multiple sites in
your environment without collisions. The unprefixed `LP_USERNAME` /
`LP_PASSWORD` form is the generic fallback when you only have one
site.
In **bash** or **zsh**:
```console
export LP_HN_USERNAME="your-hn-handle"
export LP_HN_PASSWORD="your-hn-password"
```
In **fish**:
```fish
set -gx LP_HN_USERNAME "your-hn-handle"
set -gx LP_HN_PASSWORD "your-hn-password"
```
The `LP_` prefix matters. The agent resolves `$LP_*` references
*inside* the Lightpanda subprocess, so the literal secret never enters
the LLM context; and the `getEnv` tool refuses to read anything that
doesn't start with `LP_`, so the model can't probe your other env
vars.
Verify they're set before continuing — substitution fails silently if
a variable is missing (the literal `$LP_HN_USERNAME` ends up typed
into the form), and the `/fill` confirmation message intentionally
echoes the placeholder name rather than the resolved value, so the
response text won't tell you. Confirm directly:
```console
./lightpanda agent --no-llm
> /getEnv LP_HN_USERNAME
```
`/getEnv` returns the literal value if set, or "not set" if missing.
Only `$LP_*` references in fill values are substituted; other `$`
characters in your password (`my$ecret`, `$5.99`) are passed through
verbatim.
## 1. First contact: the REPL
```console
./lightpanda agent
```
On startup the agent prints a one-line notice telling you which mode it
landed in — which provider (and which env var won), or "basic REPL
(no LLM)" if no key is set. The REPL writes its history to
`.lp-history` in the working directory, so up-arrow works across runs.
Try the meta commands:
```
> /help
> /help goto
> /quit
```
`/help` lists every browser tool. `/help <tool>` prints its JSON
schema. `/quit` exits cleanly. If you have no API key yet and want to
poke around without an LLM, `./lightpanda agent --no-llm` forces the
basic REPL.
## 2. The shortest possible win: `--task`
Before doing anything complicated, prove the LLM + browser stack
works end-to-end:
```console
./lightpanda agent --task "what is the top story on news.ycombinator.com?"
```
`--task` runs a single user turn, prints the final answer on stdout,
and exits. Tool calls, progress, and errors all go to stderr, so
redirecting stdout gives you a clean answer:
```console
./lightpanda agent --task "top story on news.ycombinator.com?" > out.txt
```
If you need to feed the model a local file, repeat
`-a <path>` (or `--attach <path>`) for each one.
## 3. Driving the browser by hand
Now back to the REPL. We'll write the HN login flow one command at a
time so you can see how each step depends on what the previous one
showed.
```
> /goto https://news.ycombinator.com/login
```
`/goto` takes a single URL argument (positional, optionally quoted). The page
is now loaded.
> The REPL scripting surface is slash commands. `click '#foo'` (no leading slash) is
> forwarded to the LLM as a natural-language prompt; only `/click '#foo'`
> runs as a command. TAB completion in the REPL helps you find the right
> tool name.
Inspect it before clicking anything:
```
> /tree
```
`/tree` prints the semantic tree to stdout. Two forms are visible —
the login form and the create-account form below it — and each one
contains two unlabeled textboxes:
```
8 form
13 'username:'
15 [i] textbox
18 'password:'
20 [i]
22 [i] button 'login' value='login'
30 form
35 'username:'
37 [i] textbox
```
Notice the textboxes have no accessible name — "username:" is a
sibling text node, not a `<label for="…">`. This is typical of older
pages. The ARIA-name tool reflects that:
```
> /findElement role=textbox name=username
[]
```
Empty result. Slash commands accept a single positional argument
(for tools with one required field), `key=value` pairs, or a raw
`{json}` blob; `findElement` filters by accessible name, which we
don't have here.
For unlabeled login forms, jump to `detectForms`, which reads the
HTML directly and surfaces each form's `action` plus each input's
`name` attribute:
```
> /detectForms
```
You'll see two forms, the first with `action: "login"` and fields
named `acct` and `pw`. That's enough to synthesize the CSS selector
yourself: scope by form action to avoid colliding with the
create-account form, then key on the input's `name` attribute.
**Selector rule, load-bearing:** the click-family tools (`/click`,
`/fill`, `/hover`, `/selectOption`, `/setChecked`) accept CSS selectors
only. The backend node IDs `findElement` and `detectForms` return are
invalidated by any DOM mutation, and they cannot be serialized into
durable JavaScript recordings — a session that uses them is not
replayable. Always
synthesize a CSS selector from the attributes (`id`, `class`,
`name`, `action`, `tag_name`) and use that.
Now fill the form:
```
> /fill selector='form[action="login"] input[name="acct"]' value='$LP_HN_USERNAME'
> /fill selector='form[action="login"] input[name="pw"]' value='$LP_HN_PASSWORD'
> /click selector='form[action="login"] input[type="submit"][value="login"]'
> /waitForSelector '#logout'
```
`$LP_*` references in `/fill` values are resolved at execution time
inside the subprocess. The LLM never sees the literal credential.
The `/waitForSelector '#logout'` line is doing two jobs at once, and it's
worth unpacking because the pattern recurs in every recorded script:
- **It's a synchronization point.** `/click` on the submit button
returns as soon as the click dispatches, not after the server
responds. Without a wait, the next command races the login redirect
and may run against the *pre-login* DOM. `/waitForSelector '<sel>'`
blocks until that selector appears, so the script resumes only after
HN's logged-in page has rendered.
- **It's an implicit assertion.** HN renders the `#logout` link only
when the session is authenticated. If the credentials are wrong (or
HN throws up a captcha, or rate-limits, or the form layout
changed), `#logout` never appears and `/waitForSelector` times out —
the script fails loudly at the line where the failure actually
happened, instead of silently succeeding and producing garbage
downstream.
You generally want a `/waitForSelector` like this after every state-changing
action that triggers async work: pick a selector that *only* exists
in the post-action state, and you get free regression protection.
Waiting on the URL (`location.pathname === "/news"`) or a generic
element that exists on both pages is weaker — both can be true before
the navigation finishes.
Confirm by pulling structured data off the page. `/extract` takes a
JSON schema object — each value describes what to lift out, and the
whole result is printed to stdout as one JSON object. The simplest
form is a flat selector lookup:
```
> /extract '{"karma": "#karma"}'
{"karma":"42"}
```
The schema grammar is small but covers the cases you'd reach for:
- `"<sel>"``textContent.trim()` of the first match (string, or
`null` if no match).
- `""` — the matched element's own text (only meaningful inside a
`fields` block, where there's an outer element to refer to).
- `["<sel>"]` — text of every match (string array).
- `{"selector": "<sel>", "attr": "<name>"}` — the first match's
attribute value.
- `[{"selector": "<sel>", "fields": {…}}]` — array of objects, where
each `fields` entry is resolved relative to the matched element.
After a page-changing action (click, navigation, form submit) the
previous `/tree` snapshot is stale; re-inspect before the next
interaction. Hop back to the front page and pull the story list to
exercise the structured form:
```
> /goto https://news.ycombinator.com
> /extract '''
{
"topStories": [{
"selector": ".athing",
"fields": {
"rank": ".rank",
"title": ".titleline > a",
"url": {"selector": ".titleline > a", "attr": "href"}
}
}]
}
'''
```
Triple-quoted (`'''` or `"""`) values let a schema span multiple lines
— the REPL keeps reading until it sees the matching closing quote.
The result is a single JSON object printed to stdout:
`{"topStories":[{"rank":"1","title":"…","url":"…"}, …]}`.
The schema is parsed in Zig before the page-side walker runs, so a
typo like a stray comma surfaces here as `Error: invalid /extract
schema JSON` instead of a confusing V8 stack trace.
## 4. Saving the session
The same flow, but exported to a file. In the same REPL, retype the
sequence — login (`/goto`, two `/fill`s, `/click`, `/waitForSelector`),
then the front-page hop and structured pull (`/goto`, multi-line
`/extract`) — then save it:
```
> /save hn_login.js
```
In the basic REPL (`--no-llm`) `/save` transcribes the session
deterministically; with an LLM it synthesizes an equivalent idiomatic
script. Either way `/quit` when you're done.
Inspect the result:
```console
cat hn_login.js
```
You should see the seven mutating commands and nothing else — no
`/tree`, no `/markdown`, no read-only lookups. `/save` filters on a
per-tool flag (`ToolDef.recorded`) so read-only inspection never
pollutes the script; `/extract` *is* recorded (it changes what the
script can read on replay even though it doesn't mutate the page).
The saved file is JavaScript:
```js
goto("https://news.ycombinator.com/login");
fill({ selector: "form[action=\"login\"] input[name=\"acct\"]", value: "$LP_HN_USERNAME" });
fill({ selector: "form[action=\"login\"] input[name=\"pw\"]", value: "$LP_HN_PASSWORD" });
click({ selector: "form[action=\"login\"] input[type=\"submit\"][value=\"login\"]" });
waitForSelector("#logout");
goto("https://news.ycombinator.com");
extract({ topStories: [{ selector: ".athing", fields: { rank: ".rank", title: ".titleline > a", url: { selector: ".titleline > a", attr: "href" } } }] });
```
Natural-language REPL turns are not saved as executable JavaScript.
When a natural-language turn produces recorded browser actions, the
prompt is kept as a `//` comment above those actions.
## 5. Running deterministically
```console
./lightpanda agent hn_login.js
```
No `--provider`, no LLM, no token spend. The saved script runs top
to bottom against a fresh browser. This is the form you want for
regression tests and CI. From inside the REPL, `/load hn_login.js`
runs the same script against the current session.
A JavaScript script does not print its final expression automatically.
The recorded `extract(...)` call returns a local JavaScript value, so
edit the final line when you want replay output:
```js
const topStories = extract({
topStories: [{
selector: ".athing",
fields: {
rank: ".rank",
title: ".titleline > a",
url: { selector: ".titleline > a", attr: "href" }
}
}]
});
console.log(JSON.stringify({ topStories }, null, 2));
```
Run it again and stdout is clean JSON:
```console
./lightpanda agent hn_login.js > stories.json
```
`/login` and `/acceptCookies` are REPL-only LLM triggers. A script
saved with `/save` never contains them; `/save` captures the resulting
browser tool calls instead. Lines that are neither slash commands nor
comments are also REPL-only conveniences, not script syntax.
## 6. Local JavaScript logic
Agent scripts run in a separate JavaScript context from the web page.
There is no `window`, `document`, DOM API, `require`, `process`, or
Node standard library in that context. Browser interaction happens
through the installed primitives, and `extract(...)` is the usual way
to move page data into local script logic.
For example, turn the extraction result into a smaller report without
running any page-side JavaScript:
```js
goto("https://news.ycombinator.com");
const topStories = extract({
topStories: [{
selector: ".athing",
limit: 5,
fields: {
rank: ".rank",
title: ".titleline > a",
url: { selector: ".titleline > a", attr: "href" }
}
}]
});
const report = topStories.map((story) => ({
rank: story.rank,
title: story.title,
url: story.url
}));
console.log(JSON.stringify({ report }, null, 2));
```
Use the `eval(...)` primitive only when you intentionally want to run a
string in the current page's JavaScript context. Page eval cannot see
agent variables or call `goto`, `extract`, and the other agent
primitives.
## 7. Same flow, external agent (MCP)
Everything above used Lightpanda's built-in agent. If you're driving
Lightpanda from a different agent (Claude Code, a custom MCP client,
your own harness), use `lightpanda mcp` instead — same browser tools,
no API key on the Lightpanda side, the calling agent supplies the
LLM.
Register the server with your MCP client:
```json
{
"mcpServers": {
"lightpanda": {
"command": "/path/to/lightpanda",
"args": ["mcp"]
}
}
}
```
### Save a script over MCP
The MCP server has no LLM of its own — your external agent is the brain.
Drive the browser with the usual tools, then hand back a script with the
`save` tool:
1. Run the browser tools you'd call anyway: `goto`, `fill`, `click`,
`waitForSelector`. The server resolves `$LP_*` placeholders inside the
subprocess, so credentials never reach your agent's context.
2. When the task is done, synthesize a `.js` script from the steps that
mattered — call the builtins as JavaScript functions with the same
object arguments — and call `save { "path": "hn_login.js", "script":
"goto(\"...\");\n..." }`. The path must be relative and free of `..`;
the response reports the absolute location and line count.
The `save` tool's description carries the same guidance the REPL's
`/save` gives its LLM (prefer builtins, drop dead-ends, keep `$LP_*`
placeholders), and any literal `LP_*` value is scrubbed back to its
placeholder before the file is written. The output uses the same
JavaScript format as `/save hn_login.js` from section 4 and runs
unmodified:
```console
./lightpanda agent hn_login.js
```
## Where to go next
- [agent.md](agent.md) — full reference: every flag, every slash
command, every browser tool, plus the security model and
auto-detection rules.
- [agent-script.md](agent-script.md) — JavaScript runtime, primitives,
return values, and complete script examples.
- `lightpanda mcp --help` and `lightpanda agent --help` — current
flag listings straight from the binary.