Files
browser/docs/agent-tutorial.md
Francis Bouvier 8824174afa agent: run recorded scripts as JavaScript
Replace recorded agent replay with a standalone JavaScript script runtime.
Install synchronous agent primitives in an isolated V8 context, add console
output, return structured extract values as JS objects/arrays, and route script
execution through the new runtime.

Update recording to emit .js function calls, default /save filenames to .js,
drop script-level extract save support, and refresh agent docs/tutorials for
the new format.

Self-healing is disabled for now.
2026-06-01 15:43:37 +02:00

488 lines
16 KiB
Markdown

# Agent tutorial — Hacker News, end-to-end
This walks you from "I just built `./lightpanda`" to a recorded,
replayable JavaScript browser script, then captures the same flow from
an external MCP client. Every section ends with a command you can run;
nothing references later sections.
For the flag/command/tool tables, see [agent.md](agent.md). This
document is the tutorial; that one is the reference. For the JavaScript
runtime contract, see [agent-script.md](agent-script.md).
## What you'll build
One session against Hacker News:
1. Log in with your account.
2. Confirm the login by reading the username out of the header.
3. Record the whole flow to a `.js` file.
4. Run it offline, with no LLM.
5. Add local JavaScript logic around `extract(...)` results.
6. Record the same browser actions from an external agent over MCP.
## Prerequisites
- `./lightpanda` on your PATH (build with `zig build`).
- A Hacker News account.
- One LLM API key for sections that use natural language — Anthropic,
OpenAI, Gemini, or a local Ollama. Recorded `.js` scripts run with no
key at all.
Export your HN credentials as `LP_*` env vars. The convention is
`LP_<SITE>_<FIELD>` — a short site identifier (`HN` for Hacker News,
`GH` for GitHub, …) lets you keep credentials for multiple sites in
your environment without collisions. The unprefixed `LP_USERNAME` /
`LP_PASSWORD` form is the generic fallback when you only have one
site.
In **bash** or **zsh**:
```console
export LP_HN_USERNAME="your-hn-handle"
export LP_HN_PASSWORD="your-hn-password"
```
In **fish**:
```fish
set -gx LP_HN_USERNAME "your-hn-handle"
set -gx LP_HN_PASSWORD "your-hn-password"
```
The `LP_` prefix matters. The agent resolves `$LP_*` references
*inside* the Lightpanda subprocess, so the literal secret never enters
the LLM context; and the `getEnv` tool refuses to read anything that
doesn't start with `LP_`, so the model can't probe your other env
vars.
Verify they're set before continuing — substitution fails silently if
a variable is missing (the literal `$LP_HN_USERNAME` ends up typed
into the form), and the `/fill` confirmation message intentionally
echoes the placeholder name rather than the resolved value, so the
response text won't tell you. Confirm directly:
```console
./lightpanda agent --no-llm
> /getEnv LP_HN_USERNAME
```
`/getEnv` returns the literal value if set, or "not set" if missing.
Only `$LP_*` references in fill values are substituted; other `$`
characters in your password (`my$ecret`, `$5.99`) are passed through
verbatim.
## 1. First contact: the REPL
```console
./lightpanda agent
```
On startup the agent prints a one-line notice telling you which mode it
landed in — which provider (and which env var won), or "basic REPL
(no LLM)" if no key is set. The REPL writes its history to
`.lp-history` in the working directory, so up-arrow works across runs.
Try the meta commands:
```
> /help
> /help goto
> /quit
```
`/help` lists every browser tool. `/help <tool>` prints its JSON
schema. `/quit` exits cleanly. If you have no API key yet and want to
poke around without an LLM, `./lightpanda agent --no-llm` forces the
basic REPL.
## 2. The shortest possible win: `--task`
Before doing anything complicated, prove the LLM + browser stack
works end-to-end:
```console
./lightpanda agent --task "what is the top story on news.ycombinator.com?"
```
`--task` runs a single user turn, prints the final answer on stdout,
and exits. Tool calls, progress, and errors all go to stderr, so
redirecting stdout gives you a clean answer:
```console
./lightpanda agent --task "top story on news.ycombinator.com?" > out.txt
```
If you need to feed the model a local file, repeat
`-a <path>` (or `--attach <path>`) for each one.
## 3. Driving the browser by hand
Now back to the REPL. We'll write the HN login flow one command at a
time so you can see how each step depends on what the previous one
showed.
```
> /goto https://news.ycombinator.com/login
```
`/goto` takes a single URL argument (positional, optionally quoted). The page
is now loaded.
> The REPL scripting surface is slash commands. `click '#foo'` (no leading slash) is
> forwarded to the LLM as a natural-language prompt; only `/click '#foo'`
> runs as a command. TAB completion in the REPL helps you find the right
> tool name.
Inspect it before clicking anything:
```
> /tree
```
`/tree` prints the semantic tree to stdout. Two forms are visible —
the login form and the create-account form below it — and each one
contains two unlabeled textboxes:
```
8 form
13 'username:'
15 [i] textbox
18 'password:'
20 [i]
22 [i] button 'login' value='login'
30 form
35 'username:'
37 [i] textbox
```
Notice the textboxes have no accessible name — "username:" is a
sibling text node, not a `<label for="…">`. This is typical of older
pages. The ARIA-name tool reflects that:
```
> /findElement role=textbox name=username
[]
```
Empty result. Slash commands accept a single positional argument
(for tools with one required field), `key=value` pairs, or a raw
`{json}` blob; `findElement` filters by accessible name, which we
don't have here.
For unlabeled login forms, jump to `detectForms`, which reads the
HTML directly and surfaces each form's `action` plus each input's
`name` attribute:
```
> /detectForms
```
You'll see two forms, the first with `action: "login"` and fields
named `acct` and `pw`. That's enough to synthesize the CSS selector
yourself: scope by form action to avoid colliding with the
create-account form, then key on the input's `name` attribute.
**Selector rule, load-bearing:** the click-family tools (`/click`,
`/fill`, `/hover`, `/selectOption`, `/setChecked`) accept CSS selectors
only. The backend node IDs `findElement` and `detectForms` return are
invalidated by any DOM mutation, and they cannot be serialized into
durable JavaScript recordings — a session that uses them is not
replayable. Always
synthesize a CSS selector from the attributes (`id`, `class`,
`name`, `action`, `tag_name`) and use that.
Now fill the form:
```
> /fill selector='form[action="login"] input[name="acct"]' value='$LP_HN_USERNAME'
> /fill selector='form[action="login"] input[name="pw"]' value='$LP_HN_PASSWORD'
> /click selector='form[action="login"] input[type="submit"][value="login"]'
> /waitForSelector '#logout'
```
`$LP_*` references in `/fill` values are resolved at execution time
inside the subprocess. The LLM never sees the literal credential.
The `/waitForSelector '#logout'` line is doing two jobs at once, and it's
worth unpacking because the pattern recurs in every recorded script:
- **It's a synchronization point.** `/click` on the submit button
returns as soon as the click dispatches, not after the server
responds. Without a wait, the next command races the login redirect
and may run against the *pre-login* DOM. `/waitForSelector '<sel>'`
blocks until that selector appears, so the script resumes only after
HN's logged-in page has rendered.
- **It's an implicit assertion.** HN renders the `#logout` link only
when the session is authenticated. If the credentials are wrong (or
HN throws up a captcha, or rate-limits, or the form layout
changed), `#logout` never appears and `/waitForSelector` times out —
the script fails loudly at the line where the failure actually
happened, instead of silently succeeding and producing garbage
downstream.
You generally want a `/waitForSelector` like this after every state-changing
action that triggers async work: pick a selector that *only* exists
in the post-action state, and you get free regression protection.
Waiting on the URL (`location.pathname === "/news"`) or a generic
element that exists on both pages is weaker — both can be true before
the navigation finishes.
Confirm by pulling structured data off the page. `/extract` takes a
JSON schema object — each value describes what to lift out, and the
whole result is printed to stdout as one JSON object. The simplest
form is a flat selector lookup:
```
> /extract '{"karma": "#karma"}'
{"karma":"42"}
```
The schema grammar is small but covers the cases you'd reach for:
- `"<sel>"``textContent.trim()` of the first match (string, or
`null` if no match).
- `""` — the matched element's own text (only meaningful inside a
`fields` block, where there's an outer element to refer to).
- `["<sel>"]` — text of every match (string array).
- `{"selector": "<sel>", "attr": "<name>"}` — the first match's
attribute value.
- `[{"selector": "<sel>", "fields": {…}}]` — array of objects, where
each `fields` entry is resolved relative to the matched element.
After a page-changing action (click, navigation, form submit) the
previous `/tree` snapshot is stale; re-inspect before the next
interaction. Hop back to the front page and pull the story list to
exercise the structured form:
```
> /goto https://news.ycombinator.com
> /extract '''
{
"topStories": [{
"selector": ".athing",
"fields": {
"rank": ".rank",
"title": ".titleline > a",
"url": {"selector": ".titleline > a", "attr": "href"}
}
}]
}
'''
```
Triple-quoted (`'''` or `"""`) values let a schema span multiple lines
— the REPL keeps reading until it sees the matching closing quote.
The result is a single JSON object printed to stdout:
`{"topStories":[{"rank":"1","title":"…","url":"…"}, …]}`.
The schema is parsed in Zig before the page-side walker runs, so a
typo like a stray comma surfaces here as `Error: invalid /extract
schema JSON` instead of a confusing V8 stack trace.
## 4. Recording the session
The same flow, but recorded to a file. Quit the REPL, then:
```console
./lightpanda agent -i hn_login.js
```
`-i <path>` opens an interactive REPL that appends state-mutating
commands to `<path>`. Retype the same sequence — login (`/goto`, two
`/fill`s, `/click`, `/waitForSelector`), then the front-page hop and
structured pull (`/goto`, multi-line `/extract`) — then `/quit`.
Inspect the result:
```console
cat hn_login.js
```
You should see the seven mutating commands and nothing else — no
`/tree`, no `/markdown`, no read-only lookups. The recorder filters on a
per-tool flag (`ToolDef.recorded`) so read-only inspection never
pollutes the script; `/extract` *is* recorded (it changes what the
script can read on replay even though it doesn't mutate the page).
The saved file is JavaScript:
```js
goto("https://news.ycombinator.com/login");
fill({ selector: "form[action=\"login\"] input[name=\"acct\"]", value: "$LP_HN_USERNAME" });
fill({ selector: "form[action=\"login\"] input[name=\"pw\"]", value: "$LP_HN_PASSWORD" });
click({ selector: "form[action=\"login\"] input[type=\"submit\"][value=\"login\"]" });
waitForSelector("#logout");
goto("https://news.ycombinator.com");
extract({ topStories: [{ selector: ".athing", fields: { rank: ".rank", title: ".titleline > a", url: { selector: ".titleline > a", attr: "href" } } }] });
```
Natural-language REPL turns are not saved as executable JavaScript.
When a natural-language turn produces recorded browser actions, the
prompt is kept as a `//` comment above those actions.
## 5. Running deterministically
```console
./lightpanda agent hn_login.js
```
No `--provider`, no LLM, no token spend. The recorded script runs top
to bottom against a fresh browser. This is the form you want for
regression tests and CI.
A JavaScript script does not print its final expression automatically.
The recorded `extract(...)` call returns a local JavaScript value, so
edit the final line when you want replay output:
```js
const topStories = extract({
topStories: [{
selector: ".athing",
fields: {
rank: ".rank",
title: ".titleline > a",
url: { selector: ".titleline > a", attr: "href" }
}
}]
});
console.log(JSON.stringify({ topStories }, null, 2));
```
Run it again and stdout is clean JSON:
```console
./lightpanda agent hn_login.js > stories.json
```
`/login` and `/acceptCookies` are REPL-only LLM triggers. A pure
recording from `-i` never contains them; the recorder captures the
resulting browser tool calls instead. Lines that are neither slash
commands nor comments are also REPL-only conveniences, not script
syntax.
## 6. Local JavaScript logic
Agent scripts run in a separate JavaScript context from the web page.
There is no `window`, `document`, DOM API, `require`, `process`, or
Node standard library in that context. Browser interaction happens
through the installed primitives, and `extract(...)` is the usual way
to move page data into local script logic.
For example, turn the extraction result into a smaller report without
running any page-side JavaScript:
```js
goto("https://news.ycombinator.com");
const topStories = extract({
topStories: [{
selector: ".athing",
limit: 5,
fields: {
rank: ".rank",
title: ".titleline > a",
url: { selector: ".titleline > a", attr: "href" }
}
}]
});
const report = topStories.map((story) => ({
rank: story.rank,
title: story.title,
url: story.url
}));
console.log(JSON.stringify({ report }, null, 2));
```
Use the `eval(...)` primitive only when you intentionally want to run a
string in the current page's JavaScript context. Page eval cannot see
agent variables or call `goto`, `extract`, and the other agent
primitives.
## 7. Same flow, external agent (MCP)
Everything above used Lightpanda's built-in agent. If you're driving
Lightpanda from a different agent (Claude Code, a custom MCP client,
your own harness), use `lightpanda mcp` instead — same browser tools,
no API key on the Lightpanda side, the calling agent supplies the
LLM.
Register the server with your MCP client:
```json
{
"mcpServers": {
"lightpanda": {
"command": "/path/to/lightpanda",
"args": ["mcp"]
}
}
}
```
### Record a session over MCP
From the external agent, call:
1. `recordStart { "path": "hn_login.js" }` — begins appending
state-mutating tool calls as JavaScript to `hn_login.js`. The path
must be relative and free of `..`.
2. The same browser tools you'd call anyway: `goto`, `fill`, `click`,
`waitForSelector`. Each one that succeeds is appended as a
JavaScript primitive call;
query-only tools (`tree`, `markdown`, `findElement`, `consoleLogs`)
are never recorded.
3. `recordComment { "text": "logged in" }` — drop a `//` breadcrumb above
the next recorded line. Useful for marking the boundary between
LLM-driven phases.
4. `recordStop {}` — closes the recording and returns
`{path, line_count}`.
The output file uses the same JavaScript format as `-i hn_login.js`
from section 4. It runs via the agent CLI without modification:
```console
./lightpanda agent hn_login.js
```
### About `scriptStep` and `scriptHeal`
`recordStart` now records JavaScript agent scripts. The MCP
`scriptStep` and `scriptHeal` tools are still available for external
agents that want to run and repair one slash-command line at a time,
but that is a separate PandaScript line-healing workflow:
1. Read the PandaScript file you are healing. For each non-blank,
non-comment line, call
`scriptStep { "line": "<line>" }`. Comments and blanks are no-ops
on the Lightpanda side.
2. On `isError: true`, the structured error message tells you what
failed. Hand the current page state and the failing line to your
own LLM; have it return a replacement PandaScript line (or
several).
3. Call `scriptHeal { "path": "...", "replacements":
[{ "original_line": "...", "replacement_lines": ["..."] }] }`.
Each `original_line` must match verbatim. Lightpanda writes
`<path>.bak` first, then atomically rewrites the file with the
`# [Auto-healed] Original: …` header prepended to the first
replacement.
4. Continue from the next line.
`scriptStep` deliberately does *not* auto-record: the script is
already the source of truth during replay, so double-recording would
diverge the file from itself. `/login`, `/acceptCookies`, and any line
that isn't a slash command are rejected — those need an LLM, which is
the caller's responsibility.
## Where to go next
- [agent.md](agent.md) — full reference: every flag, every slash
command, every browser tool, plus the security model and
auto-detection rules.
- [agent-script.md](agent-script.md) — JavaScript runtime, primitives,
return values, and complete script examples.
- `lightpanda mcp --help` and `lightpanda agent --help` — current
flag listings straight from the binary.