mirror of
https://github.com/lightpanda-io/browser.git
synced 2026-06-11 09:35:59 -04:00
agent: add end-to-end tutorial
This commit is contained in:
413
docs/agent-tutorial.md
Normal file
413
docs/agent-tutorial.md
Normal file
@@ -0,0 +1,413 @@
|
||||
# Agent tutorial — Hacker News, end-to-end
|
||||
|
||||
This walks you from "I just built `./lightpanda`" to a recorded,
|
||||
replayable, self-healing browser script — and then drives the same
|
||||
script from an external MCP client. Every section ends with a command
|
||||
you can run; nothing references later sections.
|
||||
|
||||
For the flag/command/tool tables, see [agent.md](agent.md). This
|
||||
document is the tutorial; that one is the reference.
|
||||
|
||||
## What you'll build
|
||||
|
||||
One session against Hacker News:
|
||||
|
||||
1. Log in with your account.
|
||||
2. Confirm the login by reading the username out of the header.
|
||||
3. Record the whole flow to a `.lp` file.
|
||||
4. Replay it offline, with no LLM.
|
||||
5. Break a selector on purpose; watch `--self-heal` repair the file.
|
||||
6. Drive the same script from an external agent over MCP.
|
||||
|
||||
The finished artifact already exists in the repo as
|
||||
[`hn_login.lp`](../hn_login.lp). Diff your recording against it at the
|
||||
end as a sanity check.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- `./lightpanda` on your PATH (build with `zig build`).
|
||||
- A Hacker News account.
|
||||
- One LLM API key for sections that need natural language and
|
||||
self-healing — Anthropic, OpenAI, Gemini, or a local Ollama. Sections
|
||||
4–7 work with no key at all.
|
||||
|
||||
Export your HN credentials as `LP_*` env vars. The convention is
|
||||
`LP_<SITE>_<FIELD>` — a short site identifier (`HN` for Hacker News,
|
||||
`GH` for GitHub, …) lets you keep credentials for multiple sites in
|
||||
your environment without collisions. The unprefixed `LP_USERNAME` /
|
||||
`LP_PASSWORD` form is the generic fallback when you only have one
|
||||
site.
|
||||
|
||||
In **bash** or **zsh**:
|
||||
|
||||
```console
|
||||
export LP_HN_USERNAME="your-hn-handle"
|
||||
export LP_HN_PASSWORD="your-hn-password"
|
||||
```
|
||||
|
||||
In **fish**:
|
||||
|
||||
```fish
|
||||
set -gx LP_HN_USERNAME "your-hn-handle"
|
||||
set -gx LP_HN_PASSWORD "your-hn-password"
|
||||
```
|
||||
|
||||
The `LP_` prefix matters. The agent resolves `$LP_*` references
|
||||
*inside* the Lightpanda subprocess, so the literal secret never enters
|
||||
the LLM context; and the `getEnv` tool refuses to read anything that
|
||||
doesn't start with `LP_`, so the model can't probe your other env
|
||||
vars.
|
||||
|
||||
Verify they're set before continuing — substitution fails silently if
|
||||
a variable is missing (the literal `$LP_HN_USERNAME` ends up typed
|
||||
into the form), and the `TYPE` confirmation message intentionally
|
||||
echoes the placeholder name rather than the resolved value, so the
|
||||
response text won't tell you. Confirm directly:
|
||||
|
||||
```console
|
||||
./lightpanda agent --no-llm
|
||||
> /getEnv LP_HN_USERNAME
|
||||
```
|
||||
|
||||
`/getEnv` returns the literal value if set, or "not set" if missing.
|
||||
Only `$LP_*` references in fill values are substituted; other `$`
|
||||
characters in your password (`my$ecret`, `$5.99`) are passed through
|
||||
verbatim.
|
||||
|
||||
## 1. First contact: the REPL
|
||||
|
||||
```console
|
||||
./lightpanda agent
|
||||
```
|
||||
|
||||
On startup the agent prints a one-line notice telling you which mode it
|
||||
landed in — which provider (and which env var won), or "basic REPL
|
||||
(no LLM)" if no key is set. The REPL writes its history to
|
||||
`.lp-history` in the working directory, so up-arrow works across runs.
|
||||
|
||||
Try the meta commands:
|
||||
|
||||
```
|
||||
> /help
|
||||
> /help goto
|
||||
> /quit
|
||||
```
|
||||
|
||||
`/help` lists every browser tool. `/help <tool>` prints its JSON
|
||||
schema. `/quit` exits cleanly. If you have no API key yet and want to
|
||||
poke around without an LLM, `./lightpanda agent --no-llm` forces the
|
||||
basic REPL.
|
||||
|
||||
## 2. The shortest possible win: `--task`
|
||||
|
||||
Before doing anything complicated, prove the LLM + browser stack
|
||||
works end-to-end:
|
||||
|
||||
```console
|
||||
./lightpanda agent --task "what is the top story on news.ycombinator.com?"
|
||||
```
|
||||
|
||||
`--task` runs a single user turn, prints the final answer on stdout,
|
||||
and exits. Tool calls, progress, and errors all go to stderr, so
|
||||
redirecting stdout gives you a clean answer:
|
||||
|
||||
```console
|
||||
./lightpanda agent --task "top story on news.ycombinator.com?" > out.txt
|
||||
```
|
||||
|
||||
If you need to feed the model a local file, repeat
|
||||
`--task-attachment <path>` for each one.
|
||||
|
||||
## 3. Driving the browser by hand
|
||||
|
||||
Now back to the REPL. We'll write the HN login flow one command at a
|
||||
time so you can see how each step depends on what the previous one
|
||||
showed.
|
||||
|
||||
```
|
||||
> GOTO https://news.ycombinator.com/login
|
||||
```
|
||||
|
||||
`GOTO` takes an unquoted URL. The page is now loaded.
|
||||
|
||||
> Commands must be uppercase. `click '#foo'` is forwarded to the LLM as
|
||||
> natural language; only `CLICK '#foo'` runs as a command. TAB
|
||||
> completion in the REPL fills in the caps for you — typing `cli<TAB>`
|
||||
> rewrites the line to `CLICK`.
|
||||
|
||||
Inspect it before clicking anything:
|
||||
|
||||
```
|
||||
> TREE
|
||||
```
|
||||
|
||||
`TREE` prints the semantic tree to stdout. Two forms are visible —
|
||||
the login form and the create-account form below it — and each one
|
||||
contains two unlabeled textboxes:
|
||||
|
||||
```
|
||||
8 form
|
||||
13 'username:'
|
||||
15 [i] textbox
|
||||
18 'password:'
|
||||
20 [i]
|
||||
22 [i] button 'login' value='login'
|
||||
30 form
|
||||
35 'username:'
|
||||
37 [i] textbox
|
||||
…
|
||||
```
|
||||
|
||||
Notice the textboxes have no accessible name — "username:" is a
|
||||
sibling text node, not a `<label for="…">`. This is typical of older
|
||||
pages. The ARIA-name tool reflects that:
|
||||
|
||||
```
|
||||
> /findElement role=textbox name=username
|
||||
[]
|
||||
```
|
||||
|
||||
Empty result. Slash commands accept a single positional argument
|
||||
(for tools with one required field), `key=value` pairs, or a raw
|
||||
`{json}` blob; `findElement` filters by accessible name, which we
|
||||
don't have here.
|
||||
|
||||
For unlabeled login forms, jump to `detectForms`, which reads the
|
||||
HTML directly and surfaces each form's `action` plus each input's
|
||||
`name` attribute:
|
||||
|
||||
```
|
||||
> /detectForms
|
||||
```
|
||||
|
||||
You'll see two forms, the first with `action: "login"` and fields
|
||||
named `acct` and `pw`. That's enough to synthesize the CSS selector
|
||||
yourself: scope by form action to avoid colliding with the
|
||||
create-account form, then key on the input's `name` attribute.
|
||||
|
||||
**Selector rule, load-bearing:** the click-family tools (`CLICK`,
|
||||
`TYPE`, `HOVER`, `SELECT`, `CHECK`) accept CSS selectors only. The
|
||||
backend node IDs `findElement` and `detectForms` return are
|
||||
invalidated by any DOM mutation, and they cannot be serialized into
|
||||
PandaScript — a session that uses them is not replayable. Always
|
||||
synthesize a CSS selector from the attributes (`id`, `class`,
|
||||
`name`, `action`, `tag_name`) and use that.
|
||||
|
||||
Now fill the form:
|
||||
|
||||
```
|
||||
> TYPE 'form[action="login"] input[name="acct"]' '$LP_HN_USERNAME'
|
||||
> TYPE 'form[action="login"] input[name="pw"]' '$LP_HN_PASSWORD'
|
||||
> CLICK 'form[action="login"] input[type="submit"][value="login"]'
|
||||
> WAIT '#logout'
|
||||
```
|
||||
|
||||
`$LP_*` references in `TYPE` values are resolved at execution time
|
||||
inside the subprocess. The LLM never sees the literal credential.
|
||||
|
||||
The `WAIT '#logout'` line is doing two jobs at once, and it's worth
|
||||
unpacking because the pattern recurs in every recorded script:
|
||||
|
||||
- **It's a synchronization point.** `CLICK` on the submit button
|
||||
returns as soon as the click dispatches, not after the server
|
||||
responds. Without a wait, the next command races the login redirect
|
||||
and may run against the *pre-login* DOM. `WAIT '<selector>'` blocks
|
||||
until that selector appears, so the script resumes only after HN's
|
||||
logged-in page has rendered.
|
||||
- **It's an implicit assertion.** HN renders the `#logout` link only
|
||||
when the session is authenticated. If the credentials are wrong (or
|
||||
HN throws up a captcha, or rate-limits, or the form layout
|
||||
changed), `#logout` never appears and `WAIT` times out — the script
|
||||
fails loudly at the line where the failure actually happened,
|
||||
instead of silently succeeding and producing garbage downstream.
|
||||
|
||||
You generally want a `WAIT` like this after every state-changing
|
||||
action that triggers async work: pick a selector that *only* exists
|
||||
in the post-action state, and you get free regression protection.
|
||||
Waiting on the URL (`location.pathname === "/news"`) or a generic
|
||||
element that exists on both pages is weaker — both can be true before
|
||||
the navigation finishes.
|
||||
|
||||
Confirm by extracting the username from the header:
|
||||
|
||||
```
|
||||
> EXTRACT '#me'
|
||||
```
|
||||
|
||||
`EXTRACT` prints the element's text to stdout — your handle. After a
|
||||
page-changing action (click, navigation, form submit), the previous
|
||||
`TREE` snapshot is stale; re-inspect before the next interaction.
|
||||
|
||||
## 4. Recording the session
|
||||
|
||||
The same flow, but recorded to a file. Quit the REPL, then:
|
||||
|
||||
```console
|
||||
./lightpanda agent -i hn.lp
|
||||
```
|
||||
|
||||
`-i <path>` opens an interactive REPL that appends state-mutating
|
||||
commands to `<path>`. Retype the same login sequence (`GOTO`, two
|
||||
`TYPE`s, `CLICK`, `WAIT`, then `/quit`).
|
||||
|
||||
Inspect the result:
|
||||
|
||||
```console
|
||||
cat hn.lp
|
||||
```
|
||||
|
||||
You should see the five mutating commands and nothing else — no
|
||||
`TREE`, no `MARKDOWN`, no slash-command lookups. The recorder filters
|
||||
on a per-command flag (`Command.isRecorded()`) so read-only inspection
|
||||
never pollutes the script.
|
||||
|
||||
Diff it against the checked-in fixture:
|
||||
|
||||
```console
|
||||
diff hn.lp hn_login.lp
|
||||
```
|
||||
|
||||
Modulo trailing newlines, they should match. That fixture is what the
|
||||
rest of this tutorial uses.
|
||||
|
||||
## 5. Replaying deterministically
|
||||
|
||||
```console
|
||||
./lightpanda agent hn_login.lp
|
||||
```
|
||||
|
||||
No `--provider`, no LLM, no token spend. The recorded script runs
|
||||
top to bottom against a fresh browser. This is the form you want for
|
||||
regression tests and CI: it's pure replay.
|
||||
|
||||
`LOGIN`, `ACCEPT_COOKIES`, and natural-language lines are the only
|
||||
script entries that require an LLM. A pure recording from `-i` never
|
||||
contains them, so it always replays without `--provider`.
|
||||
|
||||
## 6. Selector drift and `--self-heal`
|
||||
|
||||
Real pages change. Simulate selector drift by editing your copy:
|
||||
|
||||
```console
|
||||
cp hn_login.lp hn_broken.lp
|
||||
sed -i 's/input\[name="acct"\]/input[name="user"]/' hn_broken.lp
|
||||
```
|
||||
|
||||
`input[name="user"]` doesn't exist on HN's login form, so a plain
|
||||
replay fails:
|
||||
|
||||
```console
|
||||
./lightpanda agent hn_broken.lp
|
||||
```
|
||||
|
||||
`TYPE`, `CHECK`, and `SELECT` go a step further than just "did the
|
||||
selector resolve" — a post-exec verifier checks that the DOM
|
||||
actually reflects the intent (the input ended up with the value you
|
||||
typed, the checkbox flipped, the option got selected). That's what
|
||||
catches silent drift before it propagates.
|
||||
|
||||
Now re-run with self-heal:
|
||||
|
||||
```console
|
||||
./lightpanda agent --self-heal --provider anthropic hn_broken.lp
|
||||
```
|
||||
|
||||
(Substitute your provider.) On failure, the agent runs a short,
|
||||
budget-capped LLM turn against the *current* page state, gets a
|
||||
replacement command, runs it, and atomically rewrites `hn_broken.lp`
|
||||
in place. A `hn_broken.lp.bak` is written before any mutation, and
|
||||
the rewritten line is prefixed with a header:
|
||||
|
||||
```pandascript
|
||||
# [Auto-healed] Original: TYPE 'form[action="login"] input[name="user"]' '$LP_USERNAME'
|
||||
TYPE 'form[action="login"] input[name="acct"]' '$LP_USERNAME'
|
||||
```
|
||||
|
||||
Self-heal is intentionally narrow: one replacement per failure, no
|
||||
navigation, capped budget. It's there to recover from selector
|
||||
drift, not to redesign the script.
|
||||
|
||||
Re-run without `--self-heal` to prove replay is back to deterministic:
|
||||
|
||||
```console
|
||||
./lightpanda agent hn_broken.lp
|
||||
```
|
||||
|
||||
## 7. Same script, external agent (MCP)
|
||||
|
||||
Everything above used Lightpanda's built-in agent. If you're driving
|
||||
Lightpanda from a different agent (Claude Code, a custom MCP client,
|
||||
your own harness), use `lightpanda mcp` instead — same browser tools,
|
||||
no API key on the Lightpanda side, the calling agent supplies the
|
||||
LLM.
|
||||
|
||||
Register the server with your MCP client:
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"lightpanda": {
|
||||
"command": "/path/to/lightpanda",
|
||||
"args": ["mcp"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Record a session over MCP
|
||||
|
||||
From the external agent, call:
|
||||
|
||||
1. `record_start { "path": "hn.lp" }` — begins appending state-mutating
|
||||
tool calls to `hn.lp`. The path must be relative and free of `..`.
|
||||
2. The same browser tools you'd call anyway: `goto`, `fill`, `click`,
|
||||
`waitForSelector`. Each one that succeeds is appended verbatim;
|
||||
query-only tools (`tree`, `markdown`, `findElement`, `consoleLogs`)
|
||||
are never recorded.
|
||||
3. `record_comment { "text": "logged in" }` — drop a breadcrumb above
|
||||
the next recorded line. Useful for marking the boundary between
|
||||
LLM-driven phases.
|
||||
4. `record_stop {}` — closes the recording and returns
|
||||
`{path, line_count}`.
|
||||
|
||||
The output file is byte-equivalent to what `-i hn.lp` produced in
|
||||
section 4. It replays via the agent CLI without modification:
|
||||
|
||||
```console
|
||||
./lightpanda agent hn.lp
|
||||
```
|
||||
|
||||
### Replay with self-heal over MCP
|
||||
|
||||
MCP doesn't carry a `--self-heal` flag — self-heal is a two-tool
|
||||
roundtrip the calling agent orchestrates:
|
||||
|
||||
1. Read the script. For each non-blank, non-comment line, call
|
||||
`script_step { "line": "<line>" }`. Comments and blanks are no-ops
|
||||
on the Lightpanda side.
|
||||
2. On `isError: true`, the structured error message tells you what
|
||||
failed. Hand the current page state and the failing line to your
|
||||
own LLM; have it return a replacement PandaScript line (or
|
||||
several).
|
||||
3. Call `script_heal { "path": "...", "replacements":
|
||||
[{ "original_line": "...", "replacement_lines": ["..."] }] }`.
|
||||
Each `original_line` must match verbatim. Lightpanda writes
|
||||
`<path>.bak` first, then atomically rewrites the file with the
|
||||
`# [Auto-healed] Original: …` header prepended to the first
|
||||
replacement — same format as section 6.
|
||||
4. Continue from the next line.
|
||||
|
||||
`script_step` deliberately does *not* auto-record: the script is
|
||||
already the source of truth during replay, so double-recording would
|
||||
diverge the file from itself. `LOGIN`, `ACCEPT_COOKIES`, and
|
||||
natural-language lines are rejected — those need an LLM, which is the
|
||||
caller's responsibility.
|
||||
|
||||
## Where to go next
|
||||
|
||||
- [agent.md](agent.md) — full reference: every flag, every PandaScript
|
||||
command, every browser tool, plus the security model and
|
||||
auto-detection rules.
|
||||
- [`hn_login.lp`](../hn_login.lp) — the fixture this tutorial builds.
|
||||
- `lightpanda mcp --help` and `lightpanda agent --help` — current
|
||||
flag listings straight from the binary.
|
||||
Reference in New Issue
Block a user