browser/docs/agent-tutorial.md

# Agent tutorial — Hacker News, end-to-end

This walks you from "I just built `./lightpanda`" to a recorded,
replayable JavaScript browser script, then captures the same flow from
an external MCP client. Every section ends with a command you can run;
nothing references later sections.

For the flag/command/tool tables, see [agent.md](agent.md). This
document is the tutorial; that one is the reference. For the JavaScript
runtime contract, see [agent-script.md](agent-script.md).

## What you'll build

One session against Hacker News:

1. Log in with your account.
2. Confirm the login by reading the username out of the header.
3. Record the whole flow to a `.js` file.
4. Run it offline, with no LLM.
5. Add local JavaScript logic around `extract(...)` results.
6. Record the same browser actions from an external agent over MCP.

## Prerequisites

- `./lightpanda` on your PATH (build with `zig build`).
- A Hacker News account.
- One LLM API key for sections that use natural language — Anthropic,
  OpenAI, Gemini, or a local Ollama. Recorded `.js` scripts run with no
  key at all.

Export your HN credentials as `LP_*` env vars. The convention is
`LP_<SITE>_<FIELD>` — a short site identifier (`HN` for Hacker News,
`GH` for GitHub, …) lets you keep credentials for multiple sites in
your environment without collisions. The unprefixed `LP_USERNAME` /
`LP_PASSWORD` form is the generic fallback when you only have one
site.

In **bash** or **zsh**:

```console
export LP_HN_USERNAME="your-hn-handle"
export LP_HN_PASSWORD="your-hn-password"
```

In **fish**:

```fish
set -gx LP_HN_USERNAME "your-hn-handle"
set -gx LP_HN_PASSWORD "your-hn-password"
```

The `LP_` prefix matters. The agent resolves `$LP_*` references
*inside* the Lightpanda subprocess, so the literal secret never enters
the LLM context; and the `getEnv` tool refuses to read anything that
doesn't start with `LP_`, so the model can't probe your other env
vars.

Verify they're set before continuing — substitution fails silently if
a variable is missing (the literal `$LP_HN_USERNAME` ends up typed
into the form), and the `/fill` confirmation message intentionally
echoes the placeholder name rather than the resolved value, so the
response text won't tell you. Confirm directly:

```console
./lightpanda agent --no-llm
> /getEnv LP_HN_USERNAME
```

`/getEnv` returns the literal value if set, or "not set" if missing.
Only `$LP_*` references in fill values are substituted; other `$`
characters in your password (`my$ecret`, `$5.99`) are passed through
verbatim.

## 1. First contact: the REPL

```console
./lightpanda agent
```

On startup the agent prints a one-line notice telling you which mode it
landed in — which provider (and which env var won), or "basic REPL
(no LLM)" if no key is set. The REPL writes its history to
`.lp-history` in the working directory, so up-arrow works across runs.

Try the meta commands:

```
> /help
> /help goto
> /quit
```

`/help` lists every browser tool. `/help <tool>` prints its JSON
schema. `/quit` exits cleanly. If you have no API key yet and want to
poke around without an LLM, `./lightpanda agent --no-llm` forces the
basic REPL.

## 2. The shortest possible win: `--task`

Before doing anything complicated, prove the LLM + browser stack
works end-to-end:

```console
./lightpanda agent --task "what is the top story on news.ycombinator.com?"
```

`--task` runs a single user turn, prints the final answer on stdout,
and exits. Tool calls, progress, and errors all go to stderr, so
redirecting stdout gives you a clean answer:

```console
./lightpanda agent --task "top story on news.ycombinator.com?" > out.txt
```

If you need to feed the model a local file, repeat
`-a <path>` (or `--attach <path>`) for each one.

## 3. Driving the browser by hand

Now back to the REPL. We'll write the HN login flow one command at a
time so you can see how each step depends on what the previous one
showed.

```
> /goto https://news.ycombinator.com/login
```

`/goto` takes a single URL argument (positional, optionally quoted). The page
is now loaded.

> The REPL scripting surface is slash commands. `click '#foo'` (no leading slash) is
> forwarded to the LLM as a natural-language prompt; only `/click '#foo'`
> runs as a command. TAB completion in the REPL helps you find the right
> tool name.

Inspect it before clicking anything:

```
> /tree
```

`/tree` prints the semantic tree to stdout. Two forms are visible —
the login form and the create-account form below it — and each one
contains two unlabeled textboxes:

```
8 form
 13 'username:'
 15 [i] textbox
 18 'password:'
 20 [i]
 22 [i] button 'login' value='login'
30 form
 35 'username:'
 37 [i] textbox
 …
```

Notice the textboxes have no accessible name — "username:" is a
sibling text node, not a `<label for="…">`. This is typical of older
pages. The ARIA-name tool reflects that:

```
> /findElement role=textbox name=username
[]
```

Empty result. Slash commands accept a single positional argument
(for tools with one required field), `key=value` pairs, or a raw
`{json}` blob; `findElement` filters by accessible name, which we
don't have here.

For unlabeled login forms, jump to `detectForms`, which reads the
HTML directly and surfaces each form's `action` plus each input's
`name` attribute:

```
> /detectForms
```

You'll see two forms, the first with `action: "login"` and fields
named `acct` and `pw`. That's enough to synthesize the CSS selector
yourself: scope by form action to avoid colliding with the
create-account form, then key on the input's `name` attribute.

**Selector rule, load-bearing:** the click-family tools (`/click`,
`/fill`, `/hover`, `/selectOption`, `/setChecked`) accept CSS selectors
only. The backend node IDs `findElement` and `detectForms` return are
invalidated by any DOM mutation, and they cannot be serialized into
durable JavaScript recordings — a session that uses them is not
replayable. Always
synthesize a CSS selector from the attributes (`id`, `class`,
`name`, `action`, `tag_name`) and use that.

Now fill the form:

```
> /fill selector='form[action="login"] input[name="acct"]' value='$LP_HN_USERNAME'
> /fill selector='form[action="login"] input[name="pw"]' value='$LP_HN_PASSWORD'
> /click selector='form[action="login"] input[type="submit"][value="login"]'
> /waitForSelector '#logout'
```

`$LP_*` references in `/fill` values are resolved at execution time
inside the subprocess. The LLM never sees the literal credential.

The `/waitForSelector '#logout'` line is doing two jobs at once, and it's
worth unpacking because the pattern recurs in every recorded script:

- **It's a synchronization point.** `/click` on the submit button
  returns as soon as the click dispatches, not after the server
  responds. Without a wait, the next command races the login redirect
  and may run against the *pre-login* DOM. `/waitForSelector '<sel>'`
  blocks until that selector appears, so the script resumes only after
  HN's logged-in page has rendered.
- **It's an implicit assertion.** HN renders the `#logout` link only
  when the session is authenticated. If the credentials are wrong (or
  HN throws up a captcha, or rate-limits, or the form layout
  changed), `#logout` never appears and `/waitForSelector` times out —
  the script fails loudly at the line where the failure actually
  happened, instead of silently succeeding and producing garbage
  downstream.

You generally want a `/waitForSelector` like this after every state-changing
action that triggers async work: pick a selector that *only* exists
in the post-action state, and you get free regression protection.
Waiting on the URL (`location.pathname === "/news"`) or a generic
element that exists on both pages is weaker — both can be true before
the navigation finishes.

Confirm by pulling structured data off the page. `/extract` takes a
JSON schema object — each value describes what to lift out, and the
whole result is printed to stdout as one JSON object. The simplest
form is a flat selector lookup:

```
> /extract '{"karma": "#karma"}'
{"karma":"42"}
```

The schema grammar is small but covers the cases you'd reach for:

- `"<sel>"` — `textContent.trim()` of the first match (string, or
  `null` if no match).
- `""` — the matched element's own text (only meaningful inside a
  `fields` block, where there's an outer element to refer to).
- `["<sel>"]` — text of every match (string array).
- `{"selector": "<sel>", "attr": "<name>"}` — the first match's
  attribute value.
- `[{"selector": "<sel>", "fields": {…}}]` — array of objects, where
  each `fields` entry is resolved relative to the matched element.

After a page-changing action (click, navigation, form submit) the
previous `/tree` snapshot is stale; re-inspect before the next
interaction. Hop back to the front page and pull the story list to
exercise the structured form:

```
> /goto https://news.ycombinator.com
> /extract '''
{
  "topStories": [{
    "selector": ".athing",
    "fields": {
      "rank": ".rank",
      "title": ".titleline > a",
      "url": {"selector": ".titleline > a", "attr": "href"}
    }
  }]
}
'''
```

Triple-quoted (`'''` or `"""`) values let a schema span multiple lines
— the REPL keeps reading until it sees the matching closing quote.
The result is a single JSON object printed to stdout:
`{"topStories":[{"rank":"1","title":"…","url":"…"}, …]}`.

The schema is parsed in Zig before the page-side walker runs, so a
typo like a stray comma surfaces here as `Error: invalid /extract
schema JSON` instead of a confusing V8 stack trace.

## 4. Recording the session

The same flow, but recorded to a file. Quit the REPL, then:

```console
./lightpanda agent -i hn_login.js
```

`-i <path>` opens an interactive REPL that appends state-mutating
commands to `<path>`. Retype the same sequence — login (`/goto`, two
`/fill`s, `/click`, `/waitForSelector`), then the front-page hop and
structured pull (`/goto`, multi-line `/extract`) — then `/quit`.

Inspect the result:

```console
cat hn_login.js
```

You should see the seven mutating commands and nothing else — no
`/tree`, no `/markdown`, no read-only lookups. The recorder filters on a
per-tool flag (`ToolDef.recorded`) so read-only inspection never
pollutes the script; `/extract` *is* recorded (it changes what the
script can read on replay even though it doesn't mutate the page).
The saved file is JavaScript:

```js
goto("https://news.ycombinator.com/login");
fill({ selector: "form[action=\"login\"] input[name=\"acct\"]", value: "$LP_HN_USERNAME" });
fill({ selector: "form[action=\"login\"] input[name=\"pw\"]", value: "$LP_HN_PASSWORD" });
click({ selector: "form[action=\"login\"] input[type=\"submit\"][value=\"login\"]" });
waitForSelector("#logout");
goto("https://news.ycombinator.com");
extract({ topStories: [{ selector: ".athing", fields: { rank: ".rank", title: ".titleline > a", url: { selector: ".titleline > a", attr: "href" } } }] });
```

Natural-language REPL turns are not saved as executable JavaScript.
When a natural-language turn produces recorded browser actions, the
prompt is kept as a `//` comment above those actions.

## 5. Running deterministically

```console
./lightpanda agent hn_login.js
```

No `--provider`, no LLM, no token spend. The recorded script runs top
to bottom against a fresh browser. This is the form you want for
regression tests and CI.

A JavaScript script does not print its final expression automatically.
The recorded `extract(...)` call returns a local JavaScript value, so
edit the final line when you want replay output:

```js
const topStories = extract({
  topStories: [{
    selector: ".athing",
    fields: {
      rank: ".rank",
      title: ".titleline > a",
      url: { selector: ".titleline > a", attr: "href" }
    }
  }]
});

console.log(JSON.stringify({ topStories }, null, 2));
```

Run it again and stdout is clean JSON:

```console
./lightpanda agent hn_login.js > stories.json
```

`/login` and `/acceptCookies` are REPL-only LLM triggers. A pure
recording from `-i` never contains them; the recorder captures the
resulting browser tool calls instead. Lines that are neither slash
commands nor comments are also REPL-only conveniences, not script
syntax.

## 6. Local JavaScript logic

Agent scripts run in a separate JavaScript context from the web page.
There is no `window`, `document`, DOM API, `require`, `process`, or
Node standard library in that context. Browser interaction happens
through the installed primitives, and `extract(...)` is the usual way
to move page data into local script logic.

For example, turn the extraction result into a smaller report without
running any page-side JavaScript:

```js
goto("https://news.ycombinator.com");

const topStories = extract({
  topStories: [{
    selector: ".athing",
    limit: 5,
    fields: {
      rank: ".rank",
      title: ".titleline > a",
      url: { selector: ".titleline > a", attr: "href" }
    }
  }]
});

const report = topStories.map((story) => ({
  rank: story.rank,
  title: story.title,
  url: story.url
}));

console.log(JSON.stringify({ report }, null, 2));
```

Use the `eval(...)` primitive only when you intentionally want to run a
string in the current page's JavaScript context. Page eval cannot see
agent variables or call `goto`, `extract`, and the other agent
primitives.

## 7. Same flow, external agent (MCP)

Everything above used Lightpanda's built-in agent. If you're driving
Lightpanda from a different agent (Claude Code, a custom MCP client,
your own harness), use `lightpanda mcp` instead — same browser tools,
no API key on the Lightpanda side, the calling agent supplies the
LLM.

Register the server with your MCP client:

```json
{
  "mcpServers": {
    "lightpanda": {
      "command": "/path/to/lightpanda",
      "args": ["mcp"]
    }
  }
}
```

### Record a session over MCP

From the external agent, call:

1. `recordStart { "path": "hn_login.js" }` — begins appending
   state-mutating tool calls as JavaScript to `hn_login.js`. The path
   must be relative and free of `..`.
2. The same browser tools you'd call anyway: `goto`, `fill`, `click`,
   `waitForSelector`. Each one that succeeds is appended as a
   JavaScript primitive call;
   query-only tools (`tree`, `markdown`, `findElement`, `consoleLogs`)
   are never recorded.
3. `recordComment { "text": "logged in" }` — drop a `//` breadcrumb above
   the next recorded line. Useful for marking the boundary between
   LLM-driven phases.
4. `recordStop {}` — closes the recording and returns
   `{path, line_count}`.

The output file uses the same JavaScript format as `-i hn_login.js`
from section 4. It runs via the agent CLI without modification:

```console
./lightpanda agent hn_login.js
```

### About `scriptStep` and `scriptHeal`

`recordStart` now records JavaScript agent scripts. The MCP
`scriptStep` and `scriptHeal` tools are still available for external
agents that want to run and repair one slash-command line at a time,
but that is a separate PandaScript line-healing workflow:

1. Read the PandaScript file you are healing. For each non-blank,
   non-comment line, call
   `scriptStep { "line": "<line>" }`. Comments and blanks are no-ops
   on the Lightpanda side.
2. On `isError: true`, the structured error message tells you what
   failed. Hand the current page state and the failing line to your
   own LLM; have it return a replacement PandaScript line (or
   several).
3. Call `scriptHeal { "path": "...", "replacements":
   [{ "original_line": "...", "replacement_lines": ["..."] }] }`.
   Each `original_line` must match verbatim. Lightpanda writes
   `<path>.bak` first, then atomically rewrites the file with the
   `# [Auto-healed] Original: …` header prepended to the first
   replacement.
4. Continue from the next line.

`scriptStep` deliberately does *not* auto-record: the script is
already the source of truth during replay, so double-recording would
diverge the file from itself. `/login`, `/acceptCookies`, and any line
that isn't a slash command are rejected — those need an LLM, which is
the caller's responsibility.

## Where to go next

- [agent.md](agent.md) — full reference: every flag, every slash
  command, every browser tool, plus the security model and
  auto-detection rules.
- [agent-script.md](agent-script.md) — JavaScript runtime, primitives,
  return values, and complete script examples.
- `lightpanda mcp --help` and `lightpanda agent --help` — current
  flag listings straight from the binary.