mirror of
https://github.com/lightpanda-io/browser.git
synced 2026-06-11 01:25:53 -04:00
293 lines
8.1 KiB
Markdown
293 lines
8.1 KiB
Markdown
# Agent tutorial, Hacker News end-to-end
|
|
|
|
This tutorial will walk you through how to use Lightpanda Agent to create a
|
|
reproducible JavaScript browser script.
|
|
|
|
For flag and command tables, see [agent.md](agent.md).
|
|
|
|
## Prerequisites
|
|
|
|
- `./lightpanda` on your PATH.
|
|
- A Hacker News account.
|
|
- An LLM API key for the natural-language sections (Anthropic, OpenAI,
|
|
Gemini, or a local Ollama). Recorded `.js` scripts need no key.
|
|
|
|
Export your HN credentials as `LP_*` env vars. The `LP_` prefix
|
|
matters for security: Lightpanda only resolves these placeholders
|
|
inside its own subprocess (so your password never reaches the LLM),
|
|
and the `getEnv` tool refuses any variable that doesn't start with
|
|
`LP_` (so the LLM can't read your other secrets).
|
|
|
|
```console
|
|
export LP_HN_USERNAME="your-hn-handle"
|
|
export LP_HN_PASSWORD="your-hn-password"
|
|
```
|
|
|
|
Check the variables are set. If one is missing, `/fill` will silently
|
|
type the literal `$LP_HN_USERNAME` into the form rather than your
|
|
username:
|
|
|
|
```console
|
|
./lightpanda agent --no-llm
|
|
> /getEnv LP_HN_USERNAME
|
|
```
|
|
|
|
## 1. Start the REPL
|
|
|
|
```console
|
|
./lightpanda agent
|
|
```
|
|
|
|
The status bar under the prompt shows the resolved model and whether
|
|
natural language is available. REPL history lives in `.lp-history` in
|
|
the working directory.
|
|
|
|
```
|
|
> /help # list every browser tool
|
|
> /help goto # JSON schema for one tool
|
|
> /quit
|
|
```
|
|
|
|
No API key? `./lightpanda agent --no-llm` runs the slash-commands-only
|
|
REPL.
|
|
|
|
## 2. Run a single task
|
|
|
|
Before doing anything complicated, run a one-shot task to check
|
|
everything works:
|
|
|
|
```console
|
|
./lightpanda agent --task "what is the top story on news.ycombinator.com?"
|
|
```
|
|
|
|
`--task` runs one user turn, prints the answer on stdout, exits.
|
|
Tool calls and progress go to stderr, so redirecting gives you a
|
|
clean answer:
|
|
|
|
```console
|
|
./lightpanda agent --task "top story on news.ycombinator.com?" > out.txt
|
|
```
|
|
|
|
Use `-a <path>` (repeatable) to attach local files.
|
|
|
|
## 3. Log in to Hacker News
|
|
|
|
Paste these into the REPL in order:
|
|
|
|
```
|
|
> /goto https://news.ycombinator.com/login
|
|
> /fill selector='form[action="login"] input[name="acct"]' value='$LP_HN_USERNAME'
|
|
> /fill selector='form[action="login"] input[name="pw"]' value='$LP_HN_PASSWORD'
|
|
> /click selector='form[action="login"] input[type="submit"][value="login"]'
|
|
> /waitForSelector '#logout'
|
|
```
|
|
|
|
A few things worth knowing:
|
|
|
|
- **Slash commands only.** `click '#foo'` is forwarded to the LLM;
|
|
only `/click '#foo'` runs as a command. TAB completes tool names.
|
|
- **`/waitForSelector '#logout'` is both a sync point and an
|
|
assertion.** It blocks until HN's logged-in DOM renders. If the
|
|
credentials are wrong (or the website throws a captcha, or the layout
|
|
changes), the command times out at the line where the failure
|
|
actually happened. Use this pattern after every state-changing
|
|
action.
|
|
- **Selectors are CSS only.** The click-family tools (`/click`,
|
|
`/fill`, `/hover`, `/selectOption`, `/setChecked`) accept CSS
|
|
only. Backend node IDs are invalidated by any DOM mutation and
|
|
can't be serialized into recordings.
|
|
|
|
Confirm the login worked:
|
|
|
|
```
|
|
> /extract '{"karma": "#karma"}'
|
|
{"karma":"42"}
|
|
```
|
|
|
|
`/extract` takes a JSON schema and prints one JSON object to stdout.
|
|
Schema grammar:
|
|
|
|
- `"<sel>"`: text of the first match.
|
|
- `["<sel>"]`: text of every match.
|
|
- `{"selector": "<sel>", "attr": "<name>"}`: attribute of the first match.
|
|
- `[{"selector": "<sel>", "fields": {…}}]`: array of records, with
|
|
each `fields` entry resolved relative to the matched element.
|
|
|
|
Now go to the front page and pull the story list:
|
|
|
|
```
|
|
> /goto https://news.ycombinator.com
|
|
> /extract '''
|
|
{
|
|
"topStories": [{
|
|
"selector": ".athing",
|
|
"fields": {
|
|
"rank": ".rank",
|
|
"title": ".titleline > a",
|
|
"url": {"selector": ".titleline > a", "attr": "href"}
|
|
}
|
|
}]
|
|
}
|
|
'''
|
|
```
|
|
|
|
Triple-quoted values let a schema span multiple lines.
|
|
|
|
### How we got those selectors
|
|
|
|
Skip this if you're happy treating them as given.
|
|
|
|
`/tree` prints the semantic tree. On the login page you'll see two
|
|
forms (login and signup) with unlabeled textboxes, which means
|
|
`/findElement role=textbox name=username` returns nothing.
|
|
|
|
`/detectForms` reads the HTML directly and surfaces each form's
|
|
`action` plus each input's `name`. The first form has `action: "login"`
|
|
and fields named `acct` and `pw`, which gives us the form-scoped
|
|
selector (`form[action="login"] input[name="acct"]`) that won't
|
|
collide with signup.
|
|
|
|
`/nodeDetails backendNodeId=<n>` is the alternative: it returns a
|
|
ready-to-use CSS selector for any node ID from `/tree`.
|
|
|
|
## 4. Save the session as a script
|
|
|
|
Retype the login plus front-page sequence in a single REPL session,
|
|
then:
|
|
|
|
```
|
|
> /save hn_login.js
|
|
> /quit
|
|
```
|
|
|
|
In `--no-llm` mode, `/save` transcribes the session deterministically.
|
|
With an LLM, it synthesizes an idiomatic script. The result is
|
|
JavaScript:
|
|
|
|
```js
|
|
goto("https://news.ycombinator.com/login");
|
|
fill({ selector: "form[action=\"login\"] input[name=\"acct\"]", value: "$LP_HN_USERNAME" });
|
|
fill({ selector: "form[action=\"login\"] input[name=\"pw\"]", value: "$LP_HN_PASSWORD" });
|
|
click({ selector: "form[action=\"login\"] input[type=\"submit\"][value=\"login\"]" });
|
|
waitForSelector("#logout");
|
|
goto("https://news.ycombinator.com");
|
|
extract({ topStories: [{ selector: ".athing", fields: { rank: ".rank", title: ".titleline > a", url: { selector: ".titleline > a", attr: "href" } } }] });
|
|
```
|
|
|
|
Only state-mutating commands are recorded; read-only ones (`/tree`,
|
|
`/markdown`) are dropped. `/extract` is recorded because it shapes
|
|
what the script returns.
|
|
|
|
## 5. Replay the script without an LLM
|
|
|
|
```console
|
|
./lightpanda agent hn_login.js
|
|
```
|
|
|
|
No `--provider`, no API key, no token spend. The script's last
|
|
expression is printed automatically as JSON. Because the saved script
|
|
ends with `extract(...)`, you get clean JSON on stdout:
|
|
|
|
```console
|
|
./lightpanda agent hn_login.js > stories.json
|
|
```
|
|
|
|
From inside the REPL, `/load hn_login.js` runs the same script against
|
|
the current session.
|
|
|
|
To reshape the output, assign the result and end with a bare expression
|
|
(the final value is what prints):
|
|
|
|
```js
|
|
const topStories = extract({
|
|
topStories: [{
|
|
selector: ".athing",
|
|
fields: {
|
|
rank: ".rank",
|
|
title: ".titleline > a",
|
|
url: { selector: ".titleline > a", attr: "href" }
|
|
}
|
|
}]
|
|
});
|
|
|
|
topStories;
|
|
```
|
|
|
|
## 6. Add your own JavaScript logic
|
|
|
|
Agent scripts run in a separate JavaScript context from the page. No
|
|
`window`, `document`, DOM API, `require`, or `process`. Browser
|
|
interaction happens through the installed primitives.
|
|
|
|
Use `extract(...)` to move page data into local logic, then process it
|
|
with normal JavaScript:
|
|
|
|
```js
|
|
goto("https://news.ycombinator.com");
|
|
|
|
const topStories = extract({
|
|
topStories: [{
|
|
selector: ".athing",
|
|
limit: 5,
|
|
fields: {
|
|
rank: ".rank",
|
|
title: ".titleline > a",
|
|
url: { selector: ".titleline > a", attr: "href" }
|
|
}
|
|
}]
|
|
});
|
|
|
|
topStories.map((s) => ({ rank: s.rank, title: s.title, url: s.url }));
|
|
```
|
|
|
|
Use `evaluate(...)` only when you intentionally want a string to run in
|
|
the page's JavaScript context. Page evaluate cannot see agent
|
|
variables or call agent primitives.
|
|
|
|
## 7. Use Lightpanda from another agent (MCP)
|
|
|
|
If you're driving Lightpanda from a different agent (Claude Code, a
|
|
custom MCP client, your own harness), use `lightpanda mcp` instead.
|
|
The calling agent supplies the LLM, so Lightpanda needs no API key.
|
|
|
|
```json
|
|
{
|
|
"mcpServers": {
|
|
"lightpanda": {
|
|
"command": "/path/to/lightpanda",
|
|
"args": ["mcp"]
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
Drive the browser with the usual tools (`goto`, `fill`, `click`,
|
|
`waitForSelector`), then hand back a script with the `save` tool:
|
|
|
|
```json
|
|
{
|
|
"tool": "save",
|
|
"args": {
|
|
"path": "hn_login.js",
|
|
"script": "goto(\"...\");\n..."
|
|
}
|
|
}
|
|
```
|
|
|
|
The path must be relative and free of `..`. Literal `LP_*` values are
|
|
scrubbed back to placeholders before the file is written. The output
|
|
runs unmodified:
|
|
|
|
```console
|
|
./lightpanda agent hn_login.js
|
|
```
|
|
|
|
## What next?
|
|
|
|
- [agent.md](agent.md): full reference for every flag, slash command,
|
|
and browser tool.
|
|
- [agent-script.md](agent-script.md): JavaScript runtime, primitives,
|
|
return values.
|
|
- `lightpanda mcp --help` and `lightpanda agent --help`: current
|
|
flags straight from the binary.
|