diff --git a/README.md b/README.md index d80ebe67..4e630166 100644 --- a/README.md +++ b/README.md @@ -173,9 +173,9 @@ A skill is available in [lightpanda-io/agent-skill](https://github.com/lightpand `lightpanda agent` runs an interactive agent on top of the same browser. It supports an LLM-driven REPL (Anthropic, OpenAI, Gemini, Ollama), a one-shot -`--task` mode that prints the answer to stdout, and a small scripting -language (PandaScript) for recording and deterministically replaying browser -sessions, with optional `--self-heal` recovery from selector drift. +`--task` mode that prints the answer to stdout, and JavaScript scripts for +recording and deterministically replaying browser sessions. The interactive +REPL remains slash-command based. To drive Lightpanda from another LLM agent (Claude Code, an MCP-aware client, etc.), use `lightpanda mcp` above — it exposes the same browser tools without @@ -185,12 +185,13 @@ needing an LLM (or API key) inside Lightpanda. ./lightpanda agent # auto-detects API key from env ./lightpanda agent --task "top story on news.ycombinator.com?" ./lightpanda agent --no-llm # basic REPL, no LLM -./lightpanda agent session.lp # replay a recorded script +./lightpanda agent session.js # run a recorded script ./lightpanda agent --provider gemini --task "..." # force a specific provider ``` -See [docs/agent.md](docs/agent.md) for the full reference, or -[docs/agent-tutorial.md](docs/agent-tutorial.md) for a step-by-step +See [docs/agent.md](docs/agent.md) for the full reference, +[docs/agent-script.md](docs/agent-script.md) for the JavaScript script format, +or [docs/agent-tutorial.md](docs/agent-tutorial.md) for a step-by-step end-to-end walkthrough. ### Telemetry diff --git a/docs/agent-script.md b/docs/agent-script.md new file mode 100644 index 00000000..8221f47a --- /dev/null +++ b/docs/agent-script.md @@ -0,0 +1,380 @@ +# Agent JavaScript scripts + +`lightpanda agent ` runs a JavaScript file that drives a +Lightpanda browser session through a small set of blocking global functions. +The format is intentionally plain JavaScript: use normal variables, functions, +loops, objects, arrays, `JSON.parse`, `JSON.stringify`, and other standard +ECMAScript built-ins. + +```console +./lightpanda agent session.js +``` + +The interactive REPL still uses slash commands. `/save` writes the recorded +browser actions as JavaScript calls so the saved file can be replayed without +an LLM. + +## Runtime Environment + +Agent scripts run in their own V8 context. That context is separate from the +web page's JavaScript context. + +- It is not the browser page environment. There is no `window`, `document`, + DOM, `localStorage`, `navigator`, or page global state in the agent script. + Read page data with `extract(...)`, or explicitly run page JavaScript with + `eval(...)` when that is the right tool. +- It is not Node.js. There is no `require`, `process`, `fs`, `path`, npm + package loading, command-line argument API, or Node network/filesystem API. +- Page scripts cannot see agent variables or Lightpanda primitives. Agent + scripts cannot directly see page variables. +- The global `eval(...)` name is the Lightpanda page-eval primitive in this + context. Do not rely on JavaScript's native local eval behavior in agent + scripts. +- Agent variables persist for the lifetime of one script run, across + navigations and primitive calls. A later `lightpanda agent script.js` run + starts with a fresh agent context. +- The installed primitives are synchronous and blocking. Do not write an + `async`/`await` automation contract around them. +- Tool failures throw JavaScript `Error` exceptions and stop execution unless + you catch them. +- A final expression is not printed automatically. Use `console.log(...)` for + script output. + +The agent context includes a small `console` object: + +```js +console.log("printed to stdout"); +console.info("printed to stdout"); +console.debug("printed to stdout"); +console.warn("printed to stderr"); +console.error("printed to stderr"); +``` + +## Values And Return Types + +Most primitives return the browser tool's result text as a JavaScript string. +`extract(...)` is the exception: it returns extracted data as a normal +JavaScript value, so local script logic can use it directly. A schema with +multiple top-level fields returns an object: + +```js +goto("https://news.ycombinator.com/"); + +const data = extract({ + title: "title", + stories: [{ + selector: "tr.athing", + limit: 5, + fields: { + id: { attr: "id" }, + title: ".titleline > a" + } + }] +}); + +console.log(JSON.stringify(data, null, 2)); +``` + +When the schema has exactly one top-level array field, `extract(...)` unwraps +that field and returns the array directly: + +```js +const stories = extract({ + stories: [{ + selector: "tr.athing", + limit: 5, + fields: { + id: { attr: "id" }, + title: ".titleline > a" + } + }] +}); + +for (const story of stories) { + console.log(story.title); +} +``` + +`eval(...)` still returns the page-eval tool result text. When page `eval(...)` +returns an object or array, that text is JSON. + +Primitive arguments must be JSON-serializable. Strings, numbers, booleans, +arrays, plain objects, and `null` work. `undefined`, functions, symbols, and +cyclic objects do not. + +## Installed Primitives + +Only recorded browser primitives are installed globally: + +| Primitive | Arguments | Runs in | +|-----------|-----------|---------| +| `goto` | `goto(url)` or `goto({ url, timeout, waitUntil })` | Browser session | +| `extract` | `extract(schema)` or `extract({ schema })` | Browser page via extractor; returns a JS object or array | +| `eval` | `eval(script)` or `eval({ script, url, timeout, waitUntil, save })` | Browser page JS context | +| `click` | `click({ selector })` or `click({ backendNodeId })` | Browser page | +| `fill` | `fill({ selector, value })` or `fill({ backendNodeId, value })` | Browser page | +| `scroll` | `scroll()` or `scroll({ x, y, backendNodeId })` | Browser page | +| `waitForSelector` | `waitForSelector(selector)` or `waitForSelector({ selector, timeout })` | Browser page | +| `waitForScript` | `waitForScript(script)` or `waitForScript({ script, timeout })` | Browser page JS context | +| `hover` | `hover({ selector })` or `hover({ backendNodeId })` | Browser page | +| `press` | `press(key)` or `press({ key, selector, backendNodeId })` | Browser page | +| `selectOption` | `selectOption({ selector, value })` or `selectOption({ backendNodeId, value })` | Browser page | +| `setChecked` | `setChecked({ selector, checked })` or `setChecked({ backendNodeId, checked })` | Browser page | + +`waitUntil` accepts `"load"`, `"domcontentloaded"`, `"networkidle"`, or +`"done"`. + +Prefer CSS selectors in saved scripts. `backendNodeId` values are tied to the +current DOM snapshot and are not stable after navigation or DOM mutation. + +## Navigation + +Use `goto(...)` to open a page: + +```js +goto("https://example.com"); + +goto({ + url: "https://example.com/app", + timeout: 15000, + waitUntil: "domcontentloaded" +}); +``` + +The call returns a status string. It throws if navigation fails or times out. + +## Structured Extraction + +Use `extract(...)` to read data from the current page without writing page-side +JavaScript. This is the preferred bridge from page content into local agent +logic. + +```js +const result = extract({ + heading: "h1", + links: [{ + selector: "a", + limit: 10, + fields: { + text: "", + href: { attr: "href" } + } + }] +}); +``` + +The schema forms are: + +| Schema value | Meaning | +|--------------|---------| +| `""` | Text of the first matching element, or `null` | +| `""` | Text of the current matched element inside a `fields` block | +| `[""]` | Text of all matching elements | +| `{ selector: "", attr: "" }` | Attribute from the first match | +| `[{ selector: "", attr: "" }]` | Attribute from all matches | +| `[{ selector: "", fields: { ... } }]` | Array of records, with fields resolved relative to each matched element | +| `limit: N` | Cap array extraction to `N` matches | +| `follow: ` | Fetch a detail page for each matched row and extract from that document | + +Return shape follows the top-level schema: + +- `extract({ title: "h1" })` returns `{ title: "..." }`. +- `extract({ title: "h1", links: [{ selector: "a" }] })` returns an object + with both fields. +- `extract({ links: [{ selector: "a" }] })` returns the `links` array + directly, because it is the only top-level field and its value is an array. +- `extract([{ selector: "a" }])` is shorthand for a single anonymous array + extraction and returns an array. + +`follow` is useful for list-to-detail scraping: + +```js +const stories = extract({ + stories: [{ + selector: "tr.athing", + limit: 5, + fields: { + id: { attr: "id" }, + title: ".titleline > a", + comments: [{ + follow: "/item?id={id}", + selector: "tr.athing.comtr:has(.commtext)", + limit: 3, + fields: { + author: ".hnuser", + text: ".commtext" + } + }] + } + }] +}); +``` + +When passing an object directly to `extract(...)`, the runtime serializes it as +the extractor schema. These forms are equivalent: + +```js +extract({ title: "h1" }); +extract({ schema: { title: "h1" } }); +extract('{ "title": "h1" }'); +``` + +Use local variables to keep extracted data available to later script logic: + +```js +const page = extract({ title: "title" }); +``` + +## Page JavaScript + +`eval(...)` is the explicit escape hatch into the current page's JavaScript +context. Its script string runs where `window` and `document` exist. + +```js +goto("https://example.com"); + +const title = eval("document.title"); +console.log(title); +``` + +Keep the boundary clear: + +```js +const selector = "h1"; + +// Good: local agent logic builds an extract schema. +const data = extract({ heading: selector }); + +// Bad: page eval cannot see local agent variables. +eval("document.querySelector(selector).textContent"); +``` + +Page `eval(...)` cannot call `goto`, `extract`, or other agent primitives. +Agent scripts cannot access `document` directly. If you need page DOM data, +prefer `extract(...)`; use `eval(...)` only for page behavior that extraction +cannot express. + +`waitForScript(...)` also evaluates in the page context, repeatedly, until the +expression is truthy or the timeout expires: + +```js +waitForScript("document.querySelectorAll('.row').length >= 5"); +``` + +## Interaction Primitives + +The action primitives operate on the current page. Most take one object whose +fields match the browser tool schema: + +```js +click({ selector: "a.login" }); +fill({ selector: "input[name='acct']", value: "$LP_HN_USERNAME" }); +fill({ selector: "input[name='pw']", value: "$LP_HN_PASSWORD" }); +press("Enter"); + +waitForSelector("#logout"); + +hover({ selector: "#menu" }); +selectOption({ selector: "select[name='country']", value: "FR" }); +setChecked({ selector: "input[name='terms']", checked: true }); +setChecked({ selector: "input[name='newsletter']", checked: false }); + +scroll({ y: 600 }); +scroll(); +``` + +`setChecked` defaults `checked` to `true` when the field is omitted. +`$LP_*` placeholders in string arguments are resolved inside the Lightpanda +process. This keeps credentials out of recorded scripts and LLM prompts. In +recordings, resolved `LP_*` values are scrubbed back to placeholders. + +## Recording Format + +The REPL remains slash-command based: + +```text +> /goto https://example.com +> /click selector='a.login' +> /save +``` + +`/save` writes JavaScript by default: + +```js +goto("https://example.com"); +click({ selector: "a.login" }); +``` + +Only replayable browser actions are recorded: + +- Recorded: `goto`, `click`, `fill`, `scroll`, `hover`, `press`, + `selectOption`, `setChecked`, `waitForSelector`, `waitForScript`, `eval`, + and `extract`. +- Not recorded: read-only exploration tools such as `tree`, `markdown`, + `links`, `findElement`, `consoleLogs`, `getUrl`, `getCookies`, and `getEnv`. +- Natural-language prompts and recording comments are written as `//` comments. + +## Error Handling + +Primitive failures throw JavaScript exceptions: + +```js +try { + waitForSelector({ selector: "#dashboard", timeout: 1000 }); +} catch (err) { + console.error("dashboard did not appear:", err.message); + throw err; +} +``` + +Common failures: + +| Error | Meaning | +|-------|---------| +| `ReferenceError: document is not defined` | You tried to use browser DOM APIs in the agent context. Use `extract(...)` or page `eval(...)`. | +| `ReferenceError: require is not defined` | Agent scripts are not Node.js scripts. | +| `no page loaded - run goto(url) first` | A page-dependent primitive ran before navigation. | +| `invalid arguments` | A primitive received the wrong number or shape of arguments, or a non-JSON-serializable value. | + +## Complete Example + +This script opens Hacker News, extracts five stories, visits each comments +page, and prints one JSON object. The looping and object assembly happen in +the local agent script, not in the page. + +```js +const HN = "https://news.ycombinator.com"; + +goto(HN); + +const stories = extract({ + stories: [{ + selector: "tr.athing", + limit: 5, + fields: { + id: { attr: "id" }, + title: ".titleline > a", + url: { selector: ".titleline > a", attr: "href" } + } + }] +}); + +for (const story of stories) { + story.comments = []; + if (!story.id) continue; + + goto(`${HN}/item?id=${story.id}`); + story.comments = extract({ + comments: [{ + selector: "tr.athing.comtr:has(.commtext)", + limit: 3, + fields: { + author: ".hnuser", + text: ".commtext" + } + }] + }); +} + +console.log(JSON.stringify({ stories }, null, 2)); +``` diff --git a/docs/agent-tutorial.md b/docs/agent-tutorial.md index 97b1be6f..a9f40bd8 100644 --- a/docs/agent-tutorial.md +++ b/docs/agent-tutorial.md @@ -1,12 +1,13 @@ # Agent tutorial — Hacker News, end-to-end This walks you from "I just built `./lightpanda`" to a recorded, -replayable, self-healing browser script — and then drives the same -script from an external MCP client. Every section ends with a command -you can run; nothing references later sections. +replayable JavaScript browser script, then captures the same flow from +an external MCP client. Every section ends with a command you can run; +nothing references later sections. For the flag/command/tool tables, see [agent.md](agent.md). This -document is the tutorial; that one is the reference. +document is the tutorial; that one is the reference. For the JavaScript +runtime contract, see [agent-script.md](agent-script.md). ## What you'll build @@ -14,22 +15,18 @@ One session against Hacker News: 1. Log in with your account. 2. Confirm the login by reading the username out of the header. -3. Record the whole flow to a `.lp` file. -4. Replay it offline, with no LLM. -5. Break a selector on purpose; watch `--self-heal` repair the file. -6. Drive the same script from an external agent over MCP. - -The finished artifact already exists in the repo as -[`hn_login.lp`](../hn_login.lp). Diff your recording against it at the -end as a sanity check. +3. Record the whole flow to a `.js` file. +4. Run it offline, with no LLM. +5. Add local JavaScript logic around `extract(...)` results. +6. Record the same browser actions from an external agent over MCP. ## Prerequisites - `./lightpanda` on your PATH (build with `zig build`). - A Hacker News account. -- One LLM API key for sections that need natural language and - self-healing — Anthropic, OpenAI, Gemini, or a local Ollama. Sections - 4–7 work with no key at all. +- One LLM API key for sections that use natural language — Anthropic, + OpenAI, Gemini, or a local Ollama. Recorded `.js` scripts run with no + key at all. Export your HN credentials as `LP_*` env vars. The convention is `LP__` — a short site identifier (`HN` for Hacker News, @@ -131,7 +128,7 @@ showed. `/goto` takes a single URL argument (positional, optionally quoted). The page is now loaded. -> PandaScript is just slash commands. `click '#foo'` (no leading slash) is +> The REPL scripting surface is slash commands. `click '#foo'` (no leading slash) is > forwarded to the LLM as a natural-language prompt; only `/click '#foo'` > runs as a command. TAB completion in the REPL helps you find the right > tool name. @@ -190,7 +187,8 @@ create-account form, then key on the input's `name` attribute. `/fill`, `/hover`, `/selectOption`, `/setChecked`) accept CSS selectors only. The backend node IDs `findElement` and `detectForms` return are invalidated by any DOM mutation, and they cannot be serialized into -PandaScript — a session that uses them is not replayable. Always +durable JavaScript recordings — a session that uses them is not +replayable. Always synthesize a CSS selector from the attributes (`id`, `class`, `name`, `action`, `tag_name`) and use that. @@ -287,7 +285,7 @@ schema JSON` instead of a confusing V8 stack trace. The same flow, but recorded to a file. Quit the REPL, then: ```console -./lightpanda agent -i hn.lp +./lightpanda agent -i hn_login.js ``` `-i ` opens an interactive REPL that appends state-mutating @@ -298,90 +296,112 @@ structured pull (`/goto`, multi-line `/extract`) — then `/quit`. Inspect the result: ```console -cat hn.lp +cat hn_login.js ``` You should see the seven mutating commands and nothing else — no `/tree`, no `/markdown`, no read-only lookups. The recorder filters on a per-tool flag (`ToolDef.recorded`) so read-only inspection never pollutes the script; `/extract` *is* recorded (it changes what the -script will output on replay even though it doesn't mutate the page). +script can read on replay even though it doesn't mutate the page). +The saved file is JavaScript: -Diff it against the checked-in fixture: +```js +goto("https://news.ycombinator.com/login"); +fill({ selector: "form[action=\"login\"] input[name=\"acct\"]", value: "$LP_HN_USERNAME" }); +fill({ selector: "form[action=\"login\"] input[name=\"pw\"]", value: "$LP_HN_PASSWORD" }); +click({ selector: "form[action=\"login\"] input[type=\"submit\"][value=\"login\"]" }); +waitForSelector("#logout"); +goto("https://news.ycombinator.com"); +extract({ topStories: [{ selector: ".athing", fields: { rank: ".rank", title: ".titleline > a", url: { selector: ".titleline > a", attr: "href" } } }] }); +``` + +Natural-language REPL turns are not saved as executable JavaScript. +When a natural-language turn produces recorded browser actions, the +prompt is kept as a `//` comment above those actions. + +## 5. Running deterministically ```console -diff hn.lp hn_login.lp +./lightpanda agent hn_login.js ``` -Modulo trailing newlines, they should match. That fixture is what the -rest of this tutorial uses. +No `--provider`, no LLM, no token spend. The recorded script runs top +to bottom against a fresh browser. This is the form you want for +regression tests and CI. -## 5. Replaying deterministically +A JavaScript script does not print its final expression automatically. +The recorded `extract(...)` call returns a local JavaScript value, so +edit the final line when you want replay output: + +```js +const topStories = extract({ + topStories: [{ + selector: ".athing", + fields: { + rank: ".rank", + title: ".titleline > a", + url: { selector: ".titleline > a", attr: "href" } + } + }] +}); + +console.log(JSON.stringify({ topStories }, null, 2)); +``` + +Run it again and stdout is clean JSON: ```console -./lightpanda agent hn_login.lp +./lightpanda agent hn_login.js > stories.json ``` -No `--provider`, no LLM, no token spend. The recorded script runs -top to bottom against a fresh browser. This is the form you want for -regression tests and CI: it's pure replay. +`/login` and `/acceptCookies` are REPL-only LLM triggers. A pure +recording from `-i` never contains them; the recorder captures the +resulting browser tool calls instead. Lines that are neither slash +commands nor comments are also REPL-only conveniences, not script +syntax. -`/login` and `/acceptCookies` are the only script entries that require -an LLM. A pure recording from `-i` never contains them (the recorder -captures the LLM's resulting tool calls, not the trigger), so it always -replays without `--provider`. Natural-language lines aren't valid -PandaScript syntax in the first place — they're a REPL-only convenience. +## 6. Local JavaScript logic -## 6. Selector drift and `--self-heal` +Agent scripts run in a separate JavaScript context from the web page. +There is no `window`, `document`, DOM API, `require`, `process`, or +Node standard library in that context. Browser interaction happens +through the installed primitives, and `extract(...)` is the usual way +to move page data into local script logic. -Real pages change. Simulate selector drift by editing your copy: +For example, turn the extraction result into a smaller report without +running any page-side JavaScript: -```console -cp hn_login.lp hn_broken.lp -sed -i 's/input\[name="acct"\]/input[name="user"]/' hn_broken.lp +```js +goto("https://news.ycombinator.com"); + +const topStories = extract({ + topStories: [{ + selector: ".athing", + limit: 5, + fields: { + rank: ".rank", + title: ".titleline > a", + url: { selector: ".titleline > a", attr: "href" } + } + }] +}); + +const report = topStories.map((story) => ({ + rank: story.rank, + title: story.title, + url: story.url +})); + +console.log(JSON.stringify({ report }, null, 2)); ``` -`input[name="user"]` doesn't exist on HN's login form, so a plain -replay fails: +Use the `eval(...)` primitive only when you intentionally want to run a +string in the current page's JavaScript context. Page eval cannot see +agent variables or call `goto`, `extract`, and the other agent +primitives. -```console -./lightpanda agent hn_broken.lp -``` - -`/fill`, `/setChecked`, and `/selectOption` go a step further than just -"did the selector resolve" — a post-exec verifier checks that the DOM -actually reflects the intent (the input ended up with the value you -typed, the checkbox flipped, the option got selected). That's what -catches silent drift before it propagates. - -Now re-run with self-heal: - -```console -./lightpanda agent --self-heal --provider anthropic hn_broken.lp -``` - -(Substitute your provider.) On failure, the agent runs a short, -budget-capped LLM turn against the *current* page state, gets a -replacement command, runs it, and atomically rewrites `hn_broken.lp` -in place. A `hn_broken.lp.bak` is written before any mutation, and -the rewritten line is prefixed with a header: - -```pandascript -# [Auto-healed] Original: /fill selector='form[action="login"] input[name="user"]' value='$LP_USERNAME' -/fill selector='form[action="login"] input[name="acct"]' value='$LP_USERNAME' -``` - -Self-heal is intentionally narrow: one replacement per failure, no -navigation, capped budget. It's there to recover from selector -drift, not to redesign the script. - -Re-run without `--self-heal` to prove replay is back to deterministic: - -```console -./lightpanda agent hn_broken.lp -``` - -## 7. Same script, external agent (MCP) +## 7. Same flow, external agent (MCP) Everything above used Lightpanda's built-in agent. If you're driving Lightpanda from a different agent (Claude Code, a custom MCP client, @@ -406,31 +426,36 @@ Register the server with your MCP client: From the external agent, call: -1. `recordStart { "path": "hn.lp" }` — begins appending state-mutating - tool calls to `hn.lp`. The path must be relative and free of `..`. +1. `recordStart { "path": "hn_login.js" }` — begins appending + state-mutating tool calls as JavaScript to `hn_login.js`. The path + must be relative and free of `..`. 2. The same browser tools you'd call anyway: `goto`, `fill`, `click`, - `waitForSelector`. Each one that succeeds is appended verbatim; + `waitForSelector`. Each one that succeeds is appended as a + JavaScript primitive call; query-only tools (`tree`, `markdown`, `findElement`, `consoleLogs`) are never recorded. -3. `recordComment { "text": "logged in" }` — drop a breadcrumb above +3. `recordComment { "text": "logged in" }` — drop a `//` breadcrumb above the next recorded line. Useful for marking the boundary between LLM-driven phases. 4. `recordStop {}` — closes the recording and returns `{path, line_count}`. -The output file is byte-equivalent to what `-i hn.lp` produced in -section 4. It replays via the agent CLI without modification: +The output file uses the same JavaScript format as `-i hn_login.js` +from section 4. It runs via the agent CLI without modification: ```console -./lightpanda agent hn.lp +./lightpanda agent hn_login.js ``` -### Replay with self-heal over MCP +### About `scriptStep` and `scriptHeal` -MCP doesn't carry a `--self-heal` flag — self-heal is a two-tool -roundtrip the calling agent orchestrates: +`recordStart` now records JavaScript agent scripts. The MCP +`scriptStep` and `scriptHeal` tools are still available for external +agents that want to run and repair one slash-command line at a time, +but that is a separate PandaScript line-healing workflow: -1. Read the script. For each non-blank, non-comment line, call +1. Read the PandaScript file you are healing. For each non-blank, + non-comment line, call `scriptStep { "line": "" }`. Comments and blanks are no-ops on the Lightpanda side. 2. On `isError: true`, the structured error message tells you what @@ -442,7 +467,7 @@ roundtrip the calling agent orchestrates: Each `original_line` must match verbatim. Lightpanda writes `.bak` first, then atomically rewrites the file with the `# [Auto-healed] Original: …` header prepended to the first - replacement — same format as section 6. + replacement. 4. Continue from the next line. `scriptStep` deliberately does *not* auto-record: the script is @@ -453,9 +478,10 @@ the caller's responsibility. ## Where to go next -- [agent.md](agent.md) — full reference: every flag, every PandaScript +- [agent.md](agent.md) — full reference: every flag, every slash command, every browser tool, plus the security model and auto-detection rules. -- [`hn_login.lp`](../hn_login.lp) — the fixture this tutorial builds. +- [agent-script.md](agent-script.md) — JavaScript runtime, primitives, + return values, and complete script examples. - `lightpanda mcp --help` and `lightpanda agent --help` — current flag listings straight from the binary. diff --git a/docs/agent.md b/docs/agent.md index b5945371..1363133b 100644 --- a/docs/agent.md +++ b/docs/agent.md @@ -2,15 +2,19 @@ > Looking for a step-by-step walkthrough instead of a reference? > See [agent-tutorial.md](agent-tutorial.md) — it builds one end-to-end -> Hacker News scenario covering the REPL, recording, replay, -> `--self-heal`, and the MCP roundtrip. +> Hacker News scenario covering the REPL, recording, replay, and the MCP +> roundtrip. +> +> Looking for the JavaScript script format? +> See [agent-script.md](agent-script.md) for the runtime contract and +> primitive API. `lightpanda agent` runs a browsing agent backed by Lightpanda's headless engine. It can act as: - an **LLM agent** that drives the browser with tool calls (`--provider`), -- a **scripted runner** that replays a `.lp` script deterministically, -- a **basic REPL** for hand-driven PandaScript with no LLM at all, +- a **scripted runner** that runs a recorded `.js` script deterministically, +- a **basic REPL** for hand-driven slash commands with no LLM at all, - a **one-shot task runner** that prints a single answer to stdout (`--task`). All four modes share the same browser tools (`goto`, `click`, `fill`, `tree`, @@ -28,14 +32,14 @@ etc.) without giving Lightpanda its own API key. # Force a specific provider ./lightpanda agent --provider anthropic -# Basic REPL (no LLM, PandaScript only) +# Basic REPL (no LLM, slash commands only) ./lightpanda agent --no-llm -# Replay a recorded script -./lightpanda agent session.lp +# Run a recorded script +./lightpanda agent session.js # Replay then continue interactively, appending new commands to the file -./lightpanda agent -i session.lp +./lightpanda agent -i session.js # One-shot: ask a question, capture the answer on stdout ./lightpanda agent --task "what is on the front page of hn?" @@ -66,8 +70,8 @@ one-line notice (on stderr) of what it chose: 2. **Auto-detected** → otherwise the first key found in priority order (`ANTHROPIC_API_KEY` → `GOOGLE_API_KEY`/`GEMINI_API_KEY` → `OPENAI_API_KEY`). Switch any time with `/provider` in the REPL, or override with `--provider`. -3. **No keys set** → falls back to the basic REPL (PandaScript only). Natural - language, `/login`, `/acceptCookies`, and `--self-heal` will reject. +3. **No keys set** → falls back to the basic REPL (slash commands only). + Natural language, `/login`, and `/acceptCookies` will reject. Ollama is never auto-detected (no env var to look at) — pass `--provider ollama`, or select it once with `/provider ollama` and it'll be remembered. @@ -77,11 +81,12 @@ API key is present or `--provider` is set. Use it to test PandaScript without burning tokens, or to disable the LLM in a saved command without editing the existing flags. `--no-llm` wins over `--provider`. -## PandaScript +## REPL Slash Commands -PandaScript is a tiny, line-oriented DSL for browser actions. Each line is a -slash command (`/ [args]`), a `#` comment, or blank. There is no other -syntax: anything that doesn't match those three forms is a parse error. +The REPL uses a tiny slash-command language for browser actions. Each command is +`/ [args]`, a `#` comment, or blank. There is no other syntax in basic +REPL mode or MCP `scriptStep`: anything that doesn't match those three forms is +a parse error. Slash commands accept any of: @@ -94,11 +99,12 @@ Slash commands accept any of: Tools whose selector is optional (e.g. `/click`, `/hover`, `/findElement`) have zero required fields, so they don't take a positional and must be -written as `key=value`: `/click selector='Login'`, not `/click 'Login'`. +written as `key=value`: `/click selector='a.login'`, not `/click 'a.login'`. Quoting is content-aware: `'…'`, `"…"`, and triple-quoted `'''…'''` / `"""…"""` for values that mix both quote styles or span multiple lines. -Recorded scripts round-trip through the parser without escapes. +Recorded JavaScript scripts use the equivalent function-call form instead of +slash lines. Two slash commands have no underlying tool — they trigger an LLM turn that the agent translates into actual tool calls: @@ -112,8 +118,8 @@ Both require an LLM. `--no-llm` rejects them. In the REPL (and only the REPL), a line that isn't a slash command and doesn't start with `#` is sent to the LLM as a natural-language prompt. In -`.lp` scripts and through MCP `scriptStep`, the same input is a parse -error. To leave the REPL, use the `/quit` meta command. +MCP `scriptStep`, the same input is a parse error. To leave the REPL, use the +`/quit` meta command. ### Example script @@ -249,32 +255,42 @@ is now origin-scoped and persists across navigations within a session). ### Recording -Interactive sessions can write back to a `.lp` file: +Interactive sessions can write back to a `.js` file: ```console -./lightpanda agent -i session.lp +./lightpanda agent -i session.js ``` State-mutating commands (`/goto`, `/click`, `/fill`, `/scroll`, `/hover`, `/selectOption`, `/setChecked`, `/waitForSelector`, `/press`, `/eval`, `/extract`) are appended; read-only commands (`/tree`, `/markdown`, `/links`, `/findElement`, …) and the natural-language turns that produced -them are not. Natural-language turns are recorded as `# ` comments -above the resulting slash commands so the script stays readable. +them are not. Natural-language turns are recorded as `// ` comments +above the resulting JavaScript calls so the script stays readable. -### Replay and self-healing +### JavaScript Script Running -`./lightpanda agent script.lp` replays without making any LLM call. +`./lightpanda agent script.js` runs without making any LLM call. Agent scripts +are plain synchronous JavaScript plus the installed Lightpanda primitives: -With `--self-heal --provider

`, a failed command (typically a stale -selector after the page changed) triggers a short LLM turn that inspects the -current page and emits a replacement command. The healed command runs, and -the original script line is rewritten in place so the next replay succeeds -deterministically. +```js +goto("https://example.com"); +click({ selector: "a.login" }); +eval("document.title"); +``` -Self-heal is constrained: at most one replacement per failure, capped LLM -budget, no navigation away from the current page. It is meant to recover -from selector drift, not to redesign the script. +The script runs in an agent-only V8 context. It has no `window`, `document`, or +DOM APIs. Browser interaction happens only through the installed primitives +(`goto`, `click`, `fill`, `eval`, `extract`, and the other recorded browser +actions). It is not Node.js either: there is no `require`, `process`, `fs`, npm +package loading, or Node standard library. The `eval(...)` primitive executes +its string in the current page context; page scripts cannot see agent variables +or agent primitives. + +Tool errors throw JavaScript exceptions and stop execution. `--self-heal` is not +available for JavaScript agent scripts; MCP `scriptStep`/`scriptHeal` remains +available for external agents that still want a PandaScript line-healing loop. +See [agent-script.md](agent-script.md) for the full script format reference. ## REPL features @@ -344,23 +360,22 @@ For sub-task delegation in the other direction — calling Lightpanda's own LLM-driven agent in a one-shot fashion — use `--task` on stdin instead. -### Recording PandaScript over MCP +### Recording JavaScript over MCP `lightpanda mcp` exposes three recording tools so an external agent can -capture a session as a `.lp` script for later deterministic replay: +capture a session as a `.js` script for later deterministic replay: | Tool | Args | Effect | |-----------------|-----------------------|------------------------------------------------------------------------------------------------| | `recordStart` | `{ path: string }` | Begin appending state-mutating tool calls to `path` (relative, no `..`). Errors if already on. | | `recordStop` | `{}` | Close the recording and return `{path, line_count}`. Errors if no recording is active. | -| `recordComment` | `{ text: string }` | Write `# ` to the active recording — useful as a breadcrumb above LLM-driven steps. | +| `recordComment` | `{ text: string }` | Write `// ` to the active recording — useful as a breadcrumb above LLM-driven steps. | While recording is active, every `goto` / `click` / `fill` / `scroll` / `hover` / `selectOption` / `setChecked` / `waitForSelector` / `eval` -that succeeds is appended verbatim. Query-only tools (`tree`, +that succeeds is appended as JavaScript. Query-only tools (`tree`, `markdown`, `findElement`, `consoleLogs`, …) are not recorded. The -resulting file replays without an LLM via `./lightpanda agent -session.lp`. +resulting file runs without an LLM via `./lightpanda agent session.js`. ### Replay + self-heal over MCP diff --git a/src/Config.zig b/src/Config.zig index c39b594c..da493ea5 100644 --- a/src/Config.zig +++ b/src/Config.zig @@ -229,7 +229,6 @@ const Commands = cli.Builder(.{ .{ .name = "model", .type = ?[:0]const u8 }, .{ .name = "base_url", .type = ?[:0]const u8 }, .{ .name = "system_prompt", .type = ?[:0]const u8 }, - .{ .name = "self_heal", .type = bool }, .{ .name = "interactive", .short = 'i', .type = bool }, .{ .name = "task", .type = ?[]const u8 }, .{ .name = "attach", .short = 'a', .type = []const u8, .multiple = true }, diff --git a/src/agent/Agent.zig b/src/agent/Agent.zig index b6036585..bcdedc9a 100644 --- a/src/agent/Agent.zig +++ b/src/agent/Agent.zig @@ -25,17 +25,16 @@ const ProviderTool = zenai.provider.Tool; const log = lp.log; const Config = lp.Config; -const script = lp.script; -const Command = lp.script.Command; -const Schema = lp.script.Schema; -const Recorder = lp.script.Recorder; -const Verifier = lp.script.Verifier; +const Command = lp.Command; +const Schema = lp.Schema; +const Recorder = lp.Recorder; const Credentials = zenai.provider.Credentials; const App = @import("../App.zig"); const CDPNode = @import("../cdp/Node.zig"); const Terminal = @import("Terminal.zig"); const SlashCommand = @import("SlashCommand.zig"); +const ScriptRuntime = @import("ScriptRuntime.zig"); const settings = @import("settings.zig"); const truncateUtf8 = @import("../string.zig").truncateUtf8; @@ -57,7 +56,7 @@ pub fn isUserError(err: anyerror) bool { return false; } -const default_system_prompt = script.driver_guidance ++ +const default_system_prompt = browser_tools.driver_guidance ++ \\ \\Agent-specific behavior: \\- Call a tool for every browser action. NEVER claim you performed an @@ -75,31 +74,6 @@ const default_system_prompt = script.driver_guidance ++ \\ the Credentials section above) before reporting unavailable. ; -const self_heal_prompt_prefix = - \\A PandaScript command failed during replay. The command that failed was: - \\ -; - -const self_heal_prompt_page_state = - \\ - \\The current page URL is: - \\ -; - -const self_heal_prompt_instructions = - \\ - \\IMPORTANT: - \\- Do NOT navigate away from the current page. The page is already loaded and - \\ contains the element you need — the selector just needs to be fixed. - \\- Use the tree or interactiveElements tools WITHOUT a url parameter to inspect - \\ the current page, find the correct selector, and execute the equivalent action. - \\- If the action is blocked by a popup, cookie banner, or surprise modal, - \\ handle it first (e.g., click "Accept") before executing the fixed command. - \\- ONLY fix the failed command and handle immediate blockers. STOP immediately - \\ once the intent of the original command is achieved. - \\ The script will continue executing the remaining commands after the heal. -; - const synthesis_prompt = \\You have used your tool budget or cannot finish the exploration. \\Give your best final answer NOW based ONLY on what you actually observed @@ -127,16 +101,16 @@ browser: lp.Browser, session: *lp.Session, node_registry: CDPNode.Registry, terminal: Terminal, -verifier: Verifier, recorder: ?Recorder, save_buffer: Recorder.Memory, save_path: ?[]u8, +script_runtime_mutex: std.Thread.Mutex = .{}, +active_script_runtime: ?*ScriptRuntime = null, messages: std.ArrayList(zenai.provider.Message), message_arena: std.heap.ArenaAllocator, model: []u8, system_prompt: []const u8, script_file: ?[]const u8, -self_heal: bool, interactive: bool, one_shot_task: ?[]const u8, one_shot_attachments: ?[]const []const u8, @@ -161,12 +135,6 @@ pub fn init(allocator: std.mem.Allocator, app: *App, opts: Config.Agent) !*Agent }); return error.ConflictingFlags; } - if (opts.self_heal and opts.script_file == null) { - log.fatal(.app, "self-heal needs a script", .{ - .hint = "--self-heal rewrites a recorded .lp on drift; pass a script path", - }); - return error.ConflictingFlags; - } if (opts.no_llm and opts.provider != null) { log.warn(.app, "ignoring --provider", .{ .reason = "--no-llm takes precedence" }); } @@ -179,12 +147,12 @@ pub fn init(allocator: std.mem.Allocator, app: *App, opts: Config.Agent) !*Agent // Basic-mode REPL (no LLM) must be opted into via --no-llm. Without it, // the REPL accepts natural language and an absent API key would only - // surface at the first non-PandaScript line — too late to be useful. - // Pure replay (`agent