7.8 KiB
Agent mode
lightpanda agent runs a browsing agent backed by Lightpanda's headless engine.
It can act as:
- an LLM agent that drives the browser with tool calls (
--provider), - a scripted runner that replays a
.pandascript deterministically, - a dumb REPL for hand-driven Pandascript with no LLM at all,
- a one-shot task runner that prints a single answer to stdout (
--task).
All four modes share the same browser tools (goto, click, fill, tree,
markdown, search, ...). The same set is exposed over MCP via lightpanda mcp, so an agent script and an MCP client see the same surface.
Quick start
# Interactive REPL with an LLM
./lightpanda agent --provider anthropic
# Dumb REPL (no API key, Pandascript only)
./lightpanda agent
# Replay a recorded script
./lightpanda agent session.panda
# Replay then continue interactively, appending new commands to the file
./lightpanda agent -i session.panda
# One-shot: ask a question, capture the answer on stdout
./lightpanda agent --provider gemini --task "what is on the front page of hn?"
Providers and API keys
| Provider | Flag | API key env |
|---|---|---|
| Anthropic | --provider anthropic |
ANTHROPIC_API_KEY |
| OpenAI | --provider openai |
OPENAI_API_KEY |
| Gemini | --provider gemini |
GOOGLE_API_KEY or GEMINI_API_KEY |
| Ollama | --provider ollama |
none (local) |
Defaults: --model falls back to a sensible per-provider default; --base-url
overrides the API endpoint (Ollama defaults to http://localhost:11434/v1).
Without --provider, the REPL still works for Pandascript commands. Natural
language, LOGIN, ACCEPT_COOKIES, and --self-heal all require a provider.
Pandascript
Pandascript is a tiny, line-oriented DSL for browser actions. Each line is one
command. Comments start with #. Strings are quoted with ', ", or '''…'''
for values that mix both quote styles. Quoting rules are content-aware so that
recorded scripts round-trip through the parser.
| Command | Form | Notes |
|---|---|---|
GOTO |
GOTO <url> |
Navigate. URL is unquoted. |
CLICK |
CLICK '<selector>' |
CSS selector. |
TYPE |
TYPE '<selector>' '<value>' |
Fills an input. $LP_* env refs auto-resolve. |
WAIT |
WAIT '<selector>' |
Wait for selector. |
SCROLL |
SCROLL [x] [y] |
Default (0, 0). |
HOVER |
HOVER '<selector>' |
|
SELECT |
SELECT '<selector>' '<value>' |
<select> option by value. |
CHECK |
CHECK '<selector>' [true|false] |
Check / uncheck. Default true. |
EXTRACT |
EXTRACT '<selector>' |
Returns text content. |
EVAL |
EVAL '<js>' or EVAL '''…''' |
Triple-quote for multi-line JS. |
TREE |
TREE |
Print the semantic tree (not recorded). |
MARKDOWN |
MARKDOWN |
Print page as markdown (not recorded). |
LOGIN |
LOGIN |
LLM-driven: fill $LP_USERNAME / $LP_PASSWORD. |
ACCEPT_COOKIES |
ACCEPT_COOKIES |
LLM-driven: dismiss the consent banner. |
In the REPL, anything that does not parse as a Pandascript command is sent to
the LLM as natural language. To leave the REPL, use the /quit slash command.
Example script
# Log into the demo and grab the dashboard title.
GOTO https://demo-browser.lightpanda.io/
ACCEPT_COOKIES
TYPE '#email' '$LP_USERNAME'
TYPE '#password' '$LP_PASSWORD'
CLICK 'button[type="submit"]'
WAIT '.dashboard'
EXTRACT '.dashboard h1'
Recording
Interactive sessions can write back to a .panda file:
./lightpanda agent -i session.panda
State-mutating commands (GOTO, CLICK, TYPE, ...) are appended; read-only
commands (TREE, MARKDOWN) and the natural-language turns that produced
them are not. Natural-language turns are recorded as # <prompt> comments
above the resulting tool calls so the script stays readable.
Replay and self-healing
./lightpanda agent script.panda replays without making any LLM call.
With --self-heal --provider <p>, a failed command (typically a stale
selector after the page changed) triggers a short LLM turn that inspects the
current page and emits a replacement command. The healed command runs, and
the original script line is rewritten in place so the next replay succeeds
deterministically.
Self-heal is constrained: at most one replacement per failure, capped LLM budget, no navigation away from the current page. It is meant to recover from selector drift, not to redesign the script.
REPL features
- Tab completion: cycles through Pandascript keywords and
/<tool>slash commands. The dim grey suffix shown after the cursor is the first match; pressing Tab accepts it. - Persistent history: stored in
.lp-historyin the working directory. - Slash commands:
/<tool> [args]calls a browser tool directly without going through the LLM./helplists tools,/help <tool>prints the JSON schema. Args accept either a single positional value (for tools with one required field),key=valuepairs, or a raw{json}blob.> /goto https://example.com > /findElement role=button name=Submit > /eval {"script": "document.title"} - Stdout vs stderr: the final assistant answer and data-producing commands
(
EXTRACT,EVAL,MARKDOWN,TREE) write to stdout. Tool calls, progress, and errors go to stderr, solightpanda agent --task ... > out.txtcaptures a clean answer.
One-shot mode (--task)
./lightpanda agent --provider gemini \
--task "what is the top story on news.ycombinator.com?"
--task runs a single user turn, prints the final answer on stdout, and
exits. Combine with --task-attachment <path> (repeatable) to feed local
files to providers that accept attachments.
Browser tools
The agent and MCP server share the tool set defined in src/browser/tools.zig.
Highlights:
goto,search(Google with DuckDuckGo fallback on captcha)tree,markdown,links,interactiveElements,structuredData,detectForms,nodeDetails,findElementclick,fill,hover,press,scroll,selectOption,setChecked,waitForSelectoreval,consoleLogs,getUrl,getCookies,getEnv
Selectors prefer CSS over backendNodeId for the click-family tools, since
node IDs are invalidated by any DOM mutation. The system prompt enforces this
for the LLM.
Security notes
- The agent treats page content as untrusted data, not instructions. URLs surfaced by a page are not followed unless they match the user's task.
$LP_*environment variable references inTYPE/fillvalues are resolved at execution time, so credentials never enter the LLM context.--obey-robots,--http-proxy,--user-agent, and the rest of the browser-level CLI flags apply toagentthe same way they apply toserve,fetch, andmcp.