From cd830a7152b93d3db4ae8499af238cb39b23ba73 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adri=C3=A0=20Arrufat?=
Date: Tue, 12 May 2026 10:20:50 +0200
Subject: [PATCH] agent: add end-to-end tutorial

---
 docs/agent-tutorial.md | 413 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 413 insertions(+)
 create mode 100644 docs/agent-tutorial.md

diff --git a/docs/agent-tutorial.md b/docs/agent-tutorial.md
new file mode 100644
index 00000000..735c3b6f
--- /dev/null
+++ b/docs/agent-tutorial.md
@@ -0,0 +1,413 @@
+# Agent tutorial — Hacker News, end-to-end
+
+This walks you from "I just built `./lightpanda`" to a recorded,
+replayable, self-healing browser script — and then drives the same
+script from an external MCP client. Every section ends with a command
+you can run; nothing references later sections.
+
+For the flag/command/tool tables, see [agent.md](agent.md). This
+document is the tutorial; that one is the reference.
+
+## What you'll build
+
+One session against Hacker News:
+
+1. Log in with your account.
+2. Confirm the login by reading the username out of the header.
+3. Record the whole flow to a `.lp` file.
+4. Replay it offline, with no LLM.
+5. Break a selector on purpose; watch `--self-heal` repair the file.
+6. Drive the same script from an external agent over MCP.
+
+The finished artifact already exists in the repo as
+[`hn_login.lp`](../hn_login.lp). Diff your recording against it at the
+end as a sanity check.
+
+## Prerequisites
+
+- `./lightpanda` on your PATH (build with `zig build`).
+- A Hacker News account.
+- One LLM API key for sections that need natural language and
+ self-healing — Anthropic, OpenAI, Gemini, or a local Ollama. Sections
+ 4–7 work with no key at all.
+
+Export your HN credentials as `LP_*` env vars. The convention is
+`LP__` — a short site identifier (`HN` for Hacker News,
+`GH` for GitHub, …) lets you keep credentials for multiple sites in
+your environment without collisions. The unprefixed `LP_USERNAME` /
+`LP_PASSWORD` form is the generic fallback when you only have one
+site.
+
+In **bash** or **zsh**:
+
+```console
+export LP_HN_USERNAME="your-hn-handle"
+export LP_HN_PASSWORD="your-hn-password"
+```
+
+In **fish**:
+
+```fish
+set -gx LP_HN_USERNAME "your-hn-handle"
+set -gx LP_HN_PASSWORD "your-hn-password"
+```
+
+The `LP_` prefix matters. The agent resolves `$LP_*` references
+*inside* the Lightpanda subprocess, so the literal secret never enters
+the LLM context; and the `getEnv` tool refuses to read anything that
+doesn't start with `LP_`, so the model can't probe your other env
+vars.
+
+Verify they're set before continuing — substitution fails silently if
+a variable is missing (the literal `$LP_HN_USERNAME` ends up typed
+into the form), and the `TYPE` confirmation message intentionally
+echoes the placeholder name rather than the resolved value, so the
+response text won't tell you. Confirm directly:
+
+```console
+./lightpanda agent --no-llm
+> /getEnv LP_HN_USERNAME
+```
+
+`/getEnv` returns the literal value if set, or "not set" if missing.
+Only `$LP_*` references in fill values are substituted; other `$`
+characters in your password (`my$ecret`, `$5.99`) are passed through
+verbatim.
+
+## 1. First contact: the REPL
+
+```console
+./lightpanda agent
+```
+
+On startup the agent prints a one-line notice telling you which mode it
+landed in — which provider (and which env var won), or "basic REPL
+(no LLM)" if no key is set. The REPL writes its history to
+`.lp-history` in the working directory, so up-arrow works across runs.
+
+Try the meta commands:
+
+```
+> /help
+> /help goto
+> /quit
+```
+
+`/help` lists every browser tool. `/help ` prints its JSON
+schema. `/quit` exits cleanly. If you have no API key yet and want to
+poke around without an LLM, `./lightpanda agent --no-llm` forces the
+basic REPL.
+
+## 2. The shortest possible win: `--task`
+
+Before doing anything complicated, prove the LLM + browser stack
+works end-to-end:
+
+```console
+./lightpanda agent --task "what is the top story on news.ycombinator.com?"
+```
+
+`--task` runs a single user turn, prints the final answer on stdout,
+and exits. Tool calls, progress, and errors all go to stderr, so
+redirecting stdout gives you a clean answer:
+
+```console
+./lightpanda agent --task "top story on news.ycombinator.com?" > out.txt
+```
+
+If you need to feed the model a local file, repeat
+`--task-attachment ` for each one.
+
+## 3. Driving the browser by hand
+
+Now back to the REPL. We'll write the HN login flow one command at a
+time so you can see how each step depends on what the previous one
+showed.
+
+```
+> GOTO https://news.ycombinator.com/login
+```
+
+`GOTO` takes an unquoted URL. The page is now loaded.
+
+> Commands must be uppercase. `click '#foo'` is forwarded to the LLM as
+> natural language; only `CLICK '#foo'` runs as a command. TAB
+> completion in the REPL fills in the caps for you — typing `cli`
+> rewrites the line to `CLICK`.
+
+Inspect it before clicking anything:
+
+```
+> TREE
+```
+
+`TREE` prints the semantic tree to stdout. Two forms are visible —
+the login form and the create-account form below it — and each one
+contains two unlabeled textboxes:
+
+```
+8 form
+ 13 'username:'
+ 15 [i] textbox
+ 18 'password:'
+ 20 [i]
+ 22 [i] button 'login' value='login'
+30 form
+ 35 'username:'
+ 37 [i] textbox
+ …
+```
+
+Notice the textboxes have no accessible name — "username:" is a
+sibling text node, not a `