Commit Graph

6522 Commits

Author SHA1 Message Date
Adrià Arrufat
5316dffeb7 Merge branch 'main' into agent 2026-05-13 14:58:47 +02:00
Adrià Arrufat
71824087e9 refactor: improve tool error logging and outcome handling
Adds debug logging for browser tool action and parsing errors.
Introduces a helper to deduplicate evaluation result handling in MCP.
2026-05-13 14:41:45 +02:00
Pierre Tachoire
12971a2420 Merge pull request #2445 from lightpanda-io/reset-bc-arena
cdp: reset browser context arena when bc is removed
2026-05-13 14:35:38 +02:00
Adrià Arrufat
38c74ab3b3 browser: extract formatJsError helper in tools.zig 2026-05-13 14:35:31 +02:00
Adrià Arrufat
48b8c4fb8c script: simplify formatHealReplacement 2026-05-13 14:34:12 +02:00
Adrià Arrufat
c009b291cb agent: correctly report JS errors in eval tool
Routes the eval tool through callEval instead of callValue. This
ensures JavaScript execution errors are flagged with is_error=true,
allowing the model to self-correct.
2026-05-13 14:33:11 +02:00
Adrià Arrufat
6538a34f56 browser: separate operational and JS errors in eval
Changes `EvalResult` to a union and uses Zig errors for operational
failures, distinguishing them from JavaScript execution errors.
Also removes redundant `stringifyJson` in favor of standard library.
2026-05-13 14:27:28 +02:00
Adrià Arrufat
635546bf85 Recorder: simplify initialization and file opening 2026-05-13 14:03:18 +02:00
Adrià Arrufat
fb06e282c2 agent: simplify promptNumberedChoice API 2026-05-13 13:29:26 +02:00
Adrià Arrufat
b2454a9190 chore: add license headers to source files 2026-05-13 13:20:36 +02:00
Adrià Arrufat
b4b533d2be agent: derive Action enum from tool_defs and clean up Verifier
- Derives the `Action` enum in `src/browser/tools.zig` from `tool_defs` to prevent manual maintenance and potential drift.
- Refactors `verifyElementValue` in `src/script/Verifier.zig` to use a `Check` struct for its parameters.
- Reorganizes and cleans up imports in `src/agent/Agent.zig`.
2026-05-13 13:18:30 +02:00
Adrià Arrufat
f35c4219c9 refactor: improve error handling and rename Self to Agent
- Replace `unreachable` with explicit error returns in Agent and Executor.
- Add `EvalResult.err` helper in `tools.zig` to simplify error paths.
- Improve error propagation in `SlashCommand` and `Recorder`.
- Rename `Self` to `Agent` for better clarity in function signatures.
2026-05-13 13:15:44 +02:00
Adrià Arrufat
6478853e93 agent: simplify zig syntax and arena usage 2026-05-13 13:01:30 +02:00
Adrià Arrufat
c374f8b04e browser: unify element identification with ActionTarget
Introduces an `ActionTarget` union to handle selectors and backend node
IDs consistently. This simplifies the `formatActionResult` function and
ensures the identification method is preserved within `NodeAndPage`
during element resolution.
2026-05-13 12:54:18 +02:00
Adrià Arrufat
cc59dd64b9 script: make ScriptIterator.next fallible 2026-05-13 12:40:07 +02:00
Adrià Arrufat
18890ea695 spinner: reset state on thread spawn failure 2026-05-13 12:25:32 +02:00
Adrià Arrufat
9f5814a431 agent: add errdefers for cleanup in ToolExecutor.init 2026-05-13 12:23:39 +02:00
Adrià Arrufat
457c565df3 agent: move ai_client check earlier in runHealTurn
Ensures the function fails fast if no AI client is available before
performing other operations.
2026-05-13 12:22:10 +02:00
Adrià Arrufat
96fce3c56f refactor: unify tool UI and browser action finalization
- Add `beginTool` and `endTool` to `Terminal` to encapsulate spinner logic.
- Consolidate navigation awaiting and context tagging in browser tools
  via a new `finalizeAction` helper.
2026-05-13 12:10:07 +02:00
Karl Seguin
e895ce81e3 Merge pull request #2437 from lightpanda-io/window_frameElement
Add window.frameElement
2026-05-13 18:00:08 +08:00
Karl Seguin
3e31fde66c Merge pull request #2443 from lightpanda-io/url_fixes
Fix URLSearchParams constructor
2026-05-13 17:59:50 +08:00
Adrià Arrufat
b23eb8a51a agent: handle user cancellation in interactive prompts 2026-05-13 11:51:36 +02:00
Karl Seguin
afc0942655 Merge pull request #2441 from lightpanda-io/fix-robots-crash
Fix crash on `robots.txt` being fulfilled synchronously
2026-05-13 17:39:22 +08:00
Pierre Tachoire
36b55339cd cdp: reset browser context arena when bc is removed 2026-05-13 11:26:09 +02:00
Adrià Arrufat
900b8be10d script: propagate allocation errors in command conversion 2026-05-13 11:06:10 +02:00
Adrià Arrufat
bbe3e58724 tools: extract page context formatting to helper
Moves page URL and title formatting into a separate `appendPageContext`
function and removes the page parameter from `formatActionResult`.
This simplifies tool execution logic and decouples action descriptions
from page metadata.
2026-05-13 10:55:05 +02:00
Adrià Arrufat
f624077218 refactor: propagate OOM in substituteEnvVars and inline heal header
- Update `substituteEnvVars` to return `OutOfMemory` error instead of
  silently returning the original string on failure.
- Simplify `execFill` to always use the original text for display.
- Inline the `writeHealHeader` helper in `src/script.zig`.
2026-05-13 10:31:16 +02:00
Adrià Arrufat
ee2964fb0f agent.spinner: handle thread spawn failure 2026-05-13 10:15:33 +02:00
Adrià Arrufat
7e1d0529a8 script: support triple-double quotes in command formatting 2026-05-13 10:12:43 +02:00
Adrià Arrufat
947e9be5fc script: add test for healing multi-line EVAL blocks 2026-05-13 10:03:32 +02:00
Adrià Arrufat
dbd0197576 cli: validate conflicting flags and add security warning
Adds validation logic to catch conflicting or missing CLI flags, such as
preventing `--task` with positional scripts and requiring a script for
`--self-heal`. Also adds a security warning to the help text regarding
arbitrary JavaScript execution in `.lp` files.
2026-05-13 09:58:26 +02:00
Adrià Arrufat
3293977f58 fix: resolve memory leak and deinit order issues
- Fix potential memory leak in `ScriptIterator` using `defer`.
- Correct `deinit` order in tests to prevent use-after-free.
- Use case-sensitive prefix check for `LP_` environment variables.
- Reorder fields in `CommandExecutor` for consistency.
2026-05-13 09:52:14 +02:00
Karl Seguin
6d58af350d Flag functions and accessors as DontEnum by default
Only `own_properties`, e.g. window.CSS, should be enumerable.
2026-05-13 15:49:31 +08:00
Adrià Arrufat
303c5eefce agent: document schema-based EXTRACT command 2026-05-13 09:30:32 +02:00
Adrià Arrufat
0bbb77f292 agent.tutorial: document structured EXTRACT schema
Updates the agent tutorial to explain the JSON schema grammar for the
EXTRACT command, including selectors, attributes, and nested fields.
Adds examples for multi-line input and clarifies recording behavior.
2026-05-13 09:20:27 +02:00
Adrià Arrufat
15f06474cf refactor(browser): move console log buffering to Session
Moves the console message buffer from Frame to Session and populates it
via the notification system. This centralizes log collection for the
MCP tool and simplifies the Console Web API implementation.
2026-05-13 08:57:44 +02:00
Adrià Arrufat
71e39e9df3 Merge branch 'main' into agent 2026-05-13 08:48:41 +02:00
Karl Seguin
cc4ad53661 Fix URLSearchParams constructor
First, KeyValueList.fromJsObject now only iterates own properties. Second,
URLSearchParams can now be constructed with another URLSearchParams. This is
a stopgap. The correct solution is for it to accept any iterator, but as a
quick fix for known cases (airbnb.com), this will help.
2026-05-13 14:38:43 +08:00
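
For reference, the two behaviors named in commit cc4ad53661 can be observed in the standard URLSearchParams API. This is a minimal TypeScript sketch of that standard behavior as found in Node.js and browsers, not a test of lightpanda itself; the variable names are illustrative only:

```typescript
// Copy-construction: the new object gets its own list of key/value pairs.
const original = new URLSearchParams("a=1&b=2");
const copy = new URLSearchParams(original);
copy.append("c", "3"); // does not affect `original`

// Record-construction: only the object's own enumerable keys become parameters.
const proto = { inherited: "x" };
const record: Record<string, string> = Object.create(proto);
record.own = "y";
const fromRecord = new URLSearchParams(record);

console.log(original.toString());   // a=1&b=2
console.log(copy.toString());       // a=1&b=2&c=3
console.log(fromRecord.toString()); // own=y
```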
Pierre Tachoire
854eb6a62d Merge pull request #2339 from lightpanda-io/cdp-console
cdp: implement Console
2026-05-13 08:28:01 +02:00
Muki Kiboigo
4a45b4d866 fix crash on robots.txt request fulfilled immediately 2026-05-12 21:50:05 -07:00
Karl Seguin
bd4f4c89e1 Merge pull request #2440 from staylor/scott/fix-worker-context-exit-with-proxy
Add LP.configureLoading worker + --disable-workers opt-out for Web Worker loading
2026-05-13 12:29:43 +08:00
Karl Seguin
10a5597aba Merge pull request #2435 from navidemad/fix-b12-htmldialogelement-methods
dom: implement HTMLDialogElement.{show, showModal, close}
2026-05-13 12:17:20 +08:00
muki
cc927c98ec Merge pull request #2424 from lightpanda-io/nix-wpt-run
Ability to run wpt with Nix/NixOS
2026-05-12 20:58:39 -07:00
Muki Kiboigo
06c2474376 use commit sha instead of branch name 2026-05-12 20:58:13 -07:00
Karl Seguin
393141e472 pass arena into handlers (consistent with other handlers) 2026-05-13 11:51:59 +08:00
Scott Taylor
b2998470c2 Add --disable-workers + LP.configureLoading worker opt-out
Adds two ways to opt out of dedicated Web Worker loading entirely. The
Worker constructor still returns a Worker object so calling pages don't
throw, but no script fetch is initiated and the worker scope's eval
never runs (postMessage from the page is queued indefinitely with no
handler to drain it).

* CDP method LP.configureLoading { worker: bool } -- per-session
  toggleable at runtime, alongside the existing { subFrame: bool }.
  Both fields are now optional so callers can flip one without
  resetting the other to its default. Backwards-compatible.
* CLI flag --disable-workers -- process-wide default applying to every
  session and to the fetch subcommand. Operators can flip it on without
  any driver changes. Mirrors --disable-subframes (#2401) exactly.

## Motivation

Reliably-reproducible SIGABRT in Worker.loadInitialScript whenever a
page constructs a Web Worker AND lightpanda is launched with
--http_proxy. Crash signature:

    $msg="V8 fatal callback" location=v8::Context::Exit()
    message="Cannot exit non-entered context"
    Stack:
      _browser.webapi.Worker.loadInitialScript
      _browser.webapi.Worker.httpDoneCallback
      _network.layer.InterceptionLayer.InterceptContext.doneCallback
      _browser.HttpClient.processMessages
      _browser.HttpClient.perform
      _browser.HttpClient.tick

The Zig-side Enter/Exit pair around the worker's eval doesn't match
v8's entered_contexts stack invariant under that timing -- something
upstream of the loadInitialScript Exit leaves an extra Enter on the
stack, so v8's Utils::ApiCheck (`isolate->context() == *env`) fires
and the process aborts.

Reproducible against any Shopify storefront PDP (e.g.
https://weareallbirds.myshopify.com/products/mens-wool-runners) when
served through any HTTP proxy -- the proxy just adds enough latency
to surface the race; the same code path runs without --http_proxy
but the timing window is too tight to reliably hit. The Allbirds
trigger script is the Shopify web-pixel-extension worker, but ANY
Worker the page constructs hits the same code path.

The proper fix needs the v8 entered-contexts invariant to be
restored end-to-end through the worker eval. That's a deeper dig
into how Worker.loadInitialScript / WorkerGlobalScope.importScript /
ls.local.runMacrotasks compose with v8's microtask queues across
multiple contexts; I tried three intermediate fixes (deferring
loadInitialScript via the frame scheduler when other scripts are
mid-eval, replacing the post-eval cross-context runMacrotasks with
worker-only PerformCheckpoint, and removing runMacrotasks entirely)
and none stopped the crash. The abort fires from inside the
synchronous tick path, before the post-eval microtask handling
runs, which means the leak happens during Script::Run itself and
needs more targeted investigation.

This PR is the workaround so users hitting the SIGABRT on
storefront / analytics-heavy pages have a clean opt-in escape today.
For our use case (product catalog extraction) Workers carry no
extraction signal -- web-pixel sandboxes, analytics SDKs, marketing
tag pixels, etc. -- so disabling them removes a fragile code path
without any downside.

## Implementation

`Session.worker_loading_enabled: bool = true` -- default matches
existing behavior.

`Worker.init` short-circuits AFTER constructing the Worker /
WorkerGlobalScope / arena bookkeeping (so the JS `new Worker(url)`
expression doesn't throw):

    if (!session.worker_loading_enabled) {
        log.debug(.browser, "worker disabled", .{ .url = resolved_url });
        return self;
    }

Two ways to flip the flag, mirroring the --disable-subframes pattern:

1. LP.configureLoading { worker: bool } -- both subFrame and worker
   are now optional fields in the params struct, so existing callers
   passing only { subFrame } continue to work unchanged.
2. --disable-workers CLI flag -- added to CommonOptions (so it
   applies to serve, fetch, mcp). New Config.disableWorkers() getter;
   Session.init reads it as the initial value.
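
The partial-update semantics of the two optional fields can be sketched as follows. This is a minimal TypeScript model from the driver's point of view, not the Zig code in src/cdp/domains/lp.zig; `LoadingState`, `ConfigureLoadingParams`, and `applyConfigureLoading` are hypothetical names, and the `subFrame: true` default is an assumption:

```typescript
// Hypothetical model of the per-session loading flags.
interface LoadingState {
  subFrame: boolean;
  worker: boolean;
}

// Params shaped as the commit describes: both fields optional.
interface ConfigureLoadingParams {
  subFrame?: boolean;
  worker?: boolean;
}

// Apply only the fields the caller supplied; omitted fields keep their current value.
function applyConfigureLoading(
  state: LoadingState,
  params: ConfigureLoadingParams,
): LoadingState {
  return {
    subFrame: params.subFrame ?? state.subFrame,
    worker: params.worker ?? state.worker,
  };
}

// Assumed defaults: both kinds of loading enabled.
const defaults: LoadingState = { subFrame: true, worker: true };
const afterWorkerOff = applyConfigureLoading(defaults, { worker: false });
const afterSubFrameOff = applyConfigureLoading(afterWorkerOff, { subFrame: false });

console.log(afterWorkerOff);   // { subFrame: true, worker: false }
console.log(afterSubFrameOff); // { subFrame: false, worker: false }
```

Using `??` rather than `||` matters here: an explicit `false` must override the current value, and only a missing field should fall back to it.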

Total diff: +88 / -3 across 4 files (src/Config.zig,
src/browser/Session.zig, src/browser/webapi/Worker.zig,
src/cdp/domains/lp.zig).

## Verification

Reproducer pattern (puppeteer-core 24.42.0 + tiny CONNECT-tunnel
proxy on 127.0.0.1:9999, scripts in cdp-repros/):

  serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999
  serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999 --disable-workers

Driving https://weareallbirds.myshopify.com/products/mens-wool-runners:

  baseline (no --disable-workers): 5/5 SIGABRT in
    Worker.loadInitialScript with the v8 fatal callback above.

  with --disable-workers:           10/10 successful, returns full
    HTML (~1MB), no crash.

Test suite:

  make test  -> 637 of 637 tests passed (was 636/636 + new
    cdp.lp: configureLoading toggles subFrame and worker
    independently regression test).

  zig fmt --check ./*.zig ./**/*.zig  -> clean.

## Notes

* The CDP method is the same domain (LP.configureLoading) and same
  shape as --disable-subframes' driver-side opt-in, so existing
  Playwright / puppeteer integrations that already toggle
  subframes don't need a separate code path -- one CDP call can
  flip both.

* worker_loading_enabled = false does NOT remove Worker from the
  global namespace (so feature-detection like
  `if (typeof Worker !== 'undefined')` still reports true). It just
  makes constructed workers no-op. Pages that postMessage to a worker
  and wait for a response will hang on that promise forever (or
  until the page is torn down). For our extraction use case that's
  fine -- we control the worklist timeout anyway -- but it's worth
  noting if upstream wants to surface the disabled state more
  strongly (e.g. throw from postMessage, or remove the global
  entirely behind an even-stricter flag).
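
Because a disabled worker never replies, a driver may want its own deadline around anything that waits on one. A minimal TypeScript sketch, assuming a hypothetical `withTimeout` helper that is not part of lightpanda or puppeteer:

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, deadline]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Usage: a promise that never settles stands in for a reply from a disabled worker.
const neverReplies = new Promise<string>(() => {});
withTimeout(neverReplies, 50).catch((err: Error) => {
  console.log(err.message); // timed out after 50 ms
});
```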

* Once the underlying v8 entered-contexts invariant is restored in
  Worker.loadInitialScript, this flag becomes a perf / sandboxing
  tool rather than a correctness workaround. Worth keeping anyway:
  blocking analytics / pixel workers is a reasonable thing to want.

## Related

* #2400 -- the iframe analog to this issue (subframe nav invalidates
  executionContextId); same workaround pattern.
* #2401 -- introduced --disable-subframes / LP.configureLoading
  { subFrame } that this PR mirrors exactly for workers.
2026-05-12 23:46:45 -04:00
Karl Seguin
6eb90b2920 Add window.frameElement 2026-05-13 10:36:39 +08:00
Karl Seguin
2e159aaf12 Merge pull request #2436 from lightpanda-io/reset_frees_tasks
on scheduler.reset, finalize any remaining tasks
2026-05-13 09:06:02 +08:00
Karl Seguin
656e29476e on scheduler.reset, finalize any remaining tasks 2026-05-13 08:46:13 +08:00
Karl Seguin
d833eaa2e3 Merge pull request #2420 from lightpanda-io/http_client
HttpClient Improvements
2026-05-13 08:15:16 +08:00