Commit Graph

6241 Commits

Author SHA1 Message Date
Muki Kiboigo
ac863c7e2b add Network.requestServedFromCache 2026-05-13 21:47:47 -07:00
Karl Seguin
373916873f Merge pull request #2442 from lightpanda-io/worker_message_buffer
CI fixes, callback timing correctness
2026-05-14 08:56:36 +08:00
Karl Seguin
96ac9a49ea Update src/browser/webapi/Worker.zig
Co-authored-by: Navid EMAD <navid.emad@yespark.fr>
2026-05-14 08:33:32 +08:00
Karl Seguin
1580ab197f Merge pull request #2452 from lightpanda-io/event_worker
make Event worker-safe
2026-05-14 07:39:37 +08:00
Karl Seguin
bcafa175cb make Event worker-safe 2026-05-14 07:11:33 +08:00
Karl Seguin
5595f7d298 Merge pull request #2448 from lightpanda-io/script_load_error_handling
Don't process scripts that failed to load
2026-05-13 23:19:40 +08:00
Pierre Tachoire
198c4e5a0f Merge pull request #2444 from lightpanda-io/useless-code
cdp: remove dead code
Tag: 0.3.0
2026-05-13 15:36:16 +02:00
Pierre Tachoire
ffc2baa733 Merge pull request #2431 from lightpanda-io/cdp-double-frame-navigated-event
fix(cdp): remove duplicate Page.frameNavigated and fix context regist…
2026-05-13 15:17:27 +02:00
Karl Seguin
7750bc94f6 Apply suggestions from code review
Remove no-longer-needed setTimeouts in tests now that messages are queued.

Runner also checks ready_queue when determining doneness.

Co-authored-by: Navid EMAD <design.navid@gmail.com>
2026-05-13 20:57:59 +08:00
Karl Seguin
2326071036 Don't [try] to process scripts that failed to load
At some point recently, we started to process scripts that fail to load (e.g.
404). This stops such scripts from [trying] to be evaluated, and executes the
onerror handler in all script loading paths.
2026-05-13 20:48:08 +08:00
Pierre Tachoire
12971a2420 Merge pull request #2445 from lightpanda-io/reset-bc-arena
cdp: reset browser context arena when bc is removed
2026-05-13 14:35:38 +02:00
Pierre Tachoire
5d73d82bf6 cdp: call context created w/ correct is_default_context value
Co-authored-by: Navid EMAD <navid.emad@yespark.fr>
2026-05-13 14:11:53 +02:00
Pierre Tachoire
8432cfbfba cdp: return error in case of missing event's frame
Return an error instead of falling back to the root_frame.
2026-05-13 12:29:11 +02:00
Karl Seguin
e895ce81e3 Merge pull request #2437 from lightpanda-io/window_frameElement
Add window.frameElement
2026-05-13 18:00:08 +08:00
Karl Seguin
3e31fde66c Merge pull request #2443 from lightpanda-io/url_fixes
Fix URLSearchParams constructor
2026-05-13 17:59:50 +08:00
Karl Seguin
625e240f5a Pump the http_client queue after perform, not just before
Client.tick drains self.queue (assigning conns to queued transfers) only
at the start. When perform / processMessages releases a batch of conns
back to the pool, those conns sit idle until the next tick — a queued
transfer that could have run this tick waits one Runner iteration
(~20 ms in the test runner) for no reason. Adds a second drainQueue
call after perform so newly-freed conns get picked up immediately.

In practice this matters whenever httpMaxHostOpen / httpMaxConcurrent
is exceeded — pages with N > limit subresources had each "wave" of
queue overflow paying one extra tick of latency.
2026-05-13 17:58:49 +08:00
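The latency effect described in 625e240f5a can be reproduced with a small simulation. This is a hypothetical TypeScript model, not Lightpanda's Zig code: `Client`, `Transfer`, `drainQueue`, `perform`, `waitForNetwork`, and the `pumpAfterPerform` switch are illustrative names. `waitForNetwork()` stands in for the Runner's wait between ticks (~20 ms in the test runner), during which in-flight transfers make progress.

```typescript
// Hypothetical model of the http_client tick loop; all names are illustrative.
interface Transfer {
  name: string;
  remaining: number; // units of network time still needed
  done: boolean;
}

class Client {
  readonly queue: Transfer[] = [];
  private readonly active: Transfer[] = [];
  private freeConns: number;

  constructor(maxConns: number, private readonly pumpAfterPerform: boolean) {
    this.freeConns = maxConns;
  }

  request(t: Transfer): void {
    this.queue.push(t); // every transfer waits here until it gets a conn
  }

  // Hand free connections to queued transfers in FIFO order.
  private drainQueue(): void {
    while (this.freeConns > 0 && this.queue.length > 0) {
      this.active.push(this.queue.shift()!);
      this.freeConns--;
    }
  }

  // Reap finished transfers and return their connections to the pool.
  private perform(): void {
    for (let i = this.active.length - 1; i >= 0; i--) {
      const t = this.active[i];
      if (t.remaining <= 0) {
        t.done = true;
        this.active.splice(i, 1);
        this.freeConns++;
      }
    }
  }

  tick(): void {
    this.drainQueue();
    this.perform();
    if (this.pumpAfterPerform) this.drainQueue(); // the added second drain
  }

  // Simulated wait between Runner iterations: in-flight transfers progress.
  waitForNetwork(): void {
    for (const t of this.active) t.remaining--;
  }
}

// Drive two transfers through a single-connection pool; return tick count.
function ticksUntilDone(pumpAfterPerform: boolean): number {
  const client = new Client(1, pumpAfterPerform);
  const transfers: Transfer[] = [
    { name: "a", remaining: 1, done: false },
    { name: "b", remaining: 1, done: false },
  ];
  transfers.forEach((t) => client.request(t));
  let ticks = 0;
  while (!transfers.every((t) => t.done)) {
    client.tick();
    ticks++;
    client.waitForNetwork();
  }
  return ticks;
}

console.log(ticksUntilDone(false)); // 4
console.log(ticksUntilDone(true)); // 3
```

Without the second drain, the connection freed by `perform()` sits idle through one whole wait, so the queued transfer finishes one tick later.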
Karl Seguin
c79dd2bf1f Make runner aware of http_client.queue
When connections are queued, the processing cannot be considered done.
2026-05-13 17:55:39 +08:00
Karl Seguin
2bcf9a22d5 Disable cache=true from e2e matrix
cache=true is problematic for two reasons:

1 - The current cache implementation is known to cause timing issues, i.e. it
    executes callbacks synchronously.

2 - Unlike features such as robots.txt or proxy support, caching has to be
    tested explicitly: the response has to include cache headers and the
    resource has to be loaded again.
2026-05-13 17:52:46 +08:00
Karl Seguin
afc0942655 Merge pull request #2441 from lightpanda-io/fix-robots-crash
Fix crash on `robots.txt` being fulfilled synchronously
2026-05-13 17:39:22 +08:00
Pierre Tachoire
36b55339cd cdp: reset browser context arena when bc is removed 2026-05-13 11:26:09 +02:00
Pierre Tachoire
403fe0d293 cdp: remove dead code 2026-05-13 11:18:05 +02:00
Karl Seguin
c860a9a9e5 Split xhr-in-worker tests into their own file
xhr.html can brush up against the timeout as we add more and more cases. This
is particularly true on the slow CI, in debug builds, with TSAN.
2026-05-13 15:59:29 +08:00
Karl Seguin
dd99102f4b Defer HTTP completion callbacks to next tick
Client.makeRequest used to call self.perform(0) after handing the transfer
to libcurl. That perform() does two things: drives curl_multi_perform (so
bytes hit the wire) AND drains curl_multi_info_read messages, which is
what fires the user-facing header/data/done callbacks.

The issue is that, even in non-cache cases, a request could be immediately
resolved in libcurl, and thus callbacks executed synchronously.

By calling only `curl_multi_perform` when a new request is made, and leaving the
curl_multi_info_read drain (and its callbacks) to the next tick, we prevent this
from happening.
2026-05-13 15:59:29 +08:00
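The split in dd99102f4b between driving I/O and dispatching completions can be sketched as follows. This is a hypothetical TypeScript model: `DeferredClient`, `driveIo`, and `dispatchCompletions` are illustrative names that play the roles of `curl_multi_perform` and the `curl_multi_info_read` drain, and `simulatedStatus` fakes a request that libcurl resolves immediately.

```typescript
// Hypothetical model: I/O is driven on request, callbacks fire only on tick().
type DoneCallback = (status: number) => void;

interface PendingRequest {
  status: number | null; // null until the transport reports completion
  onDone: DoneCallback;
}

class DeferredClient {
  private pending: PendingRequest[] = [];

  makeRequest(simulatedStatus: number, onDone: DoneCallback): void {
    const req: PendingRequest = { status: null, onDone };
    this.pending.push(req);
    this.driveIo(req, simulatedStatus);
    // No dispatchCompletions() here: even an already-finished request
    // does not call back into user code synchronously.
  }

  // Role of curl_multi_perform: moves bytes and records completion.
  private driveIo(req: PendingRequest, simulatedStatus: number): void {
    req.status = simulatedStatus;
  }

  // Role of draining curl_multi_info_read: fires the user callbacks.
  private dispatchCompletions(): void {
    const finished = this.pending.filter((r) => r.status !== null);
    this.pending = this.pending.filter((r) => r.status === null);
    for (const r of finished) {
      r.onDone(r.status as number);
    }
  }

  tick(): void {
    this.dispatchCompletions();
  }
}

const events: string[] = [];
const client = new DeferredClient();
client.makeRequest(200, (code) => events.push(`done ${code}`));
events.push("after makeRequest");
client.tick();
console.log(events.join(" -> ")); // after makeRequest -> done 200
```

Code that runs right after `makeRequest` (for example, attaching more handlers) is therefore guaranteed to run before any completion callback.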
Karl Seguin
2fcad23834 Buffer worker postMessages received before script load completes 2026-05-13 15:59:29 +08:00
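A minimal sketch of the buffering in 2fcad23834, assuming a worker becomes "ready" once its script has been fetched and evaluated. `BufferedWorker`, `scriptLoaded`, and `WorkerMessageHandler` are hypothetical names, not the Worker.zig API.

```typescript
// Hypothetical model: messages posted before the worker script loads are
// buffered, then delivered in arrival order once the script is ready.
type WorkerMessageHandler = (data: unknown) => void;

class BufferedWorker {
  private ready = false;
  private readonly buffered: unknown[] = [];

  constructor(private readonly handler: WorkerMessageHandler) {}

  postMessage(data: unknown): void {
    if (this.ready) {
      this.handler(data);
    } else {
      this.buffered.push(data); // script not evaluated yet: keep for later
    }
  }

  // Called after the worker script has been fetched and evaluated.
  scriptLoaded(): void {
    this.ready = true;
    while (this.buffered.length > 0) {
      this.handler(this.buffered.shift()); // FIFO flush
    }
  }
}

const received: unknown[] = [];
const worker = new BufferedWorker((data) => received.push(data));
worker.postMessage("a"); // buffered
worker.postMessage("b"); // buffered
worker.scriptLoaded(); // delivers "a", then "b"
worker.postMessage("c"); // delivered immediately
console.log(received.join(",")); // a,b,c
```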
Karl Seguin
6d58af350d Flag functions and accessors as DontEnum by default
Only `own_properties`, e.g. window.CSS, should be enumerable.
2026-05-13 15:49:31 +08:00
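In V8, DontEnum is the property attribute that JavaScript code observes as `enumerable: false`. The sketch below only illustrates that observable effect; `installBinding` and `fakeWindow` are hypothetical stand-ins, not Lightpanda's binding code.

```typescript
// Hypothetical helper: non-own-property bindings are installed as DontEnum.
function installBinding(
  target: object,
  key: string,
  value: unknown,
  isOwnProperty: boolean,
): void {
  Object.defineProperty(target, key, {
    value,
    writable: true,
    configurable: true,
    enumerable: isOwnProperty, // DontEnum unless listed as an own property
  });
}

const fakeWindow: Record<string, unknown> = {};
installBinding(fakeWindow, "setTimeout", () => 0, false);
installBinding(fakeWindow, "CSS", {}, true);

console.log(Object.keys(fakeWindow)); // only "CSS" is listed
console.log(Object.getOwnPropertyNames(fakeWindow).length); // 2
```

The non-enumerable function is still present and callable; it is only hidden from `Object.keys`, `for...in`, and similar enumeration.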
Karl Seguin
cc4ad53661 Fix URLSearchParams constructor
First, KeyValueList.fromJsObject now only iterates own properties. Second,
URLSearchParams can now be constructed with another URLSearchParams. This is
a stopgap. The correct solution is for it to accept any iterator, but as a
quick fix for known cases (airbnb.com), this will help.
2026-05-13 14:38:43 +08:00
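For reference, the standard URLSearchParams constructor that cc4ad53661 moves toward can be exercised with the built-in class (available in browsers and Node.js). A plain-object argument contributes only its own enumerable properties, and an iterable of pairs, including another URLSearchParams, is copied pair by pair. The object names below are illustrative.

```typescript
// Standard URLSearchParams behaviour, using the platform's built-in class.
const proto = { inherited: "should-not-appear" };
const init = Object.create(proto) as Record<string, string>;
init.q = "shoes";
init.page = "2";

const fromRecord = new URLSearchParams(init); // own properties only
const copy = new URLSearchParams(fromRecord); // copy another URLSearchParams
const fromPairs = new URLSearchParams([
  ["a", "1"],
  ["b", "2"],
]); // any iterable of [name, value] pairs

console.log(fromRecord.toString()); // q=shoes&page=2
console.log(copy.toString()); // q=shoes&page=2
console.log(fromPairs.toString()); // a=1&b=2
```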
Pierre Tachoire
854eb6a62d Merge pull request #2339 from lightpanda-io/cdp-console
cdp: implement Console
2026-05-13 08:28:01 +02:00
Muki Kiboigo
4a45b4d866 fix crash on robots.txt request fulfilled immediately 2026-05-12 21:50:05 -07:00
Karl Seguin
bd4f4c89e1 Merge pull request #2440 from staylor/scott/fix-worker-context-exit-with-proxy
Add LP.configureLoading worker + --disable-workers opt-out for Web Worker loading
2026-05-13 12:29:43 +08:00
Karl Seguin
10a5597aba Merge pull request #2435 from navidemad/fix-b12-htmldialogelement-methods
dom: implement HTMLDialogElement.{show, showModal, close}
2026-05-13 12:17:20 +08:00
muki
cc927c98ec Merge pull request #2424 from lightpanda-io/nix-wpt-run
Ability to run wpt with Nix/NixOS
2026-05-12 20:58:39 -07:00
Muki Kiboigo
06c2474376 use commit sha instead of branch name 2026-05-12 20:58:13 -07:00
Karl Seguin
393141e472 pass arena into handlers (consistent with other handlers) 2026-05-13 11:51:59 +08:00
Scott Taylor
b2998470c2 Add --disable-workers + LP.configureLoading worker opt-out
Adds two ways to opt out of dedicated Web Worker loading entirely. The
Worker constructor still returns a Worker object so calling pages don't
throw, but no script fetch is initiated and the worker scope's eval
never runs (postMessage from the page is queued indefinitely with no
handler to drain it).

* CDP method LP.configureLoading { worker: bool } -- per-session
  toggleable at runtime, alongside the existing { subFrame: bool }.
  Both fields are now optional so callers can flip one without
  resetting the other to its default. Backwards-compatible.
* CLI flag --disable-workers -- process-wide default applying to every
  session and to the fetch subcommand. Operators can flip it on without
  any driver changes. Mirrors --disable-subframes (#2401) exactly.

## Motivation

Reliably reproducible SIGABRT in Worker.loadInitialScript whenever a
page constructs a Web Worker AND lightpanda is launched with
--http_proxy. Crash signature:

    $msg="V8 fatal callback" location=v8::Context::Exit()
    message="Cannot exit non-entered context"
    Stack:
      _browser.webapi.Worker.loadInitialScript
      _browser.webapi.Worker.httpDoneCallback
      _network.layer.InterceptionLayer.InterceptContext.doneCallback
      _browser.HttpClient.processMessages
      _browser.HttpClient.perform
      _browser.HttpClient.tick

The Zig-side Enter/Exit pair around the worker's eval doesn't match
v8's entered_contexts stack invariant under that timing -- something
upstream of the loadInitialScript Exit leaves an extra Enter on the
stack, so v8's Utils::ApiCheck (`isolate->context() == *env`) fires
and the process aborts.

Reproducible against any Shopify storefront PDP (e.g.
https://weareallbirds.myshopify.com/products/mens-wool-runners) when
served through any HTTP proxy -- the proxy just adds enough latency
to surface the race; the same code path runs without --http_proxy
but the timing window is too tight to reliably hit. The Allbirds
trigger script is the Shopify web-pixel-extension worker, but ANY
Worker the page constructs hits the same code path.

The proper fix needs the v8 entered-contexts invariant to be
restored end-to-end through the worker eval. That's a deeper dig
into how Worker.loadInitialScript / WorkerGlobalScope.importScript /
ls.local.runMacrotasks compose with v8's microtask queues across
multiple contexts; I tried three intermediate fixes (deferring
loadInitialScript via the frame scheduler when other scripts are
mid-eval, replacing the post-eval cross-context runMacrotasks with
worker-only PerformCheckpoint, and removing runMacrotasks entirely)
and none stopped the crash. The bug is fired from inside the
synchronous tick path before the post-eval microtask handling
runs, which means the leak happens during Script::Run itself and
needs more targeted investigation.

This PR is the workaround so users hitting the SIGABRT on
storefront / analytics-heavy pages have a clean opt-in escape today.
For our use case (product catalog extraction) Workers carry no
extraction signal -- web-pixel sandboxes, analytics SDKs, marketing
tag pixels, etc. -- so disabling them removes a fragile code path
without any downside.

## Implementation

`Session.worker_loading_enabled: bool = true` -- default matches
existing behavior.

`Worker.init` short-circuits AFTER constructing the Worker /
WorkerGlobalScope / arena bookkeeping (so the JS `new Worker(url)`
expression doesn't throw):

    if (!session.worker_loading_enabled) {
        log.debug(.browser, "worker disabled", .{ .url = resolved_url });
        return self;
    }

Two ways to flip the flag, mirroring the --disable-subframes pattern:

1. LP.configureLoading { worker: bool } -- both subFrame and worker
   are now optional fields in the params struct, so existing callers
   passing only { subFrame } continue to work unchanged.
2. --disable-workers CLI flag -- added to CommonOptions (so it
   applies to serve, fetch, mcp). New Config.disableWorkers() getter;
   Session.init reads it as the initial value.
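The optional-field semantics of LP.configureLoading described above can be modeled in a few lines. This is a hypothetical TypeScript sketch, not the code in src/cdp/domains/lp.zig: `ConfigureLoadingParams`, `LoadingSettings`, `initSettings`, and the `subFrameLoadingEnabled` field name are assumptions; only `worker_loading_enabled` appears in the PR.

```typescript
// Hypothetical model: an omitted field leaves the session setting untouched.
interface ConfigureLoadingParams {
  subFrame?: boolean;
  worker?: boolean;
}

interface LoadingSettings {
  subFrameLoadingEnabled: boolean; // assumed name for the subframe flag
  workerLoadingEnabled: boolean; // mirrors Session.worker_loading_enabled
}

// Session.init-style seeding from the CLI flags.
function initSettings(disableSubframes: boolean, disableWorkers: boolean): LoadingSettings {
  return {
    subFrameLoadingEnabled: !disableSubframes,
    workerLoadingEnabled: !disableWorkers,
  };
}

function configureLoading(session: LoadingSettings, params: ConfigureLoadingParams): void {
  if (params.subFrame !== undefined) session.subFrameLoadingEnabled = params.subFrame;
  if (params.worker !== undefined) session.workerLoadingEnabled = params.worker;
}

const settings = initSettings(false, false);
configureLoading(settings, { worker: false }); // subFrame keeps its value
console.log(settings); // subFrame still enabled, workers disabled
```

Treating "absent" differently from `false` is what lets an existing caller that sends only `{ subFrame }` keep working unchanged.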

Total diff: +88 / -3 across 4 files (src/Config.zig,
src/browser/Session.zig, src/browser/webapi/Worker.zig,
src/cdp/domains/lp.zig).

## Verification

Reproducer pattern (puppeteer-core 24.42.0 + tiny CONNECT-tunnel
proxy on 127.0.0.1:9999, scripts in cdp-repros/):

  serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999
  serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999 --disable-workers

Driving https://weareallbirds.myshopify.com/products/mens-wool-runners:

  baseline (no --disable-workers): 5/5 SIGABRT in
    Worker.loadInitialScript with the v8 fatal callback above.

  with --disable-workers:           10/10 successful, returns full
    HTML (~1MB), no crash.

Test suite:

  make test  -> 637 of 637 tests passed (previously 636/636; the new test is
    the "cdp.lp: configureLoading toggles subFrame and worker
    independently" regression test).

  zig fmt --check ./*.zig ./**/*.zig  -> clean.

## Notes

* The CDP method is the same domain (LP.configureLoading) and same
  shape as --disable-subframes' driver-side opt-in, so existing
  Playwright / puppeteer integrations that already toggle
  subframes don't need a separate code path -- one CDP call can
  flip both.

* worker_loading_enabled = false does NOT remove Worker from the
  global namespace (so feature-detection like
  `if (typeof Worker !== 'undefined')` still reports true). It just
  makes constructed workers no-op. Pages that postMessage to a worker
  and wait for a response will hang on that promise forever (or
  until the page is torn down). For our extraction use case that's
  fine -- we control the worklist timeout anyway -- but it's worth
  noting if upstream wants to surface the disabled state more
  strongly (e.g. throw from postMessage, or remove the global
  entirely behind an even-stricter flag).

* Once the underlying v8 entered-contexts invariant is restored in
  Worker.loadInitialScript, this flag becomes a perf / sandboxing
  tool rather than a correctness workaround. Worth keeping anyway:
  blocking analytics / pixel workers is a reasonable thing to want.

## Related

* #2400 -- the iframe analog to this issue (subframe nav invalidates
  executionContextId); same workaround pattern.
* #2401 -- introduced --disable-subframes / LP.configureLoading
  { subFrame } that this PR mirrors exactly for workers.
2026-05-12 23:46:45 -04:00
Karl Seguin
6eb90b2920 Add window.frameElement 2026-05-13 10:36:39 +08:00
Karl Seguin
2e159aaf12 Merge pull request #2436 from lightpanda-io/reset_frees_tasks
on scheduler.reset, finalize any remaining tasks
2026-05-13 09:06:02 +08:00
Karl Seguin
656e29476e on scheduler.reset, finalize any remaining tasks 2026-05-13 08:46:13 +08:00
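A minimal sketch of the idea behind 656e29476e, assuming each scheduled task owns resources that its finalizer releases. `Task`, `TaskScheduler`, and the run-then-finalize order in `runAll` are hypothetical, not the scheduler's real API.

```typescript
// Hypothetical model: reset() finalizes tasks that never got to run.
interface Task {
  run(): void;
  finalize(): void; // releases whatever the task owns
}

class TaskScheduler {
  private tasks: Task[] = [];

  add(task: Task): void {
    this.tasks.push(task);
  }

  pendingCount(): number {
    return this.tasks.length;
  }

  // Assumed normal path: run each task, then finalize it.
  runAll(): void {
    const ready = this.tasks;
    this.tasks = [];
    for (const t of ready) {
      t.run();
      t.finalize();
    }
  }

  reset(): void {
    // Swap the list out first so a finalizer that touches the scheduler
    // cannot mutate the collection being iterated.
    const remaining = this.tasks;
    this.tasks = [];
    for (const t of remaining) t.finalize(); // never ran, still cleaned up
  }
}

const calls: string[] = [];
const sched = new TaskScheduler();
sched.add({ run: () => calls.push("run 1"), finalize: () => calls.push("finalize 1") });
sched.add({ run: () => calls.push("run 2"), finalize: () => calls.push("finalize 2") });
sched.reset();
console.log(calls.join(", ")); // finalize 1, finalize 2
```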
Karl Seguin
d833eaa2e3 Merge pull request #2420 from lightpanda-io/http_client
HttpClient Improvements
2026-05-13 08:15:16 +08:00
Karl Seguin
50b126b402 fix cachelayer hit path 2026-05-13 07:14:44 +08:00
Navid EMAD
989e2d03a2 dom: implement HTMLDialogElement.{show, showModal, close}
The HTMLDialogElement constructor was exposed with `open` / `returnValue`
IDL accessors, but the three instance methods that drive the open/close
state were missing. Per HTML §4.11.4 (The dialog element), `show()` sets
the `open` attribute if absent; `showModal()` throws `InvalidStateError`
when the dialog is already open and otherwise sets `open`; `close()`
removes `open`, optionally updates `returnValue`, and fires a non-
bubbling `close` event. The non-rendering steps (focus trap, backdrop,
top-layer placement) are intentional no-ops here — `[open]` reflecting
through to selectors and the `close` event firing are what downstream
CDP clients rely on.

Closes #2434
2026-05-12 19:02:10 +02:00
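The open/close behaviour described in 989e2d03a2 can be summarized as a small state model. `DialogModel` is a hypothetical stand-in, not the DOM element: `open` mirrors the presence of the `open` attribute, and close listeners are called immediately, whereas the HTML spec queues the `close` event as a task. The early return in `close()` for an already-closed dialog follows the HTML spec rather than the commit message.

```typescript
// Hypothetical model of show / showModal / close; not the real DOM element.
class InvalidStateError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidStateError";
  }
}

class DialogModel {
  open = false; // mirrors the presence of the `open` attribute
  returnValue = "";
  private readonly closeListeners: Array<() => void> = [];

  addCloseListener(listener: () => void): void {
    this.closeListeners.push(listener);
  }

  show(): void {
    if (this.open) return; // only set `open` if it is absent
    this.open = true;
  }

  showModal(): void {
    if (this.open) throw new InvalidStateError("The dialog is already open.");
    // Focus trap, backdrop and top-layer placement are intentionally no-ops.
    this.open = true;
  }

  close(result?: string): void {
    if (!this.open) return; // closing a closed dialog does nothing
    this.open = false;
    if (result !== undefined) this.returnValue = result;
    // Stands in for firing the non-bubbling `close` event.
    for (const listener of this.closeListeners) listener();
  }
}

const dialog = new DialogModel();
let closeEvents = 0;
dialog.addCloseListener(() => {
  closeEvents++;
});
dialog.showModal();
dialog.close("confirmed");
console.log(dialog.open, dialog.returnValue, closeEvents); // false confirmed 1
```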
Pierre Tachoire
4c58f8a6d0 Merge pull request #2432 from lightpanda-io/fix-build-banner-stderr
build: print version banner to stderr, not stdout
2026-05-12 16:25:40 +02:00
Adrià Arrufat
842fbb78ef build: print version banner to stderr, not stdout
The build script wrote the version line to stdout, polluting any pipeline
that captures program output via 'zig build run' or similar. Banners
belong on stderr.
2026-05-12 15:12:50 +02:00
Karl Seguin
edc3d836d1 explicitly track whether a transfer is queued, for correct cleanup 2026-05-12 20:32:40 +08:00
Pierre Tachoire
806497c02b fix(cdp): remove duplicate Page.frameNavigated and fix context registration for iframes
The frameNavigated CDP handler sent Page.frameNavigated twice per
navigation and always used the root frame's V8 context for
inspector.contextCreated, even during iframe navigations. This caused
the root frame's inspector context id to be silently re-registered
(and the previous id invalidated) whenever an iframe navigated.

The duplicate Page.frameNavigated masked this bug — Puppeteer happened
to pick up the re-registered context id. Removing the duplicate exposed
the underlying issue: callFunctionOn with the original id failed with
"Cannot find context with specified id".
2026-05-12 14:21:50 +02:00
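The registration bug fixed in 806497c02b is easier to see with a per-frame model of execution-context ids. `FrameContexts` and its methods are hypothetical names; the point is that a navigation should replace only the navigating frame's id, and that re-registering the root frame (the old behaviour during iframe navigations) invalidates the id a client such as Puppeteer is still holding.

```typescript
// Hypothetical model of per-frame execution-context registration.
class FrameContexts {
  private nextId = 1;
  private readonly idByFrame = new Map<string, number>();
  private readonly frameById = new Map<number, string>();

  // Register a fresh context for the frame that navigated.
  frameNavigated(frameId: string): number {
    const previous = this.idByFrame.get(frameId);
    if (previous !== undefined) this.frameById.delete(previous); // old id invalidated
    const id = this.nextId++;
    this.idByFrame.set(frameId, id);
    this.frameById.set(id, frameId);
    return id;
  }

  // Mirrors the lookup a call such as callFunctionOn performs.
  frameFor(contextId: number): string {
    const frameId = this.frameById.get(contextId);
    if (frameId === undefined) throw new Error("Cannot find context with specified id");
    return frameId;
  }
}

const contexts = new FrameContexts();
const rootId = contexts.frameNavigated("root");
contexts.frameNavigated("iframe-1"); // correct: only the iframe's entry changes
console.log(contexts.frameFor(rootId)); // root
```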
Karl Seguin
caa47d50b2 fix build 2026-05-12 19:30:46 +08:00
Karl Seguin
73241dd1f7 don't mutate hashmap while iterating (in teardown) 2026-05-12 19:26:25 +08:00
Karl Seguin
5e0976bbd6 fix use-after-free on robotslayer shutdown 2026-05-12 19:26:24 +08:00
Karl Seguin
7869cbb68c Delay Page destruction to avoid UAF
Encapsulate resource ownership in a HttpClient.Owner
2026-05-12 19:26:24 +08:00
Karl Seguin
2dc3b4682b abort frame-specific transfers 2026-05-12 19:26:24 +08:00
Karl Seguin
82a4fc752b HttpClient Improvements
1 - Track owner of a request (for simpler / more accurate abort (TBD))

2 - Create Transfer upfront, make everything work on Transfer (not Request)
    This helps remove ambiguity about cleanup and simplifies layers. For example,
    the robots.txt request is just another normal request, not a special case. This
    gives everything a stable address (the *Transfer, which can be looked up by id).
2026-05-12 19:26:24 +08:00
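The ownership model from this series (a Transfer created upfront with a stable id, an owner recorded per request, and frame-specific aborts) can be sketched as a registry. This is a hypothetical TypeScript model; `TransferRegistry`, `abortOwner`, and the string owner ids are illustrative, not the HttpClient.Owner API. Ids are collected before deleting so the map is not mutated while it is iterated, the same concern as 73241dd1f7.

```typescript
// Hypothetical registry of transfers keyed by a stable id, tagged by owner.
type OwnerId = string;

interface Transfer {
  readonly id: number;
  readonly owner: OwnerId;
  readonly url: string;
  aborted: boolean;
}

class TransferRegistry {
  private nextId = 1;
  private readonly transfers = new Map<number, Transfer>();

  // Every request, including robots.txt, becomes a Transfer immediately.
  create(owner: OwnerId, url: string): Transfer {
    const t: Transfer = { id: this.nextId++, owner, url, aborted: false };
    this.transfers.set(t.id, t);
    return t;
  }

  get(id: number): Transfer | undefined {
    return this.transfers.get(id);
  }

  // Abort all transfers of one owner (e.g. a frame being torn down).
  abortOwner(owner: OwnerId): number {
    const ids: number[] = [];
    this.transfers.forEach((t, id) => {
      if (t.owner === owner) ids.push(id); // collect first, mutate after
    });
    for (const id of ids) {
      this.transfers.get(id)!.aborted = true;
      this.transfers.delete(id);
    }
    return ids.length;
  }
}

const registry = new TransferRegistry();
const script = registry.create("frame-1", "https://example.com/app.js");
const other = registry.create("frame-2", "https://example.com/other.js");
const robots = registry.create("frame-1", "https://example.com/robots.txt");
console.log(registry.abortOwner("frame-1")); // 2
```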