4897 Commits

Author SHA1 Message Date
Pierre Tachoire
c23d0f4f35 cdp: implement webMCP domain 2026-05-15 08:50:46 +02:00
Pierre Tachoire
0023bd7d19 Add WebMCP navigator.modelContext
Implements the page-side surface of the W3C WebMCP spec
(https://webmachinelearning.github.io/webmcp/): exposes
`navigator.modelContext.registerTool(...)` for declaring MCP tools to a
browser agent, with full name/description validation, AbortSignal-based
unregistration, and a `ModelContextClient` whose `requestUserInteraction`
invokes its callback directly (closest faithful behavior in a headless
browser).
2026-05-15 08:50:38 +02:00
Karl Seguin
cb8c2bc4d8 Merge pull request #2456 from lightpanda-io/cdp-proper-cache-disable
properly disable cache on `Network.setCacheDisabled`
2026-05-15 09:11:12 +08:00
Karl Seguin
632f3ea7d6 Merge pull request #2457 from lightpanda-io/fetch_dump_navigate_fix
Dump using the latest Frame to prevent segfault during frame change
2026-05-15 07:38:19 +08:00
Karl Seguin
b7a0ca2bca fallback unknown rule to new unknown type 2026-05-15 06:59:21 +08:00
Scott Taylor
6d1740b40f Surface at-rules through insertRule and replaceSync (fixes #2459)
CSSStyleSheet.insertRule previously detected at-rules in the parser
(has_skipped_at_rule) and returned the requested index without inserting
anything. The original change (PR #1972) did this to keep at-rule input
from killing module evaluation in apps like Expo Web -- correct, but the
silent-success contract has a second-order effect: CSS-in-JS libraries
(emotion, styled-components, Stitches, Mantine, Linaria) that round-trip
through cssRules to deduplicate their stylesheets see the rule as missing
after insertion, conclude the stylesheet is empty, and fall back to direct
<style> element injection per render. The result is unbounded <style>
accumulation on long-lived sessions doing repeated DOM interaction. See
issue #2459 for measurements (Allbirds-style PDPs accumulating thousands
of <style> elements over a render loop).

Changes:

* Parser.RulesIterator now returns a Rule union of {.style, .at_rule}
  instead of skipping at-rules and setting has_skipped_at_rule. The
  at-rule variant carries the keyword (without `@`) and the full source
  span so callers can construct an opaque placeholder rule.

* CSSRule gains a `_text` field and an `initAtRule` constructor for
  storing the at-rule source. CSSRule.getCssText returns the stored
  text (CSSStyleRule's overridden getCssText still wins for `.style`
  rules via the bridge dispatch on the most-derived class).

* CSSStyleSheet.insertRule and replaceSync handle both Rule variants:
  regular rules become CSSStyleRule as before, at-rules become opaque
  CSSRule placeholders with the matching CSSRule.Type variant
  (vendor-prefixed keyframes are normalized; unrecognized at-rule
  keywords fall back to .media). The CSS engine still doesn't apply
  these rules -- that's intentional and outside the scope of this
  change -- but they now surface via cssRules so library dedup paths
  work correctly.

* StyleManager.addRawRules switches on the Rule kind and skips
  at-rules. It only filters on display/visibility/opacity at the top
  level, so there is no semantic change.

* The CSSRule spec constants (STYLE_RULE, KEYFRAMES_RULE, MEDIA_RULE,
  ...) were declared as plain Zig consts inside JsApi but never wrapped
  with bridge.property, so they came back as `undefined` from JS. Fixed
  while writing the regression tests since the tests need to compare
  rule.type against the spec constants.
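
The keyword-to-type mapping described in the list above can be sketched as
follows. This is a minimal Python model, not the Zig implementation: the
function name `at_rule_type` and the lookup table are hypothetical, the numeric
values are the legacy CSSOM `CSSRule` type constants, and the `.media` fallback
reflects this commit only (the later commit b7a0ca2bca changes the fallback to
a new unknown type).

```python
import re

# Legacy CSSOM CSSRule type constants (a subset).
STYLE_RULE = 1
IMPORT_RULE = 3
MEDIA_RULE = 4
FONT_FACE_RULE = 5
KEYFRAMES_RULE = 7
SUPPORTS_RULE = 12

# Hypothetical lookup table keyed by at-rule keyword without the leading "@".
_AT_RULE_TYPES = {
    "import": IMPORT_RULE,
    "media": MEDIA_RULE,
    "font-face": FONT_FACE_RULE,
    "keyframes": KEYFRAMES_RULE,
    "supports": SUPPORTS_RULE,
}

# Matches a leading vendor prefix such as "-webkit-" or "-moz-".
_VENDOR_PREFIX = re.compile(r"^-[a-z]+-")


def at_rule_type(keyword):
    """Map an at-rule keyword (no "@") to a CSSRule type constant."""
    name = keyword.lower()
    # Only keyframes are normalized across vendor prefixes.
    if _VENDOR_PREFIX.sub("", name) == "keyframes":
        return KEYFRAMES_RULE
    # Unrecognized keywords fall back to MEDIA_RULE, as in this commit.
    return _AT_RULE_TYPES.get(name, MEDIA_RULE)
```

For example, `at_rule_type("-webkit-keyframes")` and `at_rule_type("keyframes")`
both yield `KEYFRAMES_RULE`, while an unlisted keyword such as `container`
yields `MEDIA_RULE`.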

Tests:

* Parser unit tests: cover statement at-rules (`@import url(...);`),
  block at-rules (`@media`), and vendor-prefixed at-rules
  (`@-webkit-keyframes`).

* HTML runner tests: cover insertRule for @keyframes, @media,
  @supports, @font-face, vendor-prefixed at-rules, mixed style + at-rule
  insertion, replaceSync at-rule preservation, and the dedup-via-cssRules
  pattern that's the actual library code path the bug breaks.

A/B verification with the synthetic CSS-in-JS dedup pattern (50 calls to
inject the same @keyframes through a library that falls back to <style>
element injection when insertRule appears empty):

  baseline (1.0.0-nightly.6240): styles=51 rules=0
  patched (this branch):         styles=1  rules=50

The leak collapses: instead of 50 <style> fallbacks stacking up, the
single persistent stylesheet receives all 50 insertions and dedup works.
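
The A/B numbers above can be reproduced with a small Python model of the
fallback pattern. Everything here is an illustrative assumption: `Sheet`,
`run_injections`, and the "did cssRules grow?" check stand in for whatever the
real library does, and the model counts the persistent stylesheet as one
`<style>` element.

```python
class Sheet:
    """Toy stylesheet; keeps_at_rules=False mimics the old silent-success path."""

    def __init__(self, keeps_at_rules):
        self.keeps_at_rules = keeps_at_rules
        self.css_rules = []

    def insert_rule(self, text, index):
        if text.lstrip().startswith("@") and not self.keeps_at_rules:
            return index  # reports success but stores nothing
        self.css_rules.insert(index, text)
        return index


def run_injections(keeps_at_rules, calls=50):
    """Return (style_element_count, rule_count) after repeated injections."""
    sheet = Sheet(keeps_at_rules)
    style_elements = 1  # the persistent <style> that owns the sheet
    rule = "@keyframes spin { to { transform: rotate(360deg) } }"
    for _ in range(calls):
        before = len(sheet.css_rules)
        sheet.insert_rule(rule, before)
        if len(sheet.css_rules) == before:
            # The rule is not visible through cssRules, so the library
            # falls back to appending a fresh <style> element.
            style_elements += 1
    return style_elements, len(sheet.css_rules)
```

Under these assumptions `run_injections(False)` gives 51 style elements and 0
rules, and `run_injections(True)` gives 1 style element and 50 rules, matching
the baseline and patched rows.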

Note on partial coverage of #2459: the original Allbirds reproducer
involves a Vue.js + Yotpo widget that injects <style> elements via
direct document.head.appendChild rather than insertRule. That code
path is unaffected by this change; it appears to be a separate
mechanism (possibly related to Vue's vue-style-loader closure-based
dedup or web-component lifecycle on Lightpanda) and is worth filing
separately. This PR fixes the specific insertRule contract issue
described in #2459 and unblocks the major CSS-in-JS libraries.
2026-05-14 16:19:36 -04:00
Muki Kiboigo
940976b6a7 properly disable cache on Network.setCacheDisabled 2026-05-14 09:03:51 -07:00
Karl Seguin
a59ddeb360 Dump using the latest Frame to prevent segfault during frame change
Fixes: https://github.com/lightpanda-io/browser/issues/2446
2026-05-14 20:00:20 +08:00
Karl Seguin
2f3a426fb0 Merge pull request #2453 from lightpanda-io/cdp-network-serve-from-cache
Adds `Network.requestServedFromCache`
2026-05-14 17:34:46 +08:00
Karl Seguin
b96c24d377 Merge pull request #2455 from lightpanda-io/cdp-response-fromdiskcache
Add `fromDiskCache` field to `Network.Response`
2026-05-14 16:01:04 +08:00
Karl Seguin
0624a05205 Merge pull request #2454 from lightpanda-io/cdp-network-cache-clear
add `Network.clearBrowserCache` and `Network.canClearBrowserCache`
2026-05-14 15:56:29 +08:00
Karl Seguin
143bffdfec Merge pull request #2450 from navidemad/fix-bug7-form-idl
forms: add enctype + 5 submitter form-* IDL accessors
2026-05-14 13:44:57 +08:00
Karl Seguin
80a09fc0fd zig fmt 2026-05-14 13:19:17 +08:00
Muki Kiboigo
f2f328cffd add fromDiskCache field to Network.Response CDP type 2026-05-13 21:57:22 -07:00
Muki Kiboigo
07e7c3d687 add Network.clearBrowserCache and Network.canClearBrowserCache 2026-05-13 21:52:10 -07:00
Muki Kiboigo
ac863c7e2b add Network.requestServedFromCache 2026-05-13 21:47:47 -07:00
Karl Seguin
14b4449628 use format to write String value 2026-05-14 11:03:12 +08:00
Karl Seguin
373916873f Merge pull request #2442 from lightpanda-io/worker_message_buffer
CI fixes, callback timing correctness
2026-05-14 08:56:36 +08:00
Karl Seguin
96ac9a49ea Update src/browser/webapi/Worker.zig
Co-authored-by: Navid EMAD <navid.emad@yespark.fr>
2026-05-14 08:33:32 +08:00
Karl Seguin
bcafa175cb make Event worker-safe 2026-05-14 07:11:33 +08:00
Navid EMAD
f0cce42757 forms: route Frame.submitForm through Form.normalizeMethod/normalizeEnctype
The submitForm encoding path was the last duplicate of the "limited to
only known values" canonicalization the previous commit consolidated for
the IDL getters. Now it consumes the same Form.normalizeMethod /
Form.normalizeEnctype helpers, so a single function owns the canonical
mapping (`""` / unknown -> spec default, recognized values pass through
unchanged).

Side effect of routing through the helper: the
`log.warn(.not_implemented, "FormData.encoding", ...)` branch falls out.
After commit 4b693db4 added `text/plain`, the only attribute values that
still reach the urlencoded fallback are spec-invalid ones, which per
HTML §4.10.21.5 silently canonicalize to
`application/x-www-form-urlencoded`. The warning was firing for valid
spec behavior — Chrome doesn't log either.

Behavior-preserving on all observable surfaces: full suite 639/639 green;
existing form-submission integration tests (multipart, urlencoded,
text/plain, GET-ignores-enctype) all pass unchanged.
2026-05-13 18:14:10 +02:00
Navid EMAD
4b693db480 forms: support enctype=text/plain in form submission
Closing the divergence introduced by the new IDL accessors: `submitter.formEnctype`
(and `form.enctype`) now return "text/plain" for that attribute value per WHATWG
HTML §4.10.21.5, but `Frame.submitForm` previously fell back to urlencoded with
a `.not_implemented` log when it saw the same value on the submission path.

Implement the spec's text/plain encoding algorithm (HTML §4.10.21.8):

  - FormData.EncType gains a `.plaintext` variant.
  - FormData.plaintextEncode writes "name=value CRLF" per entry, no URL-encoding,
    no escaping — the spec accepts that text/plain is a lossy, human-readable
    encoding (values containing "=" or CRLF produce an ambiguous wire format
    by design).
  - Frame.submitForm recognizes "text/plain" before the urlencoded fallback and
    sets the Content-Type header to "text/plain; charset=<form-charset>", per
    spec step 21.4.

Two new Zig unit tests cover encoding output (`FormData: plaintext write`,
`FormData: plaintext empty body`). Full suite 639/639 green.
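
A minimal Python sketch of the encoding described above, for illustration only.
The function names are hypothetical and do not mirror the Zig
`FormData.plaintextEncode` signature; the behavior shown (one `name=value` pair
per entry, terminated by CRLF, with no escaping) is taken from the description
in this message.

```python
def plaintext_encode(entries):
    """Encode (name, value) pairs as text/plain form data.

    Each entry becomes "name=value" followed by CRLF. Nothing is escaped,
    so a value containing "=" or CRLF yields an ambiguous body by design.
    """
    return "".join(f"{name}={value}\r\n" for name, value in entries)


def plaintext_content_type(charset):
    """Build the Content-Type header value for a text/plain submission."""
    return f"text/plain; charset={charset}"
```

For instance, `plaintext_encode([("q", "a=b")])` produces `"q=a=b\r\n"`, which
shows the ambiguity the spec accepts, and an empty entry list produces an empty
body.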

This is bundled with the IDL accessor commits because returning "text/plain"
from the IDL while the submission silently re-encodes as urlencoded is a
spec-internal inconsistency the IDL change itself introduces. Reviewers who'd
prefer to land just the read-only accessors first should feel free to ask for
a split — this commit is self-contained and reverts cleanly.
2026-05-13 18:08:54 +02:00
Navid EMAD
cedfdba0d7 forms: extract normalizeMethod / normalizeEnctype helpers
The "limited to only known values" canonicalization (per WHATWG HTML
§2.2.2) was duplicated five times: Form.getMethod + Form.getEnctype +
{Button,Input}.{getFormMethod,getFormEnctype}. Each callsite differed
only in the missing-value default ("" for submitter overrides, "get" /
"application/x-www-form-urlencoded" for the form-side).

Extract into two pub helpers on Form.zig taking the attribute slice +
the missing-value default. The five callers collapse to one-liners.
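
The shape of the two helpers can be sketched in Python as below. This is an
assumption-laden model rather than the Zig code: the names are hypothetical,
`None` stands in for a missing attribute, and the keyword sets follow the HTML
standard, so they may differ from what `Form.normalizeMethod` and
`Form.normalizeEnctype` actually recognize.

```python
# Keyword sets per the HTML standard (assumed, not read from Form.zig).
ENCTYPES = (
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
)
METHODS = ("get", "post", "dialog")


def _normalize(attr, known, invalid_default, missing_default):
    """Canonicalize an attribute that is limited to only known values."""
    if attr is None:  # attribute absent: caller chooses the default
        return missing_default
    lowered = attr.lower()  # keywords match case-insensitively
    return lowered if lowered in known else invalid_default


def normalize_enctype(attr, missing_default):
    return _normalize(attr, ENCTYPES, ENCTYPES[0], missing_default)


def normalize_method(attr, missing_default):
    return _normalize(attr, METHODS, "get", missing_default)
```

A form-side getter would pass the spec default as `missing_default`
(`normalize_method(None, "get")`), while a submitter override would pass `""`,
which is the only difference between the five original call sites.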

Behavior-preserving: existing form.html / button.html / input-attrs.html
fixtures all pass unchanged; full suite 637/637 green.

Net: -36 LOC.
2026-05-13 17:58:55 +02:00
Navid EMAD
2fdc82aa05 forms: add enctype + 5 submitter form-* IDL accessors
Six form-submission IDL accessors were missing from the JsApi blocks of
HTMLFormElement, HTMLButtonElement, and HTMLInputElement, so reads
produced undefined instead of the spec-mandated string/boolean. The
content-attribute path (clicking a submit button honoring formaction /
formmethod / formenctype) was wired up in #2279; this commit adds the
matching IDL-property accessors per WHATWG HTML §4.10.18.6 and §4.10.21.5.

- Form.enctype: limited to known values, missing+invalid both default to
  application/x-www-form-urlencoded (mirrors getMethod's shape).
- Button/Input formAction: returns frame.url when missing/empty, else the
  resolved URL (mirrors Form.getAction).
- Button/Input formEnctype, formMethod: limited to known values with no
  missing-value default ("" when missing, canonical invalid-value default
  application/x-www-form-urlencoded / get when invalid).
- Button/Input formTarget: plain reflection, defaults to "".
- Button/Input formNoValidate: boolean reflection of formnovalidate.

Closes #2449
2026-05-13 17:49:19 +02:00
Karl Seguin
5595f7d298 Merge pull request #2448 from lightpanda-io/script_load_error_handling
Don't process scripts that failed to load
2026-05-13 23:19:40 +08:00
Pierre Tachoire
198c4e5a0f Merge pull request #2444 from lightpanda-io/useless-code
cdp: remove dead code
2026-05-13 15:36:16 +02:00
Pierre Tachoire
ffc2baa733 Merge pull request #2431 from lightpanda-io/cdp-double-frame-navigated-event
fix(cdp): remove duplicate Page.frameNavigated and fix context regist…
2026-05-13 15:17:27 +02:00
Karl Seguin
7750bc94f6 Apply suggestions from code review
Remove no-longer-needed setTimeouts in the test now that messages are queued.

Runner also checks ready_queue when determining doneness.

Co-authored-by: Navid EMAD <design.navid@gmail.com>
2026-05-13 20:57:59 +08:00
Karl Seguin
2326071036 Don't [try] to process scripts that failed to load
At some point recently, we started to process scripts that fail to load (e.g.
404). This stops such scripts from [trying] to be evaluated, and executes the
onerror handler in all script loading paths.
2026-05-13 20:48:08 +08:00
Pierre Tachoire
12971a2420 Merge pull request #2445 from lightpanda-io/reset-bc-arena
cdp: reset browser context arena when bc is removed
2026-05-13 14:35:38 +02:00
Pierre Tachoire
5d73d82bf6 cdp: call context created w/ correct is_default_context value
Co-authored-by: Navid EMAD <navid.emad@yespark.fr>
2026-05-13 14:11:53 +02:00
Pierre Tachoire
8432cfbfba cdp: return error in case of missing event's frame
Instead of using the root_frame
2026-05-13 12:29:11 +02:00
Karl Seguin
e895ce81e3 Merge pull request #2437 from lightpanda-io/window_frameElement
Add window.frameElement
2026-05-13 18:00:08 +08:00
Karl Seguin
3e31fde66c Merge pull request #2443 from lightpanda-io/url_fixes
Fix URLSearchParams constructor
2026-05-13 17:59:50 +08:00
Karl Seguin
625e240f5a Pump the http_client queue after perform, not just before
Client.tick drains self.queue (assigning conns to queued transfers) only
at the start. When perform / processMessages releases a batch of conns
back to the pool, those conns sit idle until the next tick — a queued
transfer that could have run this tick waits one Runner iteration
(~20 ms in the test runner) for no reason. Adds a second drainQueue
call after perform so newly-freed conns get picked up immediately.

In practice this matters whenever httpMaxHostOpen / httpMaxConcurrent
is exceeded — pages with N > limit subresources had each "wave" of
queue overflow paying one extra tick of latency.
2026-05-13 17:58:49 +08:00
Karl Seguin
c79dd2bf1f Make runner aware of http_client.queue
When connections are queued, the processing cannot be considered done.
2026-05-13 17:55:39 +08:00
Karl Seguin
afc0942655 Merge pull request #2441 from lightpanda-io/fix-robots-crash
Fix crash on `robots.txt` being fulfilled synchronously
2026-05-13 17:39:22 +08:00
Pierre Tachoire
36b55339cd cdp: reset browser context arena when bc is removed 2026-05-13 11:26:09 +02:00
Pierre Tachoire
403fe0d293 cdp: remove dead code 2026-05-13 11:18:05 +02:00
Karl Seguin
c860a9a9e5 Split xhr-in-worker tests into their own file
xhr.html can brush up against the timeout as we add more and more cases. This
is particularly true on the slow CI, in debug builds, with TSAN.
2026-05-13 15:59:29 +08:00
Karl Seguin
dd99102f4b Defer HTTP completion callbacks to next tick
Client.makeRequest used to call self.perform(0) after handing the transfer
to libcurl. That perform() does two things: drives curl_multi_perform (so
bytes hit the wire) AND drains curl_multi_info_read messages, which is
what fires the user-facing header/data/done callbacks.

The issue is that, even in non-cache cases, a request could be immediately
resolved in libcurl, and thus callbacks executed synchronously.

By only calling `curl_multi_perform` on a new request, we prevent this from
happening.
2026-05-13 15:59:29 +08:00
Karl Seguin
2fcad23834 Buffer worker postMessages received before script load completes 2026-05-13 15:59:29 +08:00
Karl Seguin
6d58af350d Flag functions and accessors as DontEnum by default
Only `own_properties`, e.g. window.CSS should be enumerable.
2026-05-13 15:49:31 +08:00
Karl Seguin
cc4ad53661 Fix URLSearchParams constructor
First, KeyValueList.fromJsObject now only iterates own properties. Second,
URLSearchParams can now be constructed with another URLSearchParams. This is
a stopgap. The correct solution is for it to accept any iterator, but as a
quick fix for known cases (airbnb.com), this will help.
2026-05-13 14:38:43 +08:00
Pierre Tachoire
854eb6a62d Merge pull request #2339 from lightpanda-io/cdp-console
cdp: implement Console
2026-05-13 08:28:01 +02:00
Muki Kiboigo
4a45b4d866 fix crash on robots.txt request fulfilled immediately 2026-05-12 21:50:05 -07:00
Karl Seguin
bd4f4c89e1 Merge pull request #2440 from staylor/scott/fix-worker-context-exit-with-proxy
Add LP.configureLoading worker + --disable-workers opt-out for Web Worker loading
2026-05-13 12:29:43 +08:00
Karl Seguin
10a5597aba Merge pull request #2435 from navidemad/fix-b12-htmldialogelement-methods
dom: implement HTMLDialogElement.{show, showModal, close}
2026-05-13 12:17:20 +08:00
Karl Seguin
393141e472 pass arena into handlers (consistent with other handlers) 2026-05-13 11:51:59 +08:00
Scott Taylor
b2998470c2 Add --disable-workers + LP.configureLoading worker opt-out
Adds two ways to opt out of dedicated Web Worker loading entirely. The
Worker constructor still returns a Worker object so calling pages don't
throw, but no script fetch is initiated and the worker scope's eval
never runs (postMessage from the page is queued indefinitely with no
handler to drain it).

* CDP method LP.configureLoading { worker: bool } -- per-session
  toggleable at runtime, alongside the existing { subFrame: bool }.
  Both fields are now optional so callers can flip one without
  resetting the other to its default. Backwards-compatible.
* CLI flag --disable-workers -- process-wide default applying to every
  session and to the fetch subcommand. Operators can flip it on without
  any driver changes. Mirrors --disable-subframes (#2401) exactly.

## Motivation

Reliably-reproducible SIGABRT in Worker.loadInitialScript whenever a
page constructs a Web Worker AND lightpanda is launched with
--http_proxy. Crash signature:

    $msg="V8 fatal callback" location=v8::Context::Exit()
    message="Cannot exit non-entered context"
    Stack:
      _browser.webapi.Worker.loadInitialScript
      _browser.webapi.Worker.httpDoneCallback
      _network.layer.InterceptionLayer.InterceptContext.doneCallback
      _browser.HttpClient.processMessages
      _browser.HttpClient.perform
      _browser.HttpClient.tick

The Zig-side Enter/Exit pair around the worker's eval doesn't match
v8's entered_contexts stack invariant under that timing -- something
upstream of the loadInitialScript Exit leaves an extra Enter on the
stack, so v8's Utils::ApiCheck (`isolate->context() == *env`) fires
and the process aborts.

Reproducible against any Shopify storefront PDP (e.g.
https://weareallbirds.myshopify.com/products/mens-wool-runners) when
served through any HTTP proxy -- the proxy just adds enough latency
to surface the race; the same code path runs without --http_proxy
but the timing window is too tight to reliably hit. The Allbirds
trigger script is the Shopify web-pixel-extension worker, but ANY
Worker the page constructs hits the same code path.

The proper fix needs the v8 entered-contexts invariant to be
restored end-to-end through the worker eval. That's a deeper dig
into how Worker.loadInitialScript / WorkerGlobalScope.importScript /
ls.local.runMacrotasks compose with v8's microtask queues across
multiple contexts; I tried three intermediate fixes (deferring
loadInitialScript via the frame scheduler when other scripts are
mid-eval, replacing the post-eval cross-context runMacrotasks with
worker-only PerformCheckpoint, and removing runMacrotasks entirely)
and none stopped the crash. The bug is fired from inside the
synchronous tick path before the post-eval microtask handling
runs, which means the leak happens during Script::Run itself and
needs more targeted investigation.

This PR is the workaround so users hitting the SIGABRT on
storefront / analytics-heavy pages have a clean opt-in escape today.
For our use case (product catalog extraction) Workers carry no
extraction signal -- web-pixel sandboxes, analytics SDKs, marketing
tag pixels, etc. -- so disabling them removes a fragile code path
without any downside.

## Implementation

`Session.worker_loading_enabled: bool = true` -- default matches
existing behavior.

`Worker.init` short-circuits AFTER constructing the Worker /
WorkerGlobalScope / arena bookkeeping (so the JS `new Worker(url)`
expression doesn't throw):

    if (!session.worker_loading_enabled) {
        log.debug(.browser, "worker disabled", .{ .url = resolved_url });
        return self;
    }

Two ways to flip the flag, mirroring the --disable-subframes pattern:

1. LP.configureLoading { worker: bool } -- both subFrame and worker
   are now optional fields in the params struct, so existing callers
   passing only { subFrame } continue to work unchanged.
2. --disable-workers CLI flag -- added to CommonOptions (so it
   applies to serve, fetch, mcp). New Config.disableWorkers() getter;
   Session.init reads it as the initial value.
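
The optional-field behavior of `LP.configureLoading` can be modeled in Python
as below. This is only a sketch: `configure_loading` and
`subframe_loading_enabled` are hypothetical names, while
`worker_loading_enabled` and the `subFrame` / `worker` parameter keys come from
this message.

```python
from dataclasses import dataclass


@dataclass
class Session:
    subframe_loading_enabled: bool = True  # hypothetical field name
    worker_loading_enabled: bool = True    # name from the commit message


def configure_loading(session, params):
    """Apply only the fields present in params; absent fields keep their value."""
    if "subFrame" in params:
        session.subframe_loading_enabled = bool(params["subFrame"])
    if "worker" in params:
        session.worker_loading_enabled = bool(params["worker"])
    return session
```

Calling `configure_loading(session, {"worker": False})` leaves the subframe
setting untouched, which is what keeps callers that pass only `{ subFrame }`
working unchanged.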

Total diff: +88 / -3 across 4 files (src/Config.zig,
src/browser/Session.zig, src/browser/webapi/Worker.zig,
src/cdp/domains/lp.zig).

## Verification

Reproducer pattern (puppeteer-core 24.42.0 + tiny CONNECT-tunnel
proxy on 127.0.0.1:9999, scripts in cdp-repros/):

  serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999
  serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999 --disable-workers

Driving https://weareallbirds.myshopify.com/products/mens-wool-runners:

  baseline (no --disable-workers): 5/5 SIGABRT in
    Worker.loadInitialScript with the v8 fatal callback above.

  with --disable-workers:           10/10 successful, returns full
    HTML (~1MB), no crash.

Test suite:

  make test  -> 637 of 637 tests passed (was 636/636 + new
    cdp.lp: configureLoading toggles subFrame and worker
    independently regression test).

  zig fmt --check ./*.zig ./**/*.zig  -> clean.

## Notes

* The CDP method is the same domain (LP.configureLoading) and same
  shape as --disable-subframes' driver-side opt-in, so existing
  Playwright / puppeteer integrations that already toggle
  subframes don't need a separate code path -- one CDP call can
  flip both.

* worker_loading_enabled = false does NOT remove Worker from the
  global namespace (so feature-detection like
  `if (typeof Worker !== 'undefined')` still reports true). It just
  makes constructed workers no-op. Pages that postMessage to a worker
  and wait for a response will hang on that promise forever (or
  until the page is torn down). For our extraction use case that's
  fine -- we control the worklist timeout anyway -- but it's worth
  noting if upstream wants to surface the disabled state more
  strongly (e.g. throw from postMessage, or remove the global
  entirely behind an even-stricter flag).

* Once the underlying v8 entered-contexts invariant is restored in
  Worker.loadInitialScript, this flag becomes a perf / sandboxing
  tool rather than a correctness workaround. Worth keeping anyway:
  blocking analytics / pixel workers is a reasonable thing to want.

## Related

* #2400 -- the iframe analog to this issue (subframe nav invalidates
  executionContextId); same workaround pattern.
* #2401 -- introduced --disable-subframes / LP.configureLoading
  { subFrame } that this PR mirrors exactly for workers.
2026-05-12 23:46:45 -04:00