Implements the page-side surface of the W3C WebMCP spec
(https://webmachinelearning.github.io/webmcp/): exposes
`navigator.modelContext.registerTool(...)` for declaring MCP tools to a
browser agent, with full name/description validation, AbortSignal-based
unregistration, and a `ModelContextClient` whose `requestUserInteraction`
invokes its callback directly (closest faithful behavior in a headless
browser).
CSSStyleSheet.insertRule previously detected at-rules in the parser
(has_skipped_at_rule) and returned the requested index without inserting
anything. The original change (PR #1972) did this to keep at-rule input
from killing module evaluation in apps like Expo Web -- correct, but the
silent-success contract has a second-order effect: CSS-in-JS libraries
(emotion, styled-components, Stitches, Mantine, Linaria) that round-trip
through cssRules to deduplicate their stylesheets see the rule as missing
after insertion, conclude the stylesheet is empty, and fall back to direct
<style> element injection per render. The result is unbounded <style>
accumulation on long-lived sessions doing repeated DOM interaction. See
issue #2459 for measurements (Allbirds-style PDPs accumulating thousands
of <style> elements over a render loop).
Changes:
* Parser.RulesIterator now returns a Rule union of {.style, .at_rule}
instead of skipping at-rules and setting has_skipped_at_rule. The
at-rule variant carries the keyword (without `@`) and the full source
span so callers can construct an opaque placeholder rule.
* CSSRule gains a `_text` field and an `initAtRule` constructor for
storing the at-rule source. CSSRule.getCssText returns the stored
text (CSSStyleRule's overridden getCssText still wins for `.style`
rules via the bridge dispatch on the most-derived class).
* CSSStyleSheet.insertRule and replaceSync handle both Rule variants:
regular rules become CSSStyleRule as before, at-rules become opaque
CSSRule placeholders with the matching CSSRule.Type variant
(vendor-prefixed keyframes are normalized; unrecognized at-rule
keywords fall back to .media). The CSS engine still doesn't apply
these rules -- that's intentional and outside the scope of this
change -- but they now surface via cssRules so library dedup paths
work correctly.
* StyleManager.addRawRules switches on the Rule kind and skips
at-rules (it only filters on display/visibility/opacity at the top
level, no semantic change).
* The CSSRule spec constants (STYLE_RULE, KEYFRAMES_RULE, MEDIA_RULE,
...) were declared as plain Zig consts inside JsApi but never wrapped
with bridge.property, so they came back as `undefined` from JS. Fixed
while writing the regression tests since the tests need to compare
rule.type against the spec constants.
Tests:
* Parser unit tests: cover statement at-rules (`@import url(...);`),
block at-rules (`@media`), and vendor-prefixed at-rules
(`@-webkit-keyframes`).
* HTML runner tests: cover insertRule for @keyframes, @media,
@supports, @font-face, vendor-prefixed at-rules, mixed style + at-rule
insertion, replaceSync at-rule preservation, and the dedup-via-cssRules
pattern that's the actual library code path the bug breaks.
A/B verification with the synthetic CSS-in-JS dedup pattern (50 calls to
inject the same @keyframes through a library that falls back to <style>
element injection when insertRule appears empty):
baseline (1.0.0-nightly.6240): styles=51 rules=0
patched (this branch): styles=1 rules=50
The leak collapses: instead of 50 <style> fallbacks stacking up, the
single persistent stylesheet receives all 50 insertions and dedup works.
Note on partial coverage of #2459: the original Allbirds reproducer
involves a Vue.js + Yotpo widget that injects <style> elements via
direct document.head.appendChild rather than insertRule. That code
path is unaffected by this change; it appears to be a separate
mechanism (possibly related to Vue's vue-style-loader closure-based
dedup or web-component lifecycle on Lightpanda) and is worth filing
separately. This PR fixes the specific insertRule contract issue
described in #2459 and unblocks the major CSS-in-JS libraries.
The submitForm encoding path was the last duplicate of the "limited to
only known values" canonicalization the previous commit consolidated for
the IDL getters. Now it consumes the same Form.normalizeMethod /
Form.normalizeEnctype helpers, so a single function owns the canonical
mapping (`""` / unknown -> spec default, recognized values pass through
unchanged).
Side effect of routing through the helper: the
`log.warn(.not_implemented, "FormData.encoding", ...)` branch falls out.
After commit 4b693db4 added `text/plain`, the only attribute values that
still reach the urlencoded fallback are spec-invalid ones, which per
HTML §4.10.21.5 silently canonicalize to
`application/x-www-form-urlencoded`. The warning was firing for valid
spec behavior — Chrome doesn't log either.
Behavior-preserving on all observable surfaces: full suite 639/639 green;
existing form-submission integration tests (multipart, urlencoded,
text/plain, GET-ignores-enctype) all pass unchanged.
Closing the divergence introduced by the new IDL accessors: `submitter.formEnctype`
(and `form.enctype`) now return "text/plain" for that attribute value per WHATWG
HTML §4.10.21.5, but `Frame.submitForm` previously fell back to urlencoded with
a `.not_implemented` log when it saw the same value on the submission path.
Implement the spec's text/plain encoding algorithm (HTML §4.10.21.8):
- FormData.EncType gains a `.plaintext` variant.
- FormData.plaintextEncode writes "name=value CRLF" per entry, no URL-encoding,
no escaping — the spec accepts that text/plain is a lossy, human-readable
encoding (values containing "=" or CRLF produce an ambiguous wire format
by design).
- Frame.submitForm recognizes "text/plain" before the urlencoded fallback and
sets the Content-Type header to "text/plain; charset=<form-charset>", per
spec step 21.4.
Two new Zig unit tests cover encoding output (`FormData: plaintext write`,
`FormData: plaintext empty body`). Full suite 639/639 green.
This is bundled with the IDL accessor commits because returning "text/plain"
from the IDL while the submission silently re-encodes as urlencoded is a
spec-internal inconsistency the IDL change itself introduces. Reviewers who'd
prefer to land just the read-only accessors first should feel free to ask for
a split — this commit is self-contained and reverts cleanly.
The "limited to only known values" canonicalization (per WHATWG HTML
§2.2.2) was duplicated five times: Form.getMethod + Form.getEnctype +
{Button,Input}.{getFormMethod,getFormEnctype}. Each callsite differed
only in the missing-value default ("" for submitter overrides, "get" /
"application/x-www-form-urlencoded" for the form-side).
Extract into two pub helpers on Form.zig taking the attribute slice +
the missing-value default. The five callers collapse to one-liners.
Behavior-preserving: existing form.html / button.html / input-attrs.html
fixtures all pass unchanged; full suite 637/637 green.
Net: -36 LOC.
Six form-submission IDL accessors were missing from the JsApi blocks of
HTMLFormElement, HTMLButtonElement, and HTMLInputElement, so reads
produced undefined instead of the spec-mandated string/boolean. The
content-attribute path (clicking a submit button honoring formaction /
formmethod / formenctype) was wired up in #2279; this commit adds the
matching IDL-property accessors per WHATWG HTML §4.10.18.6 and §4.10.21.5.
- Form.enctype: limited to known values, missing+invalid both default to
application/x-www-form-urlencoded (mirrors getMethod's shape).
- Button/Input formAction: returns frame.url when missing/empty, else the
resolved URL (mirrors Form.getAction).
- Button/Input formEnctype, formMethod: limited to known values with no
missing-value default ("" when missing, canonical invalid-value default
application/x-www-form-urlencoded / get when invalid).
- Button/Input formTarget: plain reflection, defaults to "".
- Button/Input formNoValidate: boolean reflection of formnovalidate.
Closes#2449
Remove no-longer needed setTimeouts in test now that messages are queued.
Runner also checks ready_queue when determining doneness.
Co-authored-by: Navid EMAD <design.navid@gmail.com>
At some point recently, we started to process scripts that fail to load (e.g.
404). This stops such scripts from [trying] to be evaluated, and executes the
onerror handler in all script loading paths.
Client.tick drains self.queue (assigning conns to queued transfers) only
at the start. When perform / processMessages releases a batch of conns
back to the pool, those conns sit idle until the next tick — a queued
transfer that could have run this tick waits one Runner iteration
(~20 ms in the test runner) for no reason. Adds a second drainQueue
call after perform so newly-freed conns get picked up immediately.
In practice this matters whenever httpMaxHostOpen / httpMaxConcurrent
is exceeded — pages with N > limit subresources had each "wave" of
queue overflow paying one extra tick of latency.
Client.makeRequest used to call self.perform(0) after handing the transfer
to libcurl. That perform() does two things: drives curl_multi_perform (so
bytes hit the wire) AND drains curl_multi_info_read messages, which is
what fires the user-facing header/data/done callbacks.
The issue is that, even in non-cache cases, a request could be immediately
resolved in libcurl, and thus callbacks executed synchronously.
By only calling `curl_multi_perform` on a new request, we prevent this from
happening.
First, KeyValueList.fromJsObject now only iterates own properties. Second
URLSearchParams can now be constructed with another URLSearchParams. This is
a stopgap. The correct solution is for it to accept any iterator, but as a
quick fix for known cases (airbnb.com), this will help.
Adds two ways to opt out of dedicated Web Worker loading entirely. The
Worker constructor still returns a Worker object so calling pages don't
throw, but no script fetch is initiated and the worker scope's eval
never runs (postMessage from the page is queued indefinitely with no
handler to drain it).
* CDP method LP.configureLoading { worker: bool } -- per-session
toggleable at runtime, alongside the existing { subFrame: bool }.
Both fields are now optional so callers can flip one without
resetting the other to its default. Backwards-compatible.
* CLI flag --disable-workers -- process-wide default applying to every
session and to the fetch subcommand. Operators can flip it on without
any driver changes. Mirrors --disable-subframes (#2401) exactly.
## Motivation
Reliably-reproducible SIGABRT in Worker.loadInitialScript whenever a
page constructs a Web Worker AND lightpanda is launched with
--http_proxy. Crash signature:
$msg="V8 fatal callback" location=v8::Context::Exit()
message="Cannot exit non-entered context"
Stack:
_browser.webapi.Worker.loadInitialScript
_browser.webapi.Worker.httpDoneCallback
_network.layer.InterceptionLayer.InterceptContext.doneCallback
_browser.HttpClient.processMessages
_browser.HttpClient.perform
_browser.HttpClient.tick
The Zig-side Enter/Exit pair around the worker's eval doesn't match
v8's entered_contexts stack invariant under that timing -- something
upstream of the loadInitialScript Exit leaves an extra Enter on the
stack, so v8's Utils::ApiCheck (`isolate->context() == *env`) fires
and the process aborts.
Reproducible against any Shopify storefront PDP (e.g.
https://weareallbirds.myshopify.com/products/mens-wool-runners) when
served through any HTTP proxy -- the proxy just adds enough latency
to surface the race; the same code path runs without --http_proxy
but the timing window is too tight to reliably hit. The Allbirds
trigger script is the Shopify web-pixel-extension worker, but ANY
Worker the page constructs hits the same code path.
The proper fix needs the v8 entered-contexts invariant to be
restored end-to-end through the worker eval. That's a deeper dig
into how Worker.loadInitialScript / WorkerGlobalScope.importScript /
ls.local.runMacrotasks compose with v8's microtask queues across
multiple contexts; I tried three intermediate fixes (deferring
loadInitialScript via the frame scheduler when other scripts are
mid-eval, replacing the post-eval cross-context runMacrotasks with
worker-only PerformCheckpoint, and removing runMacrotasks entirely)
and none stopped the crash. The bug is fired from inside the
synchronous tick path before the post-eval microtask handling
runs, which means the leak happens during Script::Run itself and
needs more targeted investigation.
This PR is the workaround so users hitting the SIGABRT on
storefront / analytics-heavy pages have a clean opt-in escape today.
For our use case (product catalog extraction) Workers carry no
extraction signal -- web-pixel sandboxes, analytics SDKs, marketing
tag pixels, etc. -- so disabling them removes a fragile code path
without any downside.
## Implementation
`Session.worker_loading_enabled: bool = true` -- default matches
existing behavior.
`Worker.init` short-circuits AFTER constructing the Worker /
WorkerGlobalScope / arena bookkeeping (so the JS `new Worker(url)`
expression doesn't throw):
if (!session.worker_loading_enabled) {
log.debug(.browser, "worker disabled", .{ .url = resolved_url });
return self;
}
Two ways to flip the flag, mirroring the --disable-subframes pattern:
1. LP.configureLoading { worker: bool } -- both subFrame and worker
are now optional fields in the params struct, so existing callers
passing only { subFrame } continue to work unchanged.
2. --disable-workers CLI flag -- added to CommonOptions (so it
applies to serve, fetch, mcp). New Config.disableWorkers() getter;
Session.init reads it as the initial value.
Total diff: +88 / -3 across 4 files (src/Config.zig,
src/browser/Session.zig, src/browser/webapi/Worker.zig,
src/cdp/domains/lp.zig).
## Verification
Reproducer pattern (puppeteer-core 24.42.0 + tiny CONNECT-tunnel
proxy on 127.0.0.1:9999, scripts in cdp-repros/):
serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999
serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999 --disable-workers
Driving https://weareallbirds.myshopify.com/products/mens-wool-runners:
baseline (no --disable-workers): 5/5 SIGABRT in
Worker.loadInitialScript with the v8 fatal callback above.
with --disable-workers: 10/10 successful, returns full
HTML (~1MB), no crash.
Test suite:
make test -> 637 of 637 tests passed (was 636/636 + new
cdp.lp: configureLoading toggles subFrame and worker
independently regression test).
zig fmt --check ./*.zig ./**/*.zig -> clean.
## Notes
* The CDP method is the same domain (LP.configureLoading) and same
shape as --disable-subframes' driver-side opt-in, so existing
Playwright / puppeteer integrations that already toggle
subframes don't need a separate code path -- one CDP call can
flip both.
* worker_loading_enabled = false does NOT remove Worker from the
global namespace (so feature-detection like
`if (typeof Worker !== 'undefined')` still reports true). It just
makes constructed workers no-op. Pages that postMessage to a worker
and wait for a response will hang on that promise forever (or
until the page is torn down). For our extraction use case that's
fine -- we control the worklist timeout anyway -- but it's worth
noting if upstream wants to surface the disabled state more
strongly (e.g. throw from postMessage, or remove the global
entirely behind an even-stricter flag).
* Once the underlying v8 entered-contexts invariant is restored in
Worker.loadInitialScript, this flag becomes a perf / sandboxing
tool rather than a correctness workaround. Worth keeping anyway:
blocking analytics / pixel workers is a reasonable thing to want.
## Related
* #2400 -- the iframe analog to this issue (subframe nav invalidates
executionContextId); same workaround pattern.
* #2401 -- introduced --disable-subframes / LP.configureLoading
{ subFrame } that this PR mirrors exactly for workers.