* Prefer `--inject-*` prefix.
* Support injecting multiple scripts (also allows using both variants together).
* Instead of executing scripts in JS context, actually insert them into `<head>` for correct dump output.
Worker scripts can call importScripts(), which performs a synchronous
HTTP request via HttpClient.syncRequest. To stay responsive during a
long fetch, syncRequest pumps the CDP socket (cdp.blocking_read) while
waiting. If a CDP message such as Target.closeTarget arrives on that
socket mid-fetch, the previous code path tore down the page
immediately:
    Worker JS -> importScripts -> syncRequest -> blocking_read
      -> CDP dispatch -> Target.closeTarget
      -> Session.removePage -> Page.deinit -> Frame.deinit
      -> Worker.deinit (frees worker arena + identity_map)
When control unwound back into the worker's eval, the next operation
that hit ctx.identity.identity_map.getOrPut dereferenced the freed
metadata pointer and segfaulted (sometimes immediately, sometimes a
few connections later as the arena got recycled).
Reproducer: any URL that loads dedicated workers calling importScripts
during initial eval, driven via puppeteer-core's connectOverCDP. The
allbirds.com product page (which loads ~8 web-pixel workers each
calling importScripts) reliably triggered it within ~10 connections.
Session.removePage already deferred when the frame's own
ScriptManager.is_evaluating was set; that guard never tripped because
worker scripts don't go through the frame's ScriptManager. Fix:
* Worker.loadInitialScript now sets the worker's own
_worker_scope._script_manager.is_evaluating around the eval, with
save/restore so nested worker evals compose correctly.
* WorkerGlobalScope.importScript also sets its own
_script_manager.is_evaluating around the syncRequest +
runMacrotasks. The typical caller (Worker.loadInitialScript)
already sets this around its outer eval, so the outer guard
usually covers us; the inner mark is defense-in-depth for callers
that reach importScripts() from a setTimeout / microtask outside
the loadInitialScript scope.
* New Frame.anyScriptEvaluating method walks the frame tree (frame
ScriptManager + every worker's ScriptManager + child frames) and
returns true if any is mid-eval. Session.removePage and
CDP.disposeBrowserContext use this in place of the frame-only
check, deferring teardown until all evals unwind. Final cleanup
happens at CDP.deinit on connection close, matching the existing
deferred-teardown contract.
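
The guard and tree walk described above can be sketched roughly as follows. This is a Python illustration only, not the Zig implementation: the class and method names (`ScriptManager.evaluating`, `Frame.any_script_evaluating`, `Session.remove_page`) are hypothetical stand-ins for the identifiers named in this message, and teardown is modeled as dropping a reference.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class ScriptManager:
    is_evaluating: bool = False

    @contextmanager
    def evaluating(self):
        # Save the previous value so a nested eval restores it instead of
        # clearing the flag while an outer eval is still on the stack.
        previous = self.is_evaluating
        self.is_evaluating = True
        try:
            yield
        finally:
            self.is_evaluating = previous


@dataclass
class Worker:
    script_manager: ScriptManager = field(default_factory=ScriptManager)


@dataclass
class Frame:
    script_manager: ScriptManager = field(default_factory=ScriptManager)
    workers: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def any_script_evaluating(self) -> bool:
        # Frame's own manager, then every worker, then child frames.
        if self.script_manager.is_evaluating:
            return True
        if any(w.script_manager.is_evaluating for w in self.workers):
            return True
        return any(c.any_script_evaluating() for c in self.children)


class Session:
    def __init__(self, frame):
        self.frame = frame
        self.removal_deferred = False

    def remove_page(self) -> bool:
        """Return True if the page was torn down now, False if deferred."""
        if self.frame is None:
            return True
        if self.frame.any_script_evaluating():
            # Something is mid-eval: freeing now would leave dangling state.
            self.removal_deferred = True
            return False
        self.frame = None
        return True
```

In this model a `remove_page` call that arrives while a worker is inside `evaluating()` returns `False` and only marks the removal as deferred; the same call succeeds once the eval has unwound.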
Verified by running the puppeteer-core repro back-to-back against a
single Lightpanda serve instance; all runs returned 200 with the right
title and no UAF crashes (previously it crashed within 1-10 runs). All
521 unit tests still pass.
Note: a separate, pre-existing latent V8 issue surfaces under stress
on this same code path. After many iterations a Runtime.evaluate
promise tracked by V8's inspector PromiseHandlerTracker is discarded
during garbage collection's first-pass weak callbacks; the discard
sends a failure response which triggers v8::String::NewFromOneByte,
hitting the debug-only assertion AllowHeapAllocation::IsAllowed() in
heap-allocator-inl.h:79 (no allocations allowed during weak callbacks).
This reproduces on a baseline build of this PR commit and on a
baseline build of just the original two-line is_evaluating fix,
i.e. it is not introduced by the deferral logic. The deferral makes
it more visible because inspector callbacks now live longer before
teardown, so they are more likely to be alive during a GC. Tracking
this as a follow-up; the fix here still resolves the UAF that was
crashing the server immediately.
When http_client.request fails synchronously (e.g. RobotsLayer returning
RobotsBlocked because robots.txt is already cached), Client.request
invokes our error_callback before returning the error. httpErrorCallback
rejects the promise and releases response._arena. Letting the error
propagate from Fetch.init also fires the `errdefer response.deinit`,
double-freeing the arena and corrupting the arena pool — eventually
surfacing as a malloc abort during teardown.
Fixes #2403.
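
A rough Python model of the failure mode, for illustration only. `Arena`, `request`, `fetch_init_buggy` and `fetch_init_fixed` are hypothetical names, the `except` branch plays the role of Zig's `errdefer`, and the "fixed" variant shows just one possible way to avoid the second release (treat the error as handled once the callback has run); it is an assumption, not a description of the actual change.

```python
class Arena:
    def __init__(self):
        self.released = False

    def release(self):
        if self.released:
            raise RuntimeError("double free")
        self.released = True


class RobotsBlocked(Exception):
    pass


def request(error_callback):
    # Synchronous failure: the callback runs *before* the error is returned.
    err = RobotsBlocked("blocked by cached robots.txt")
    error_callback(err)
    raise err


def fetch_init_buggy():
    arena = Arena()
    try:
        request(lambda err: arena.release())
    except RobotsBlocked:
        arena.release()  # errdefer-style cleanup: second release
        raise


def fetch_init_fixed():
    arena = Arena()
    state = {"rejected": None}

    def on_error(err):
        state["rejected"] = err  # stands in for rejecting the promise
        arena.release()

    try:
        request(on_error)
    except RobotsBlocked:
        if state["rejected"] is None:
            # The callback never ran, so cleanup is still our job.
            arena.release()
            raise
        # The callback already cleaned up; do not release again.
    return state["rejected"]
```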
Strip mentions of the private gem and its internal paths from xpath
module docstrings, the conformance test header, and the dom dispatch
heuristic. Comments now describe behavior directly without pointing at
sources public readers can't access.
The previous `::` heuristic accepted any identifier-like character before
`::`, which misrouted CSS pseudo-elements (`a::before`, `div::after`) to
the XPath evaluator. Walk back the run of [a-zA-Z-] characters and look
the candidate up in a StaticStringMap of the 13 XPath 1.0 named axes,
so only real axis names match.
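
The walk-back check can be sketched in Python as below. The function name `has_xpath_axis` is hypothetical, a `frozenset` stands in for the StaticStringMap, and scanning every `::` occurrence (rather than only the first) is an assumption of this sketch.

```python
# The 13 named axes defined by XPath 1.0.
XPATH_AXES = frozenset({
    "ancestor", "ancestor-or-self", "attribute", "child",
    "descendant", "descendant-or-self", "following",
    "following-sibling", "namespace", "parent", "preceding",
    "preceding-sibling", "self",
})


def _is_axis_char(ch: str) -> bool:
    return ch == "-" or "a" <= ch <= "z" or "A" <= ch <= "Z"


def has_xpath_axis(query: str) -> bool:
    start = 0
    while True:
        idx = query.find("::", start)
        if idx == -1:
            return False
        # Walk back over the run of [a-zA-Z-] characters before "::".
        begin = idx
        while begin > 0 and _is_axis_char(query[begin - 1]):
            begin -= 1
        if query[begin:idx] in XPATH_AXES:
            return True
        start = idx + 2
```

With this check `descendant::p` is dispatched to XPath, while `a::before` and `div::after` are not, because `a` and `div` are not axis names.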
Generalizes 8733e33b's //tag[@id='x'] shape: tryFusedDescendantFastPath
handles any //tag[safe] or .//tag[safe] where the predicates are
non-positional boolean/node-set checks. Walks the search root's
descendants once in document order, applies node test + predicates
inline, no per-step materialization, no dedup.
5-9x speedup on //div, //*, //*[@class='x'], //div[contains(...)]; ~25x
on (//div)[1] and count(//div), where the inner path has this shape.
Safety gate rejects predicates that could produce a number at the
top level (number, neg, arithmetic binop, numeric-returning fn-call)
and any predicate containing position()/last() anywhere. Conservative:
a nested sub-path's local positional predicate is rejected even though
it's scoped to its own axis.
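
A minimal Python sketch of such a gate, assuming a toy AST of nested tuples (`("number", 1)`, `("neg", expr)`, `("binop", op, lhs, rhs)`, `("call", name, [args])`, and so on). The node shapes and the name `is_safe_predicate` are hypothetical; the point is the two rules: reject anything that could evaluate to a number at the top level (since `[3]` means `position() = 3`), and reject position()/last() at any depth.

```python
ARITHMETIC_OPS = {"+", "-", "*", "div", "mod"}
NUMERIC_FUNCTIONS = {
    "count", "sum", "number", "string-length",
    "floor", "ceiling", "round", "position", "last",
}
POSITIONAL_FUNCTIONS = {"position", "last"}


def _children(node):
    # Child nodes are tuples, either directly or inside an argument list.
    for part in node[1:]:
        if isinstance(part, tuple):
            yield part
        elif isinstance(part, list):
            for item in part:
                if isinstance(item, tuple):
                    yield item


def _contains_positional(node) -> bool:
    if node[0] == "call" and node[1] in POSITIONAL_FUNCTIONS:
        return True
    return any(_contains_positional(child) for child in _children(node))


def _may_be_number(node) -> bool:
    kind = node[0]
    if kind in ("number", "neg"):
        return True
    if kind == "binop":
        return node[1] in ARITHMETIC_OPS
    if kind == "call":
        return node[1] in NUMERIC_FUNCTIONS
    return False


def is_safe_predicate(node) -> bool:
    return not _may_be_number(node) and not _contains_positional(node)
```

Note that `count(a) = 2` passes the gate: the numeric call sits below a comparison, so the predicate as a whole is boolean.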
evalPath recognizes //tag[@id='x'] and .//tag[@id='x'] (plus the
//*[@id='x'] wildcard) and serves them via frame.getElementByIdFromNode.
~100-150x speedup on ID lookups (3231us -> 22.6us for //*[@id='target']
in the new benchmark). Falls through to general path on any deviation
(extra step, extra predicate, non-eq, non-literal RHS).
Inherits the same duplicate-ID compromise selector/List.zig ships for
querySelector(All): the id-map stores only the first element per ID in
document order. Capybara/Selenium hot paths assume unique IDs.
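
The shape match and the first-element-per-ID behavior can be illustrated with the Python sketch below. It is a simplification under stated assumptions: the real code matches on the parsed AST rather than a regular expression, the context node is assumed to be the document root, and `Element`, `build_id_map`, `try_id_fast_path` and `NOT_HANDLED` are hypothetical names.

```python
import re

NOT_HANDLED = object()  # sentinel: fall through to the general path

# Only //tag[@id='x'], .//tag[@id='x'] and the * wildcard qualify.
_ID_SHAPE = re.compile(r"^\.?//(\*|[A-Za-z][\w-]*)\[@id='([^']*)'\]$")


class Element:
    def __init__(self, tag, id=None, children=()):
        self.tag = tag
        self.id = id
        self.children = list(children)


def build_id_map(root):
    # Pre-order walk; setdefault keeps the first element per ID.
    id_map = {}
    stack = [root]
    while stack:
        el = stack.pop()
        if el.id is not None:
            id_map.setdefault(el.id, el)
        stack.extend(reversed(el.children))
    return id_map


def try_id_fast_path(expr, id_map):
    match = _ID_SHAPE.match(expr)
    if match is None:
        return NOT_HANDLED
    tag, wanted_id = match.groups()
    el = id_map.get(wanted_id)
    if el is None or (tag != "*" and el.tag != tag):
        return []
    return [el]
```

The duplicate-ID compromise shows up directly: if two elements share an ID, only the first in document order is ever returned, and a tag test that matches only the second finds nothing.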
tests/xpath/xpath_perf.html is the 13-query micro-benchmark used to
collect the numbers; batched console.warn output survives test runner
interleaving.
- Document.evaluate / XPathEvaluator.evaluate / XPathExpression.evaluate:
result_type / requested_type now optional u16 defaulting to ANY_TYPE
(matches WHATWG: `optional unsigned short type = 0`). context_node
stays nullable with a fallback to the document — preserves the
polyfill's behavior asserted by the `default_context` fixture
- ast.zig NodeTest: clarify that namespaced names (`prefix:*`,
`prefix:local`) are stored verbatim and fall through to a literal
match against the node name — consistent with the `namespace::` axis
stub (decision #3). Adds a TODO in case the polyfill ever drops the
stub
- Parser: cap recursive descent at depth 64 with new
error.MaxDepthExceeded; depth tracked across parseExpr (parens,
predicates, function args) and parseUnaryExpr (chained `-`). Two
regression tests cover deep parenthesization and deep unary minus
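
The depth cap from the Parser item can be sketched with a toy recursive-descent parser in Python. The grammar here (integers, parentheses, unary minus) is a deliberately tiny stand-in, and the class layout is hypothetical; only the limit of 64 and the two tracked entry points come from the text above.

```python
MAX_DEPTH = 64


class MaxDepthExceeded(Exception):
    pass


class Parser:
    def __init__(self, text: str):
        self.text = text
        self.pos = 0
        self.depth = 0

    def _peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def _enter(self):
        self.depth += 1
        if self.depth > MAX_DEPTH:
            raise MaxDepthExceeded(f"nesting deeper than {MAX_DEPTH}")

    def parse_expr(self) -> int:
        # Every parenthesized sub-expression re-enters here.
        self._enter()
        try:
            return self.parse_unary()
        finally:
            self.depth -= 1

    def parse_unary(self) -> int:
        if self._peek() == "-":
            # Chained "-" recurses, so it counts against the same budget.
            self._enter()
            try:
                self.pos += 1
                return -self.parse_unary()
            finally:
                self.depth -= 1
        return self.parse_primary()

    def parse_primary(self) -> int:
        if self._peek() == "(":
            self.pos += 1
            value = self.parse_expr()
            if self._peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
            return value
        start = self.pos
        while self._peek().isdigit():
            self.pos += 1
        if start == self.pos:
            raise SyntaxError("expected a number")
        return int(self.text[start:self.pos])
```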
Per XPath 1.0 §5.7, the data model has no CDATASection node — CDATA
content is part of the text node value. The text() node test was only
matching DOM nodeType 3 (Text), silently excluding CDATA sections
(nodeType 4) parsed via DOMParser/XMLDocument and inline foreign
content like SVG with embedded scripts.
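
As a small illustration in Python, a text() node test that follows the XPath data model would accept both DOM node types. The function names are hypothetical; the nodeType constants are the standard DOM values.

```python
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
COMMENT_NODE = 8


def matches_text_test(node_type: int) -> bool:
    # XPath 1.0 has no CDATA node kind, so CDATA sections count as text.
    return node_type in (TEXT_NODE, CDATA_SECTION_NODE)


def select_text_children(children):
    # children: iterable of (node_type, value) pairs in document order.
    return [value for node_type, value in children if matches_text_test(node_type)]
```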
The attribute axis was calling Entry.toAttribute on every visit,
materializing fresh *Attribute structs (plus duped name/value strings)
into page-lifetime storage. Repeated XPath queries — the Capybara/
Selenium polling pattern this PR targets — accumulated unbounded
copies for the same DOM entries. Route through frame._attribute_lookup
so each Entry resolves to a single cached *Attribute, matching
List.getAttribute and NamedNodeMap.getAtIndex.
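
The difference between materializing on every visit and resolving through a cache can be sketched in Python as follows. `Entry`, `Attribute` and `AttributeLookup` are hypothetical models, not the Zig types; the cache is keyed by entry identity, and each cached wrapper holds a reference to its entry so that the identity key stays valid.

```python
class Entry:
    def __init__(self, name: str, value: str):
        self.name = name
        self.value = value


class Attribute:
    def __init__(self, entry: Entry):
        self.entry = entry  # keeps the entry alive alongside its wrapper


class AttributeLookup:
    def __init__(self):
        self._cache = {}

    def get(self, entry: Entry) -> Attribute:
        key = id(entry)
        attribute = self._cache.get(key)
        if attribute is None:
            attribute = Attribute(entry)
            self._cache[key] = attribute
        return attribute

    def __len__(self) -> int:
        return len(self._cache)


def attribute_axis(entries, lookup: AttributeLookup):
    # One wrapper per entry, no matter how often the axis is walked.
    return [lookup.get(entry) for entry in entries]
```

Polling the same query a thousand times then leaves the cache at one wrapper per DOM entry instead of a thousand copies.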
A bare indexOf("::") matched CSS pseudo-elements (a::before) and
attribute values containing '::' ([data-x="x::y"]), misrouting them
to the XPath evaluator. Require an axis-name shape ([a-zA-Z-])
immediately before '::' so only real axis specifiers like
descendant::p are dispatched to XPath.
The Parser borrows string slices from its input for AST literals,
names, and var refs. Without duping, the AST holds slices into the JS
call_arena, which is reset when the top-level call returns — every
subsequent evaluate() of a cached XPathExpression would dereference
freed memory.
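
Python cannot dereference freed memory, but the borrow-versus-copy distinction can be approximated with a `memoryview` into a reusable buffer. This is an analogy only: `parse_borrowing`, `parse_owning` and `reset_arena` are hypothetical helpers, and overwriting the buffer stands in for the call_arena being reset and reused.

```python
def parse_borrowing(arena: bytearray, start: int, end: int) -> memoryview:
    # Borrowed slice: shares storage with the arena.
    return memoryview(arena)[start:end]


def parse_owning(arena: bytearray, start: int, end: int) -> bytes:
    # Duplicated slice: an independent copy that outlives the arena contents.
    return bytes(arena[start:end])


def reset_arena(arena: bytearray) -> None:
    # Model the arena being reset and reused by a later call.
    for i in range(len(arena)):
        arena[i] = ord("X")


arena = bytearray(b"//div[@id='a']")
borrowed_name = parse_borrowing(arena, 2, 5)
owned_name = parse_owning(arena, 2, 5)
reset_arena(arena)
```

After the reset, `borrowed_name` reflects whatever now occupies the buffer, while `owned_name` still holds the original node name.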
Ports the capybara-lightpanda XPath 1.0 polyfill into Lightpanda.
Exposes the WHATWG Document.evaluate / XPathResult / XPathEvaluator
/ XPathExpression surface and routes CDP DOM.performSearch XPath
queries through the new evaluator. The capybara-lightpanda gem can
drop its ~700-line JS polyfill in the next release.
New module src/browser/xpath/ (Tokenizer, Parser, Ast, Evaluator,
Functions, Result). New webapi types XPathResult,
XPathExpression, XPathEvaluator. Coverage and stubs match the
polyfill 1:1 — see capybara-lightpanda/XPATH_COMPLIANCE.md for
the full spec.
Tests: 91-case conformance + result-API + evaluator-API + CDP
fixtures, plus the engine's Zig unit suite (601/601 pass).
Give the scheduler a 500ms timeslice to run per queue (high/low priority).
A site can load hundreds of timeouts that all execute at the same time. These
can be relatively expensive (e.g. lots of calls directly or indirectly to
getBoundingClientRect). As-is, the scheduler drains its queue to completion and
other timeouts, like --wait-ms, can't do what they're meant to do. By adding a
timeslice, we prevent many tasks all scheduled for the same time from running
unchecked.
I was initially planning on putting this higher in runMacrotasks, but that could
lead to starvation, i.e. if the first context used up all the time. Having it
per context is fairer, at the cost of running up to 500ms * the number of
contexts. But (a) the number of contexts we allow is fixed and (b) the reality
is that most sites have few contexts and normally only the first one is doing
anything interesting.
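
A minimal Python sketch of a per-queue timeslice, under a few assumptions: the clock is injected (so the behavior can be checked without sleeping), each queue gets its own budget starting when that queue begins to run, and the budget is checked between tasks rather than preempting a running one. The `Scheduler` API shown here is hypothetical.

```python
import heapq


class Scheduler:
    def __init__(self, clock_ms, timeslice_ms=500):
        self._clock_ms = clock_ms
        self._timeslice_ms = timeslice_ms
        self._queues = {"high": [], "low": []}
        self._seq = 0  # tie-breaker so equal run times keep insertion order

    def add(self, task, run_at_ms, priority="high"):
        heapq.heappush(self._queues[priority], (run_at_ms, self._seq, task))
        self._seq += 1

    def pending(self, priority="high"):
        return len(self._queues[priority])

    def run(self):
        """Run due tasks, giving each queue its own timeslice."""
        ran = 0
        for queue in self._queues.values():
            ran += self._run_queue(queue)
        return ran

    def _run_queue(self, queue):
        deadline = self._clock_ms() + self._timeslice_ms
        ran = 0
        while queue and queue[0][0] <= self._clock_ms():
            if self._clock_ms() >= deadline:
                break  # budget spent; remaining tasks wait for the next run
            _, _, task = heapq.heappop(queue)
            task()
            ran += 1
        return ran
```

With 20 tasks due at the same instant and each taking about 100ms, one `run()` call executes 5 of them and returns, leaving the caller free to check other deadlines before calling `run()` again.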