Worker scripts can call importScripts(), which performs a synchronous
HTTP request via HttpClient.syncRequest. To stay responsive during a
long fetch, syncRequest pumps the CDP socket (cdp.blocking_read) while
waiting. If a CDP message such as Target.closeTarget arrives on that
socket mid-fetch, the previous code path tore down the page
immediately:
Worker JS -> importScripts -> syncRequest -> blocking_read
-> CDP dispatch -> Target.closeTarget
-> Session.removePage -> Page.deinit -> Frame.deinit
-> Worker.deinit (frees worker arena + identity_map)
When control unwound back into the worker's eval, the next operation
that hit ctx.identity.identity_map.getOrPut dereferenced the freed
metadata pointer and segfaulted (sometimes immediately, sometimes a
few connections later as the arena got recycled).
Reproducer: any URL that loads dedicated workers calling importScripts
during initial eval, driven via puppeteer-core's connectOverCDP. The
allbirds.com product page (which loads ~8 web-pixel workers each
calling importScripts) reliably triggered it within ~10 connections.
Session.removePage already deferred when the frame's own
ScriptManager.is_evaluating was set; that guard never tripped because
worker scripts don't go through the frame's ScriptManager. Fix:
* Worker.loadInitialScript now sets the worker's own
_worker_scope._script_manager.is_evaluating around the eval, with
save/restore so nested worker evals compose correctly.
* WorkerGlobalScope.importScript also sets its own
_script_manager.is_evaluating around the syncRequest +
runMacrotasks. The typical caller (Worker.loadInitialScript)
already sets this around its outer eval, so the outer guard
usually covers us; the inner mark is defense-in-depth for callers
that reach importScripts() from a setTimeout / microtask outside
the loadInitialScript scope.
* New Frame.anyScriptEvaluating method walks the frame tree (frame
ScriptManager + every worker's ScriptManager + child frames) and
returns true if any is mid-eval. Session.removePage and
CDP.disposeBrowserContext use this in place of the frame-only
check, deferring teardown until all evals unwind. Final cleanup
happens at CDP.deinit on connection close, matching the existing
deferred-teardown contract.
Verified by running the puppeteer-core repro back-to-back against a
single Lightpanda serve; all returned 200 with the right title, no
UAF crashes (was previously crashing within 1-10 runs). All 521 unit
tests still pass.
Note: a separate, pre-existing latent V8 issue surfaces under stress
on this same code path. After many iterations a Runtime.evaluate
promise tracked by V8's inspector PromiseHandlerTracker is discarded
during garbage collection's first-pass weak callbacks; the discard
sends a failure response which triggers v8::String::NewFromOneByte,
hitting the debug-only assertion AllowHeapAllocation::IsAllowed() in
heap-allocator-inl.h:79 (no allocations allowed during weak callbacks).
This reproduces on a baseline build of this PR commit and on a
baseline build of just the original two-line is_evaluating fix \u2014
i.e. it is not introduced by the deferral logic. The deferral makes
it more visible because inspector callbacks now live longer before
teardown, so they are more likely to be alive during a GC. Tracking
this as a follow-up; the fix here still resolves the UAF that was
crashing the server immediately.
When http_client.request fails synchronously (e.g. RobotsLayer returning
RobotsBlocked because robots.txt is already cached), Client.request
invokes our error_callback before returning the error. httpErrorCallback
rejects the promise and releases response._arena. Letting the error
propagate from Fetch.init also fires the `errdefer response.deinit`,
double-freeing the arena and corrupting the arena pool — eventually
surfacing as a malloc abort during teardown.
Fixes#2403.
Give scheduler a 500ms timeslice to run per queue (high/low priority).
A site can load hundreds of timeouts to all execute at the same time. These
can be relatively expensive (e.g. lots of calls directly or indirectly to
getBoundingClientRect). As-is, the scheduler drains its queue to completion and
other timeouts, like --wait-ms can't do what they're meant to do. By adding
timeslice, we prevent many tasks all scheduled for the same time to go
unchecked.
I was initially planning on putting this higher in runMacrotasks, but that could
lead to starvation, i.e. if the first context used up all the time. Having it
per context is more fair, at the cost of running 500ms * context. But, (a) the
number of context we allow is fixed and (b) the reality is that most sites have
few contexts and normally only the first one is doing anything interesting.
There's ambiguity in the http_client.request() call on whether or not the caller
is responsible for freeing the header. It depends how request() fails, and it's
impossible for the caller to know. This needs a fundamental fix, but, in the
meantime, we get to pick between: a possible leak or a double free.
This commit opts for a possible leak. Why? Because overwhelmingly, if request
fails, it'll fail at a point where it will handle the free. In those cases
where it doesn't then the system is probably in trouble anyways (OOM).
(Also, as I was debugging, I noticed that the function.src() debug helper
wasn't working, so I fixed it).
`releaseRef` can free the XHR instance, so anything we want to set, has to
happen before then. (It might seem like the set is meaningful if we're just
going to destroy the instance, but `releaseRef` might also _not_ destroy the
instance, and the guard is for those cases).
Just improves the log a little to try and capture the name of what was
constructed. We see this often in WPT tests, which is probably WPT testing the
illegal constructor behavior, but it's hard to be sure. Having the name logged
gives us enough detail to see if the behavior is correct or not.
Also, downgraded the log level from warn -> info.
This is a follow up to https://github.com/lightpanda-io/browser/pull/2365 which
implements a different fix with different performance tradeoffs.
To recap, #2365 fixed an issue where the frame used to track the dom version
was the frame here the JS was being executed, not necessarily where the node
existed. This would cause the caching to break. #2365 fixed this by always
using the node's frame. But this is relatively expensive, requiring that we
find the owning document of a node in order to get its frame. This traversal
has to happen on every dom manipulation (to update the version) and every cached
read (to verify the dom version hasn't changed). We _could_ store the owning
frame on every Node, but that's relatively memory expensive to do.
This commit moves the dom version from the Frame to the Page. This allows us
to use the executing Frame to get the page (`frame._page`) which is much faster
than traversing the parent nodes to find the document. The downside is that a
mutation in Frame1 will invalidate the cache in Frame2. Given that many sites
have a single frame, this seems like a clear win.
buildJSONVersionResponse read the port from app.config.port(), which
is the configured value (still 0 with --port 0). Pass the OS-assigned
port from the bound address instead, so webSocketDebuggerUrl matches
the actual listener.
The integration test previously asserted a literal ws://127.0.0.1:9222/
that happened to match the test config's default port but not the
harness's hardcoded bind address (9583) — drop the URL assertion; the
buildJSONVersionResponse unit test already covers URL formatting.
XHR, for example, needs the context to properly shutdown. It get the page via
exec.context.page. Now, we could store the page on the XHR instance, sure. But
the point is the js.Context should never need the HttpClient, so it should
always be safe to abort inflight requests before canceling the context.
Fixes some flaky WPT crashes.
This re-implements the CDP navigate action fast-path when a page is in a
waiting state. It was removed for https://github.com/lightpanda-io/browser/pull/2297.
I believe this fast-path is still safe to do given that a page in _waiting has
had no navigation event yet and thus has nothing to preserve. The upside is
being able to re-use the existing [bare] v8::Context.
Per DOM §2.9 step 4 substep 8 ("Inner invoke"), an exception thrown by a
listener callback must be reported to the realm's global, not propagated
out of `dispatchEvent`. Subsequent listeners on the same target and the
rest of the capture/bubble walk must still run.
Both `EventManager.dispatchPhase` (DOM targets) and
`EventManagerBase.dispatchDirect` (Window/XHR/AbortSignal/etc.) wrapped
the V8 callback invocation in raw `try`, so a listener throw aborted the
whole dispatch and surfaced as an uncaught exception at the
`Runtime.evaluate` boundary. The inline-handler invocation a few lines
above already used `catch |err| log.warn(...)`; this just extends the
same shape to the listener switch.
Closes#2367