This largely reverts 92607ad765 (captured in PR:
https://github.com/lightpanda-io/browser/pull/2398).
https://github.com/lightpanda-io/browser/pull/2495 introduces protection against
execution arbitrary CDP command during JavaScript callbacks. Claude initially
made the case for keeping the existing code as a safety net, but sycophanted
when I pushed by.
My reason for removing it is that it isn't a low-maintenance guard. It's a flag
that serves a real purpose (ensuring 1 JS script is finished before executing
another one), that has been expended to solve these issues. It needs to be set
(and reverted) at every callsite that makes a blocking call, and it needs to be
checked (recursively across all frames) in any place that can teardown the page/
frame.
Claude called the allowlist "load-bearing in a non-obvious way", but I think
it's purpose built specifically for this case. Extended the comment atop
`allowDuringSyncWait` so that future-selves remember this.
Fold QueryWriter into Writer behind an Opts.filter. Tree mode is unchanged
(filter=null); query mode walks the full subtree (including AX-ignored
nodes per the queryAXTree spec) and emits the flat-match shape. Shared
resolveRole helper handles label-promotion for both paths so the two
can't drift.
Drop the "objectId not yet supported" carve-out: queryAXTree now reuses
dom.getNode, which already resolves nodeId/backendNodeId/objectId.
Previously, the CDP socket was added to the worker's multi and fully owned
by the worker. While this is simple, it introduced some issues:
1 - Cannot detect a disconnected client during JS processing ( for(;;) )
2 - A blocked worker can cause back-pressure that blocks the client. This can
cause a deadlock if the worker is blocked waiting for a CDP message
In addition to these 2 problems, there was 1 other serious CDP-related issue:
arbitrary CDP messages could be processed during JavaScript callback. For
example, a Worker calls importScripts while request interception is enabled,
this requires us to tick the HttpClient waiting for the interception response.
But, a client could sent Target.closeTarget, which we'd process and delete the
frame..all while importScripts is still blocked. Assuming importScripts unblocks
everything is a big UAF since the frame (and its workers) were cleared from
closeTarget.
The CDP socket is now read from the network (main) thread and an OTP-style
mailbox is used. The network thread posts message to the Worker's inbox and
signals it to wakeup. This solves #1 and #2. It doesn't directly solve the
reentrancy issue, but it provides the foundation. Specifically, in introduces
a queue for of CDP message and more control over when/how that queue is
processed. At "safe points" (Runner.tick, HttpClient.tick), any message can
be processed. But, when inside a JavaScript callback, we can process only non-
destructive/mutating message. Specifically, we can process only messages related
to request interception.
network/WsConnection.zig was poorly named. It didn't represent a generic WS
connection, but rather a CDP-specific connection. This splits the generic WS
logic into network/WS.zig and the CDP-specific details in cdp/Connection.zig.
Some of the connection management in the Server has also been simplified.
axnodeWriter and axnodeQueryWriter both used session.currentFrame(),
but the root node may belong to a different frame (cross-frame query
from a parent context, iframe content). Name resolution (Label lookup
against ownerDocument) and visibility checks (frame._style_manager)
are per-frame, so the writer needs to bind to the node's owning frame.
Uses the existing Node.ownerFrame(fallback) helper. Fallback is
currentFrame for the orphan/detached-node case.
Also corrects a pre-existing latent bug in getFullAXTree where the
writer ignored the resolved frameId and used currentFrame instead.
Adds Accessibility.queryAXTree for finding AX nodes by role and/or
accessible name without serializing the full tree. Motivated by
agentic / MCP automation workloads where getFullAXTree round-trips
multi-MB JSON on complex pages.
QueryWriter walks the DOM subtree rooted at the requested node,
reuses the existing role + name resolution + ignore logic from
AXNode, and emits a flat array of matches. Reuses the same
VisibilityCache + LabelByForIndex + temp arena pattern as
axnodeWriter, so no extra retained state.
MVP limitations, each a small follow-up PR:
- objectId param returns a specific not-yet-supported error
- matches emit empty properties/childIds and no parentId
- frameId not parsed; queries the current frame
This is just moving fields around. The end result is that there's a
`transfer.req` and a `transfer.res`.
On the Request side, we use to have a nested `params: RequestParam` resulting
in a lot of `transfer.req.params.url`. This is now `transfer.req.url`. On the
Response side, we had the exact opposite: response fields splattered directly
in the transfer, `transfer.response_header`. This is now `transfer.res.header`.
There is now an HttpClient.Response, which is the actual final response (which
could be for a transfer or something else, e.g the cache). And an
HttpClient.Transfer.Response which captures the inflight response data (and is
one of the polymorphic variants of the HttpClient.Response). Probably still not
ideal, but I'm not sure how to make it cleaner, and even if this is just an
intermediary step, I consider it an small win.
1 - Track owner of a request (for simpler / more accurate abort (TBD))
2 - Create Transfer upfront, make everything work on Transfer (not Request)
This helps remove ambiguity about cleanup and simplifies layers. For example
Robots request is just another normal request, not a special case. This gives
everything a stable address (the *Transfer which can be looked up by id)
Worker scripts can call importScripts(), which performs a synchronous
HTTP request via HttpClient.syncRequest. To stay responsive during a
long fetch, syncRequest pumps the CDP socket (cdp.blocking_read) while
waiting. If a CDP message such as Target.closeTarget arrives on that
socket mid-fetch, the previous code path tore down the page
immediately:
Worker JS -> importScripts -> syncRequest -> blocking_read
-> CDP dispatch -> Target.closeTarget
-> Session.removePage -> Page.deinit -> Frame.deinit
-> Worker.deinit (frees worker arena + identity_map)
When control unwound back into the worker's eval, the next operation
that hit ctx.identity.identity_map.getOrPut dereferenced the freed
metadata pointer and segfaulted (sometimes immediately, sometimes a
few connections later as the arena got recycled).
Reproducer: any URL that loads dedicated workers calling importScripts
during initial eval, driven via puppeteer-core's connectOverCDP. The
allbirds.com product page (which loads ~8 web-pixel workers each
calling importScripts) reliably triggered it within ~10 connections.
Session.removePage already deferred when the frame's own
ScriptManager.is_evaluating was set; that guard never tripped because
worker scripts don't go through the frame's ScriptManager. Fix:
* Worker.loadInitialScript now sets the worker's own
_worker_scope._script_manager.is_evaluating around the eval, with
save/restore so nested worker evals compose correctly.
* WorkerGlobalScope.importScript also sets its own
_script_manager.is_evaluating around the syncRequest +
runMacrotasks. The typical caller (Worker.loadInitialScript)
already sets this around its outer eval, so the outer guard
usually covers us; the inner mark is defense-in-depth for callers
that reach importScripts() from a setTimeout / microtask outside
the loadInitialScript scope.
* New Frame.anyScriptEvaluating method walks the frame tree (frame
ScriptManager + every worker's ScriptManager + child frames) and
returns true if any is mid-eval. Session.removePage and
CDP.disposeBrowserContext use this in place of the frame-only
check, deferring teardown until all evals unwind. Final cleanup
happens at CDP.deinit on connection close, matching the existing
deferred-teardown contract.
Verified by running the puppeteer-core repro back-to-back against a
single Lightpanda serve; all returned 200 with the right title, no
UAF crashes (was previously crashing within 1-10 runs). All 521 unit
tests still pass.
Note: a separate, pre-existing latent V8 issue surfaces under stress
on this same code path. After many iterations a Runtime.evaluate
promise tracked by V8's inspector PromiseHandlerTracker is discarded
during garbage collection's first-pass weak callbacks; the discard
sends a failure response which triggers v8::String::NewFromOneByte,
hitting the debug-only assertion AllowHeapAllocation::IsAllowed() in
heap-allocator-inl.h:79 (no allocations allowed during weak callbacks).
This reproduces on a baseline build of this PR commit and on a
baseline build of just the original two-line is_evaluating fix \u2014
i.e. it is not introduced by the deferral logic. The deferral makes
it more visible because inspector callbacks now live longer before
teardown, so they are more likely to be alive during a GC. Tracking
this as a follow-up; the fix here still resolves the UAF that was
crashing the server immediately.
Page.handleJavaScriptDialog previously responded -32000 "No dialog is
showing" regardless of whether a dialog was open, leaving CDP clients
no way to influence the JS-side return value of confirm() / prompt().
PR #2085 wired up the Page.javascriptDialogOpening event but explicitly
deferred the return-value override since true Chrome semantics require
suspending V8 mid-execution.
Add a pre-arm model that fits the auto-dismiss architecture without
runtime suspension: handleJavaScriptDialog stashes {accept, promptText}
on the BrowserContext; when the next JS dialog dispatches the
javascript_dialog_opening notification, the listener pops the stash and
fills it into the dispatch's response output param so Window.confirm /
prompt return the CDP client's choice. Without a pre-arm, headless
auto-dismiss values from PR #2085 are preserved (confirm->false,
prompt->null, alert->void).
Closes#2260
Follow up to https://github.com/lightpanda-io/browser/pull/2200
This change is actually pretty mundane, but a bunch of files that used to
take a *Session (e.g. every WebAPI releaseRef and deinit) now take a *Page.
This aims to separate the 2 lifetimes currently managed by Session by moving
the "Page" lifetime to a dedicated container: Page. Ultimately, the goal is to
remove the 1-page-per-session limit of the current design. Not to explicitly
support multiple pages per session (though, that's more possible now), but
in order to better emulate Chrome where, during a navigation event, the old and
new page both exist.
This is to pave the way for introducing a new "Page" container, which will take
over the page lifecycle currently burdening Session. The ultimate goal of that
is to allow the Session to have multiple pages (mostly for better transitions
between pages), which is hard to do now since the Session has so much state.
This rename was aggressive, e.g. currentPage() -> currentFrame() so that, when
the new Page container is added, you won't see "currentPage()" and wonder:
"Does 'currentPage' mean the new Page container, or the Frame (which
used to be called Page)".
@import("lightpanda") where needed.
Would also like to do this for String, Page, Session and js which all stand out
as types that are use across the codebase.
I know that a few devs are doing this in new work and I haven't heard anyone
voice an objection.
Prunes hidden subtrees from the accessibility tree and implements
accessible name resolution via labels. Adds the `labels` property
to labellable HTML elements.
Tweak ergonomics (public functions log internally and are infallible). Use
readFileAlloc directly. Fix possible memory leak with cookie arena - I don't
think you can make a copy of the arena, and then dupe with the original.