Commit Graph

63 Commits

Author SHA1 Message Date
Nikolay Govorov
ceee9cfb10 Move Handles stuff to Network 2026-05-26 15:28:54 +01:00
Karl Seguin
e4171bc694 Merge pull request #2501 from lightpanda-io/remove_reentrency_teardown_protection
Remove reentrency teardown protection
2026-05-21 10:15:26 +08:00
Pierre Tachoire
6e6b3caf96 Merge pull request #2479 from navidemad/accessibility-query-ax-tree
Implement Accessibility.queryAXTree CDP method (and fix latent frame-binding bug)
2026-05-20 13:59:35 +02:00
Karl Seguin
a9cf87e0b0 Remove reentrency teardown protection
This largely reverts 92607ad765 (captured in PR:
https://github.com/lightpanda-io/browser/pull/2398).

https://github.com/lightpanda-io/browser/pull/2495 introduces protection against
executing arbitrary CDP commands during JavaScript callbacks. Claude initially
made the case for keeping the existing code as a safety net, but sycophantically
gave in when I pushed back.

My reason for removing it is that it isn't a low-maintenance guard. It's a flag
that serves a real purpose (ensuring 1 JS script is finished before executing
another one) and has been extended to solve these issues. It needs to be set
(and reverted) at every callsite that makes a blocking call, and it needs to be
checked (recursively across all frames) in any place that can tear down the
page/frame.

Claude called the allowlist "load-bearing in a non-obvious way", but I think
it's purpose-built specifically for this case. Extended the comment atop
`allowDuringSyncWait` so that our future selves remember this.
2026-05-20 15:08:18 +08:00
Navid EMAD
814ca8ab3f accessibility: unify query+tree writers, route objectId via dom.getNode
Fold QueryWriter into Writer behind an Opts.filter. Tree mode is unchanged
(filter=null); query mode walks the full subtree (including AX-ignored
nodes per the queryAXTree spec) and emits the flat-match shape. Shared
resolveRole helper handles label-promotion for both paths so the two
can't drift.
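A minimal Python sketch of the "one writer, optional filter" shape described above, assuming a toy DOM. `Node`, `Filter`, `ROLES`, `resolve_role`, and the output dicts are hypothetical stand-ins for the Zig `Writer`/`Opts.filter`. Label promotion is not modeled. Tree mode here drops AX-ignored nodes while hoisting their children, which is an illustrative assumption rather than Lightpanda's documented behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical tag -> role table; real role resolution is far richer.
ROLES = {"button": "button", "a": "link", "h1": "heading", "div": "generic"}


@dataclass
class Node:
    tag: str
    name: str = ""
    ignored: bool = False  # AX-ignored (e.g. hidden) in this sketch
    children: List["Node"] = field(default_factory=list)


@dataclass
class Filter:
    role: Optional[str] = None  # when both are set, both must match
    name: Optional[str] = None


def resolve_role(node: Node) -> str:
    # Shared by both modes so tree and query output cannot drift apart.
    return ROLES.get(node.tag, "generic")


def write(root: Node, flt: Optional[Filter] = None) -> list:
    """Tree mode when flt is None, flat query mode otherwise."""
    if flt is None:
        return _tree(root)
    matches: list = []
    _query(root, flt, matches)
    return matches


def _tree(node: Node) -> list:
    # Returns a list so an ignored node can splice its children into its parent.
    kids = [k for child in node.children for k in _tree(child)]
    if node.ignored:
        return kids
    return [{"role": resolve_role(node), "name": node.name, "children": kids}]


def _query(node: Node, flt: Filter, out: list) -> None:
    # Query mode visits every node, including AX-ignored ones.
    role = resolve_role(node)
    role_ok = flt.role is None or flt.role == role
    name_ok = flt.name is None or flt.name == node.name
    if role_ok and name_ok:
        out.append({"role": role, "name": node.name})
    for child in node.children:
        _query(child, flt, out)
```

Because both modes call the same `resolve_role`, a node cannot be reported with one role in the tree and a different one in a query result.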

Drop the "objectId not yet supported" carve-out: queryAXTree now reuses
dom.getNode, which already resolves nodeId/backendNodeId/objectId.
2026-05-19 20:43:54 +02:00
Karl Seguin
97c8ca3832 when work is done, don't keep polling, return to process it 2026-05-19 22:39:48 +08:00
Karl Seguin
875c147783 Main/Network reads CDP socket
Previously, the CDP socket was added to the worker's multi and fully owned
by the worker. While this is simple, it introduced some issues:

1 - We cannot detect a disconnected client during JS processing (e.g. for(;;))

2 - A blocked worker can cause back-pressure that blocks the client. This can
    cause a deadlock if the worker is blocked waiting for a CDP message.

In addition to these 2 problems, there was 1 other serious CDP-related issue:
arbitrary CDP messages could be processed during a JavaScript callback. For
example, when a Worker calls importScripts while request interception is
enabled, we have to tick the HttpClient while waiting for the interception
response. But a client could send Target.closeTarget, which we'd process,
deleting the frame, all while importScripts is still blocked. Once importScripts
unblocks, everything is a big UAF, since the frame (and its workers) were
cleared by closeTarget.

The CDP socket is now read from the network (main) thread, and an OTP-style
mailbox is used. The network thread posts messages to the Worker's inbox and
signals it to wake up. This solves #1 and #2. It doesn't directly solve the
reentrancy issue, but it provides the foundation. Specifically, it introduces
a queue of CDP messages and more control over when/how that queue is
processed. At "safe points" (Runner.tick, HttpClient.tick), any message can
be processed. But when inside a JavaScript callback, we can process only
non-destructive, non-mutating messages; specifically, only messages related
to request interception.
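The mailbox and safe-point split can be sketched in Python as follows. `Mailbox`, `drain`, and the `ALLOWED_DURING_SYNC_WAIT` set are hypothetical. The set uses real CDP `Fetch` interception methods, but whether Lightpanda's `allowDuringSyncWait` matches exactly this list is an assumption. The network thread only enqueues and signals; the worker decides what it may dispatch based on whether it is inside a JavaScript callback.

```python
import threading
from collections import deque

# Hypothetical allowlist: CDP request-interception replies that must be
# dispatched even mid-callback, or a blocked importScripts could never finish.
ALLOWED_DURING_SYNC_WAIT = {
    "Fetch.continueRequest",
    "Fetch.fulfillRequest",
    "Fetch.failRequest",
    "Fetch.continueWithAuth",
}


class Mailbox:
    """Inbox written by the network thread and drained by the worker."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inbox = deque()
        self.wakeup = threading.Event()  # the worker blocks on this

    def post(self, msg):
        # Network (main) thread: enqueue, then signal the worker to wake up.
        with self._lock:
            self._inbox.append(msg)
        self.wakeup.set()

    def drain(self, in_js_callback):
        """Return the messages safe to dispatch now; keep the rest, in order."""
        ready, deferred = [], deque()
        with self._lock:
            while self._inbox:
                msg = self._inbox.popleft()
                if not in_js_callback or msg["method"] in ALLOWED_DURING_SYNC_WAIT:
                    ready.append(msg)
                else:
                    deferred.append(msg)
            self._inbox = deferred
            self.wakeup.clear()
        return ready
```

In this model a `Target.closeTarget` that arrives while `importScripts` waits on interception stays queued until the next safe point, while the `Fetch.*` reply needed to unblock the wait is dispatched immediately. Allowlisted messages can overtake deferred ones, which is acceptable only because they don't mutate frame state.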
2026-05-19 20:52:21 +08:00
Karl Seguin
8ef6084fdb Re-organization of CDP connection
network/WsConnection.zig was poorly named. It didn't represent a generic WS
connection, but rather a CDP-specific connection. This splits the generic WS
logic into network/WS.zig and the CDP-specific details into cdp/Connection.zig.

Some of the connection management in the Server has also been simplified.
2026-05-19 10:08:22 +08:00
Karl Seguin
d926291241 Merge pull request #2467 from lightpanda-io/http_transfer
Cleanup HttpClient.Transfer
2026-05-16 08:52:12 +08:00
Navid EMAD
b9601be45e accessibility: bind AX writers to the node's owning frame
axnodeWriter and axnodeQueryWriter both used session.currentFrame(),
but the root node may belong to a different frame (cross-frame query
from a parent context, iframe content). Name resolution (Label lookup
against ownerDocument) and visibility checks (frame._style_manager)
are per-frame, so the writer needs to bind to the node's owning frame.

Uses the existing Node.ownerFrame(fallback) helper. Fallback is
currentFrame for the orphan/detached-node case.
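Here is a tiny Python sketch of the owner-frame-with-fallback lookup. The `Frame`/`Document`/`Node` shapes and the `owner_frame` name are hypothetical mirrors of `Node.ownerFrame(fallback)`. The sketch assumes that a node reaches its frame through its owner document and that a detached document has no frame.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    name: str


@dataclass
class Document:
    frame: Optional[Frame]  # None once the document is detached (assumption)


@dataclass
class Node:
    owner_document: Optional[Document] = None  # None for orphan nodes


def owner_frame(node: Node, fallback: Frame) -> Frame:
    """Bind to the node's own frame; use the fallback for orphan/detached nodes."""
    doc = node.owner_document
    if doc is not None and doc.frame is not None:
        return doc.frame
    return fallback
```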

Also corrects a pre-existing latent bug in getFullAXTree where the
writer ignored the resolved frameId and used currentFrame instead.
2026-05-15 22:26:48 +02:00
Navid EMAD
5f2d897f16 accessibility: implement queryAXTree CDP method
Adds Accessibility.queryAXTree for finding AX nodes by role and/or
accessible name without serializing the full tree. Motivated by
agentic / MCP automation workloads where getFullAXTree round-trips
multi-MB JSON on complex pages.

QueryWriter walks the DOM subtree rooted at the requested node,
reuses the existing role + name resolution + ignore logic from
AXNode, and emits a flat array of matches. Reuses the same
VisibilityCache + LabelByForIndex + temp arena pattern as
axnodeWriter, so no extra retained state.

MVP limitations, each a small follow-up PR:
- objectId param returns a specific not-yet-supported error
- matches emit empty properties/childIds and no parentId
- frameId not parsed; queries the current frame
2026-05-15 21:38:17 +02:00
Pierre Tachoire
3803a1f8c6 webmcp: use value.jsonStringify for JSON write 2026-05-15 11:17:53 +02:00
Pierre Tachoire
7c5a3b211f cdp: cancel inflight webmcp invocation on bc deinit 2026-05-15 08:50:48 +02:00
Pierre Tachoire
5e0901aaf7 cdp: fix invalid arena usage in webmcp 2026-05-15 08:50:47 +02:00
Pierre Tachoire
3ef6e57d58 cdp: adjust invocation id usage for webmcp 2026-05-15 08:50:47 +02:00
Pierre Tachoire
c23d0f4f35 cdp: implement webMCP domain 2026-05-15 08:50:46 +02:00
Karl Seguin
a5162bea8f Cleanup HttpClient.Transfer
This is just moving fields around. The end result is that there's a
`transfer.req` and a `transfer.res`.

On the Request side, we used to have a nested `params: RequestParam`, resulting
in a lot of `transfer.req.params.url`. This is now `transfer.req.url`. On the
Response side, we had the exact opposite: response fields splattered directly
in the transfer, e.g. `transfer.response_header`. This is now `transfer.res.header`.

There is now an HttpClient.Response, which is the actual final response (which
could come from a transfer or something else, e.g. the cache), and an
HttpClient.Transfer.Response, which captures the in-flight response data (and is
one of the polymorphic variants of HttpClient.Response). Probably still not
ideal, but I'm not sure how to make it cleaner, and even if this is just an
intermediary step, I consider it a small win.
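The resulting shape can be roughly modeled in Python, assuming a few fields for illustration (`status`, `body`, `CachedResponse`, and `source` are hypothetical). In Zig the final `HttpClient.Response` would be a tagged union. Here a `Union` type alias plus `isinstance` plays that role.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Request:
    url: str  # previously reached via transfer.req.params.url
    method: str = "GET"


@dataclass
class TransferResponse:
    # In-flight response data, filled in as headers and body arrive.
    status: int = 0
    header: Dict[str, str] = field(default_factory=dict)  # was transfer.response_header
    body: bytearray = field(default_factory=bytearray)


@dataclass
class Transfer:
    id: int
    req: Request
    res: TransferResponse = field(default_factory=TransferResponse)


@dataclass
class CachedResponse:
    # Hypothetical non-transfer variant, e.g. served from the cache.
    status: int
    header: Dict[str, str]
    body: bytes


# The final response is polymorphic over where it came from.
Response = Union[TransferResponse, CachedResponse]


def source(res: Response) -> str:
    return "network" if isinstance(res, TransferResponse) else "cache"
```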
2026-05-15 12:55:47 +08:00
Muki Kiboigo
ac863c7e2b add Network.requestServedFromCache 2026-05-13 21:47:47 -07:00
Pierre Tachoire
198c4e5a0f Merge pull request #2444 from lightpanda-io/useless-code
cdp: remove dead code
2026-05-13 15:36:16 +02:00
Pierre Tachoire
36b55339cd cdp: reset browser context arena when bc is removed 2026-05-13 11:26:09 +02:00
Pierre Tachoire
403fe0d293 cdp: remove dead code 2026-05-13 11:18:05 +02:00
Pierre Tachoire
854eb6a62d Merge pull request #2339 from lightpanda-io/cdp-console
cdp: implement Console
2026-05-13 08:28:01 +02:00
Karl Seguin
393141e472 pass arena into handlers (consistent with other handlers) 2026-05-13 11:51:59 +08:00
Karl Seguin
82a4fc752b HttpClient Improvements
1 - Track the owner of a request (for simpler, more accurate aborts (TBD))

2 - Create the Transfer upfront and make everything work on Transfer (not Request).
    This helps remove ambiguity about cleanup and simplifies layers. For example,
    the Robots request is just another normal request, not a special case. This
    gives everything a stable address (the *Transfer, which can be looked up by id).
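Here is a minimal Python sketch of "create the Transfer upfront" with owner tracking, assuming the client keeps an id-to-Transfer map. `HttpClient.request`, `get`, and `abort_owner` are hypothetical names. `abort_owner` only illustrates the kind of owner-based abort the "(TBD)" refers to.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Transfer:
    id: int
    url: str
    owner: object  # e.g. the frame or worker that issued the request (hypothetical)
    aborted: bool = False


class HttpClient:
    def __init__(self):
        self._ids = itertools.count(1)
        self._transfers: Dict[int, Transfer] = {}

    def request(self, url: str, owner: object) -> Transfer:
        # The Transfer exists from the start, so it has a stable identity.
        t = Transfer(next(self._ids), url, owner)
        self._transfers[t.id] = t
        return t

    def get(self, transfer_id: int) -> Optional[Transfer]:
        return self._transfers.get(transfer_id)

    def abort_owner(self, owner: object) -> List[int]:
        # Hypothetical: abort every transfer a given owner started.
        ids = [t.id for t in self._transfers.values() if t.owner is owner]
        for i in ids:
            self._transfers.pop(i).aborted = True
        return ids
```

The robots.txt fetch goes through the same `request` path as any other URL, so it gets an id and an owner and needs no special cleanup.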
2026-05-12 19:26:24 +08:00
Scott Taylor
92607ad765 Defer page teardown while worker scripts are evaluating
Worker scripts can call importScripts(), which performs a synchronous
HTTP request via HttpClient.syncRequest. To stay responsive during a
long fetch, syncRequest pumps the CDP socket (cdp.blocking_read) while
waiting. If a CDP message such as Target.closeTarget arrives on that
socket mid-fetch, the previous code path tore down the page
immediately:

    Worker JS -> importScripts -> syncRequest -> blocking_read
      -> CDP dispatch -> Target.closeTarget
      -> Session.removePage -> Page.deinit -> Frame.deinit
      -> Worker.deinit (frees worker arena + identity_map)

When control unwound back into the worker's eval, the next operation
that hit ctx.identity.identity_map.getOrPut dereferenced the freed
metadata pointer and segfaulted (sometimes immediately, sometimes a
few connections later as the arena got recycled).

Reproducer: any URL that loads dedicated workers calling importScripts
during initial eval, driven via puppeteer-core's connectOverCDP. The
allbirds.com product page (which loads ~8 web-pixel workers each
calling importScripts) reliably triggered it within ~10 connections.

Session.removePage already deferred when the frame's own
ScriptManager.is_evaluating was set; that guard never tripped because
worker scripts don't go through the frame's ScriptManager. Fix:

  * Worker.loadInitialScript now sets the worker's own
    _worker_scope._script_manager.is_evaluating around the eval, with
    save/restore so nested worker evals compose correctly.

  * WorkerGlobalScope.importScript also sets its own
    _script_manager.is_evaluating around the syncRequest +
    runMacrotasks. The typical caller (Worker.loadInitialScript)
    already sets this around its outer eval, so the outer guard
    usually covers us; the inner mark is defense-in-depth for callers
    that reach importScripts() from a setTimeout / microtask outside
    the loadInitialScript scope.

  * New Frame.anyScriptEvaluating method walks the frame tree (frame
    ScriptManager + every worker's ScriptManager + child frames) and
    returns true if any is mid-eval. Session.removePage and
    CDP.disposeBrowserContext use this in place of the frame-only
    check, deferring teardown until all evals unwind. Final cleanup
    happens at CDP.deinit on connection close, matching the existing
    deferred-teardown contract.
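The save/restore guard and the tree walk can be modeled in Python like this. Names are snake_case stand-ins (hypothetical) for `ScriptManager.is_evaluating`, `Frame.anyScriptEvaluating`, and `Session.removePage`. The deferred teardown is reduced to a `pending_removal` flag. This illustrates the approach of this commit, which a9cf87e0b0 above largely reverts.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScriptManager:
    is_evaluating: bool = False


@contextmanager
def evaluating(sm: ScriptManager):
    # Save/restore instead of set/clear, so nested evals compose correctly.
    prev = sm.is_evaluating
    sm.is_evaluating = True
    try:
        yield
    finally:
        sm.is_evaluating = prev


@dataclass
class Worker:
    script_manager: ScriptManager = field(default_factory=ScriptManager)


@dataclass
class Frame:
    script_manager: ScriptManager = field(default_factory=ScriptManager)
    workers: List[Worker] = field(default_factory=list)
    children: List["Frame"] = field(default_factory=list)

    def any_script_evaluating(self) -> bool:
        # Frame's own manager, then every worker's, then child frames recursively.
        if self.script_manager.is_evaluating:
            return True
        if any(w.script_manager.is_evaluating for w in self.workers):
            return True
        return any(c.any_script_evaluating() for c in self.children)


class Session:
    def __init__(self, frame: Frame):
        self.frame = frame
        self.pending_removal = False

    def remove_page(self) -> bool:
        """Tear down now if safe; otherwise defer. Returns True if torn down."""
        if self.frame.any_script_evaluating():
            self.pending_removal = True
            return False
        self.frame = None
        return True
```

Restoring the previous value, instead of unconditionally clearing the flag, is what lets the inner `importScript` mark nest inside `loadInitialScript`'s outer mark without ending the outer protection early.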

Verified by running the puppeteer-core repro back-to-back against a
single Lightpanda serve; all returned 200 with the right title, no
UAF crashes (was previously crashing within 1-10 runs). All 521 unit
tests still pass.

Note: a separate, pre-existing latent V8 issue surfaces under stress
on this same code path. After many iterations a Runtime.evaluate
promise tracked by V8's inspector PromiseHandlerTracker is discarded
during garbage collection's first-pass weak callbacks; the discard
sends a failure response which triggers v8::String::NewFromOneByte,
hitting the debug-only assertion AllowHeapAllocation::IsAllowed() in
heap-allocator-inl.h:79 (no allocations allowed during weak callbacks).
This reproduces on a baseline build of this PR commit and on a
baseline build of just the original two-line is_evaluating fix,
i.e. it is not introduced by the deferral logic. The deferral makes
it more visible because inspector callbacks now live longer before
teardown, so they are more likely to be alive during a GC. Tracking
this as a follow-up; the fix here still resolves the UAF that was
crashing the server immediately.
2026-05-09 17:26:41 -04:00
Pierre Tachoire
d6c9a5fb83 cdp: add runtime.consoleAPICalled 2026-05-06 18:34:30 +02:00
Pierre Tachoire
595b774f1d cdp: implement Console.messageAdded event 2026-05-06 18:34:29 +02:00
Nikolay Govorov
e5a9f8ba2e Fix one more crash 2026-05-04 18:12:47 +01:00
Nikolay Govorov
9a312a4177 Refactor server/client/cdp structure 2026-05-04 16:41:22 +01:00
Pierre Tachoire
c3fe5346c2 cdp: add console enable/disable commands 2026-05-04 14:59:17 +02:00
Pierre Tachoire
080e1e6415 cdp: rename Audit into Audits 2026-05-04 12:42:55 +02:00
Pierre Tachoire
cddabe60f5 cdp: avoid request id conflict between LID- and REQ-
Use distinct keys for loader-id-based and request-id-based captured responses.
2026-05-04 08:59:53 +02:00
Pierre Tachoire
11172a341a cdp: use loader_id as captured response key for documents 2026-05-04 08:59:50 +02:00
Navid EMAD
fd2f26a065 Merge remote-tracking branch 'origin/main' into fix-a3-handle-javascript-dialog 2026-04-29 00:57:03 +02:00
Muki Kiboigo
85a5c0f927 decrement intercepted and properly deinit on BrowserContext deinit 2026-04-28 07:01:43 -07:00
Muki Kiboigo
3db3281e8e working authentication with InterceptionLayer 2026-04-28 07:01:40 -07:00
Muki Kiboigo
9c826159a0 crude InterceptionLayer 2026-04-28 07:01:40 -07:00
Navid EMAD
1d806475c4 page: make handleJavaScriptDialog drive confirm/prompt return values
Page.handleJavaScriptDialog previously responded -32000 "No dialog is
showing" regardless of whether a dialog was open, leaving CDP clients
no way to influence the JS-side return value of confirm() / prompt().
PR #2085 wired up the Page.javascriptDialogOpening event but explicitly
deferred the return-value override since true Chrome semantics require
suspending V8 mid-execution.

Add a pre-arm model that fits the auto-dismiss architecture without
runtime suspension: handleJavaScriptDialog stashes {accept, promptText}
on the BrowserContext; when the next JS dialog dispatches the
javascript_dialog_opening notification, the listener pops the stash and
fills it into the dispatch's response output param so Window.confirm /
prompt return the CDP client's choice. Without a pre-arm, headless
auto-dismiss values from PR #2085 are preserved (confirm->false,
prompt->null, alert->void).
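The pre-arm model can be sketched in Python as a one-shot stash on the browser context. `PreArm`, `handle_javascript_dialog`, and `on_dialog` are hypothetical names. Returning `""` for an accepted prompt with no `promptText` is an assumption. The no-stash defaults follow the confirm->false, prompt->null behavior stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreArm:
    accept: bool
    prompt_text: Optional[str] = None


class BrowserContext:
    def __init__(self):
        self.pending_dialog: Optional[PreArm] = None

    def handle_javascript_dialog(self, accept: bool, prompt_text: Optional[str] = None):
        # Page.handleJavaScriptDialog: stash the answer for the next dialog.
        self.pending_dialog = PreArm(accept, prompt_text)

    def on_dialog(self, kind: str):
        """Value returned to JS by alert/confirm/prompt."""
        arm, self.pending_dialog = self.pending_dialog, None  # pop the stash
        if kind == "alert":
            return None
        if arm is None:
            # Headless auto-dismiss defaults: confirm -> false, prompt -> null.
            return False if kind == "confirm" else None
        if kind == "confirm":
            return arm.accept
        if not arm.accept:
            return None
        # Assumption: an accepted prompt without promptText yields "".
        return arm.prompt_text if arm.prompt_text is not None else ""
```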

Closes #2260
2026-04-27 07:08:01 +02:00
Karl Seguin
550fb58f3f Introduce Page (container)
Follow up to https://github.com/lightpanda-io/browser/pull/2200

This change is actually pretty mundane, but a bunch of files that used to
take a *Session (e.g. every WebAPI releaseRef and deinit) now take a *Page.

This aims to separate the 2 lifetimes currently managed by Session by moving
the "Page" lifetime to a dedicated container: Page. Ultimately, the goal is to
remove the 1-page-per-session limit of the current design. The point is not to
explicitly support multiple pages per session (though that's more feasible now),
but to better emulate Chrome, where the old and new page both exist during a
navigation.
2026-04-23 15:48:13 +08:00
Karl Seguin
73320e163d Add placeholder handlers for Audit enable/disable CDP methods
Might help with: https://github.com/lightpanda-io/browser/issues/2177

I say "might" because there are 2 more methods in Audit which I haven't
implemented. This is just the most basic placeholder for now.
2026-04-23 09:19:49 +08:00
Karl Seguin
2275416505 Page -> Frame
This is to pave the way for introducing a new "Page" container, which will take
over the page lifecycle currently burdening Session. The ultimate goal of that
is to allow the Session to have multiple pages (mostly for better transitions
between pages), which is hard to do now since the Session has so much state.

This rename was aggressive, e.g. currentPage() -> currentFrame() so that, when
the new Page container is added, you won't see "currentPage()" and wonder:

  "Does 'currentPage' mean the new Page container, or the Frame (which
  used to be called Page)".
2026-04-22 08:42:18 +08:00
Karl Seguin
c159be503a Merge pull request #2194 from lightpanda-io/import_log
Change all @import("...../log.zig") to const log = lp.log;
2026-04-20 15:07:15 +08:00
Adrià Arrufat
983e592b43 cdp: use page arena pool for AXNode writer 2026-04-20 07:42:08 +02:00
Karl Seguin
2d20e57f80 Change all @import("...../log.zig") to const log = lp.log;
@import("lightpanda") where needed.

Would also like to do this for String, Page, Session and js, which all stand out
as types that are used across the codebase.

I know that a few devs are doing this in new work and I haven't heard anyone
voice an objection.
2026-04-20 12:40:04 +08:00
Adrià Arrufat
b42251d750 ax: route AXNode.Writer scratch allocations through a dedicated arena 2026-04-17 12:53:43 +02:00
Adrià Arrufat
36c1218486 ax: add lazy label index for name resolution 2026-04-17 12:20:10 +02:00
Adrià Arrufat
cee72cabb9 cdp: improve AX tree visibility and label resolution
Prunes hidden subtrees from the accessibility tree and implements
accessible name resolution via labels. Adds the `labels` property
to labellable HTML elements.
2026-04-17 08:33:13 +02:00
Pierre Tachoire
8de5267cd0 Merge pull request #2169 from lightpanda-io/feat/cookies-file
Feat/cookies file
2026-04-16 08:21:53 -04:00
Karl Seguin
3ca1f230b9 Serialize sameSite
Tweak ergonomics (public functions log internally and are infallible). Use
readFileAlloc directly. Fix possible memory leak with cookie arena - I don't
think you can make a copy of the arena, and then dupe with the original.
2026-04-16 10:35:34 +08:00
Pierre Tachoire
a24fcc6a5c use session arg to load cookies from file 2026-04-15 10:29:53 -04:00