Commit Graph

612 Commits

Author SHA1 Message Date
Karl Seguin
24e17b6f21 Merge pull request #2130 from lightpanda-io/arena_pool_buckets
Add arena buckets to ArenaPool
2026-04-10 20:33:19 +08:00
Karl Seguin
ddf614a9d5 Add arena buckets to ArenaPool
ArenaPool previously maintained up to 512 16KB buckets. The 16KB retention is
small for things like XHR and scripts, but increasing it to something more
reasonable, like 128KB, would use up to 8x more memory.

This commit adds 4 buckets: 1KB, 4KB, 16KB and 128KB. Callers can request a
tiny, small, medium or large bucket. We end up with lower peak memory and fewer
allocations.

Furthermore, callers can request a specific size. This is particularly useful
for WebSocket or Blob where the size could vary greatly (so we'd likely default
to a large bucket), but that could needlessly use up a large arena.

The bucket sizes were derived from analyzing allocations. A significant number
of allocations were very small. Things like ScheduleCallback and
FinalizerCallback are always less than 1K and can be generated in the thousands.
The 16KB retention was wasteful in these cases. It is better to have a large
number of 1KB pools, so that we can afford a handful of very large buffers.
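
A minimal sketch of how a caller-provided size could map to one of the four
buckets. This is illustrative Python, not the ArenaPool code: `Bucket` and
`bucket_for_size` are hypothetical names, and the "smallest bucket that fits"
rule is an assumption about how a specific size request might be resolved.

```python
from enum import Enum


class Bucket(Enum):
    # Retention sizes named in the commit message, in bytes.
    TINY = 1 * 1024
    SMALL = 4 * 1024
    MEDIUM = 16 * 1024
    LARGE = 128 * 1024


def bucket_for_size(size_hint: int) -> Bucket:
    """Return the smallest bucket whose retention covers size_hint bytes.

    Assumption: anything bigger than the largest retention still lands in
    the large bucket.
    """
    for bucket in Bucket:  # Enum members iterate in definition order
        if size_hint <= bucket.value:
            return bucket
    return Bucket.LARGE
```

Under this rule a small WebSocket message would take a tiny arena instead of
defaulting to a large one.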
2026-04-10 19:09:18 +08:00
Karl Seguin
05229fdc53 Use the document's charset to determine if/how to encode querystring
Whenever we resolve a URL, say from `anchor.href`, we should consider the
document's charset when encoding the querystring. This probably isn't the
most important feature, but it makes tens of thousands of WPT cases pass, e.g.:

/encoding/legacy-mb-tchinese/big5/big5-encode-href-errors-han.html?3001-4000 and
/encoding/legacy-mb-japanese/euc-jp/eucjp-encode-href-errors-han.html?17001-18000

DOM elements previously called `URL.resolveURL(...)`. They now call
`self.asNode().resolveURL(...)`, where `Node#resolveURL` will provide the
document's charset.
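
To illustrate why the charset matters, here is a rough Python sketch of
charset-aware encoding for a single querystring value. It is not the
lightpanda implementation: `encode_query_value` is a hypothetical helper, and
the handling of unencodable characters (numeric character reference, then
percent-encoding) is my understanding of the URL Standard's behaviour for
non-UTF-8 documents.

```python
from urllib.parse import quote


def encode_query_value(value: str, document_charset: str = "utf-8") -> str:
    """Percent-encode one query value using the document's charset.

    Characters the charset cannot represent are first turned into numeric
    character references (&#NNNN;) by the "xmlcharrefreplace" handler, and
    the resulting bytes are then percent-encoded like everything else.
    """
    return quote(value, safe="", encoding=document_charset,
                 errors="xmlcharrefreplace")


print(encode_query_value("é", "utf-8"))    # %C3%A9
print(encode_query_value("é", "latin-1"))  # %E9
print(encode_query_value("中", "latin-1"))  # %26%2320013%3B
```

The same input yields different bytes depending on the charset, which is what
the WPT encoding cases above exercise.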
2026-04-10 16:47:42 +08:00
Karl Seguin
8eaeafe16c Fix a lot of typos.
I used https://github.com/crate-ci/typos, it worked well.

Also, make sure a CDP-initiated KeyboardEvent is freed when no element is in focus
2026-04-10 06:51:10 +08:00
Karl Seguin
b98eb1292d Merge pull request #2085 from tmchow/feat/2082-handle-javascript-dialog
feat: emit Page.javascriptDialogOpening CDP events for JS dialogs
2026-04-07 08:54:36 +08:00
Trevin Chow
7208934bda fix: return CDP error from handleJavaScriptDialog instead of silent no-op
Dialogs auto-dismiss in headless mode, so there is no pending dialog
by the time the CDP client sends Page.handleJavaScriptDialog. Return
an explicit error so the client knows the action had no effect.
2026-04-06 11:08:27 -07:00
Trevin Chow
b33bb54442 fix: propagate keyUp and char keyboard events to JS listeners
dispatchKeyEvent only handled keyDown, returning early for keyUp,
rawKeyDown, and char types. This meant JS keyup and keypress
listeners never fired via CDP.

Now keyUp dispatches as "keyup" and char dispatches as "keypress".
rawKeyDown remains a no-op (Chrome-internal, not used for JS dispatch).

Fixes #2080
Ref #2043
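
The mapping described above can be sketched as a small lookup table. This is
illustrative Python rather than the actual dispatchKeyEvent code;
`CDP_TO_DOM_EVENT` and `dom_event_for` are hypothetical names.

```python
from typing import Optional

# CDP Input.dispatchKeyEvent "type" -> DOM event name fired on JS listeners.
CDP_TO_DOM_EVENT = {
    "keyDown": "keydown",
    "keyUp": "keyup",
    "char": "keypress",
}


def dom_event_for(cdp_type: str) -> Optional[str]:
    """Return the DOM event to dispatch, or None when nothing is dispatched.

    rawKeyDown is deliberately absent from the table, so it stays a no-op.
    """
    return CDP_TO_DOM_EVENT.get(cdp_type)
```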
2026-04-03 17:08:09 -07:00
Trevin Chow
95f80c9645 feat: emit Page.javascriptDialogOpening CDP events for JS dialogs
window.alert(), confirm(), and prompt() now dispatch a
javascript_dialog_opening notification that the CDP layer
forwards as a Page.javascriptDialogOpening event. This enables
Puppeteer's page.on('dialog') to fire when JS dialogs open.

Also adds Page.handleJavaScriptDialog as a CDP method. Dialogs
still auto-dismiss in headless mode (alert is void, confirm
returns false, prompt returns null), so handleJavaScriptDialog
is an acknowledgement rather than a blocking gate.

Changes:
- Notification.zig: add JavascriptDialogOpening event type
- CDP.zig: register listener, forward to page domain
- page.zig: handleJavaScriptDialog handler + event emitter
- Window.zig: alert/confirm/prompt dispatch the notification

Fixes #2082
Ref #2043
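
A minimal sketch of the flow described above: notify first, then auto-dismiss
with the headless default. This is illustrative Python, not Window.zig;
`open_dialog`, `AUTO_DISMISS` and the `notify` callback are hypothetical, and
only the `type` and `message` event params are shown.

```python
from typing import Callable, Dict, Optional

# Headless defaults from the commit message: alert is void, confirm returns
# false, prompt returns null.
AUTO_DISMISS: Dict[str, Optional[bool]] = {
    "alert": None,
    "confirm": False,
    "prompt": None,
}


def open_dialog(kind: str, message: str,
                notify: Callable[[dict], None]) -> Optional[bool]:
    """Emit the CDP event, then return the auto-dismiss result immediately."""
    notify({
        "method": "Page.javascriptDialogOpening",
        "params": {"type": kind, "message": message},
    })
    return AUTO_DISMISS[kind]
```

Because the result is returned without waiting for the client, a later
Page.handleJavaScriptDialog call cannot change the outcome.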
2026-04-03 16:59:21 -07:00
Karl Seguin
0604056f76 Improve network naming consistency
1. Runtime.zig -> Network.zig (especially since most places imported it as
   `const Network = @import("Runtime.zig")`)

2. const net_http = @import(...) -> const http = @import(...)
2026-04-01 18:46:03 +08:00
Karl Seguin
4ad8282e75 Merge pull request #2047 from lightpanda-io/fancy-wait
Add --wait-selector, --wait-script and --wait-script-file options to …
2026-03-31 20:59:48 +08:00
Karl Seguin
26653120fa Removing remaining CDP generic
Follow up to https://github.com/lightpanda-io/browser/pull/1990 which makes
both BrowserContext and Command non-generic.
2026-03-31 16:53:58 +08:00
Adrià Arrufat
008235222b SemanticTree: reorder getNodeDetails params 2026-03-31 07:29:33 +02:00
Karl Seguin
ab6c63b24b Add --wait-selector, --wait-script and --wait-script-file options to fetch
These new optional parameters run AFTER --wait-until, allowing the (imo) useful
combination of `--wait-until load --wait-script "report.complete === true"`.
However, if `--wait-until` IS NOT specified but `--wait-selector/script` IS,
then there is no default wait and it'll just check the selector/script. If
neither `--wait-selector` nor `--wait-script/--wait-script-file` is specified,
then `--wait-until` continues to default to `done`.
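
The defaulting rules above can be sketched as follows. This is illustrative
Python, not the fetch command's option parsing; `WaitPlan` and `resolve_wait`
are hypothetical names, and `--wait-script-file` is assumed to have been read
into `wait_script` already.

```python
from typing import NamedTuple, Optional


class WaitPlan(NamedTuple):
    wait_until: Optional[str]  # None means "no --wait-until phase"
    selector: Optional[str]
    script: Optional[str]


def resolve_wait(wait_until: Optional[str] = None,
                 wait_selector: Optional[str] = None,
                 wait_script: Optional[str] = None) -> WaitPlan:
    """Apply the default only when no wait option was given at all."""
    if wait_until is None and wait_selector is None and wait_script is None:
        wait_until = "done"
    return WaitPlan(wait_until, wait_selector, wait_script)
```

With only `--wait-selector` given, the plan has no `wait_until` phase and the
selector is checked directly.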

These waiters were added to the Runner, and the existing Action.waitForSelector
now uses the runner's version. Selector querying has been split into distinct
parse and query functions, so that we can parse once, and query on every tick.

We could potentially optimize --wait-script to compile the script once and call
it on each tick, but we'd have to detect page navigation to recompile the script
in the new context. Something I'd rather optimize separately.
2026-03-31 12:30:46 +08:00
Adrià Arrufat
16f17ead9a Merge pull request #2045 from lightpanda-io/semantic-tree-node-details
SemanticTree: Add nodeDetails tool
2026-03-31 05:28:24 +02:00
Karl Seguin
568fa25add Remove DOMContentLoaded and Loaded events from page_navigated
These were moved to their own distinct events, and should have been removed from
here.
2026-03-31 06:56:01 +08:00
Karl Seguin
752184b12b Improve/Fix CDP navigation event order
These changes all better align with Chrome's event ordering/timing.

There are two big changes. The first is that our internal page_navigated event,
which is kind of our heavy hitter, is sent once the header is received as
opposed to (much later) on document load. The main goal of this internal event
is to trigger the "Page.frameNavigated" CDP event which is meant to happen
once the URL is committed, which _is_ on header response.

To accommodate this earlier trigger, new explicit events for DOMContentLoaded
and load have been added.

This drastically changes the flow of events as things go from:
Start Page Navigation
Response Received
  Start Frame Navigation
  Response Received
  End Frame Navigation
End Page Navigation
context clear + reset
DOMContentLoaded
Loaded

TO:
Start Page Navigation
Response Received
End Page Navigation
context clear + reset
Start Frame Navigation
Response Received
End Frame Navigation
DOMContentLoaded
Loaded

So not only does it remove the nesting, but it ensures that the context is
cleared and reset once the main page's navigation is locked in, and before any
frame is created.
2026-03-31 06:56:00 +08:00
Adrià Arrufat
9c8fe9b20f SemanticTree: Add nodeDetails tool
Adds a tool to retrieve detailed node metadata and updates the
semantic tree to track and display the disabled state of elements.
2026-03-30 16:38:23 +02:00
Karl Seguin
75dc4d5b0e Merge pull request #2031 from lightpanda-io/cdp-add-script-to-evaluate-on-new-document
Cdp add script to evaluate on new document
2026-03-30 11:16:39 +08:00
Karl Seguin
0d40aed1b7 zig fmt 2026-03-30 09:32:22 +08:00
Karl Seguin
78cb766298 Log for unimplemented parameter
Wrap script_on_new_document execution in try/catch for better error reporting.

Improve test for script_on_new_document
2026-03-30 09:31:13 +08:00
Nikolay Govorov
649d8d1024 Remove duplication in cookies installation 2026-03-27 09:49:13 +00:00
Nikolay Govorov
d33edc5697 Fixup cookies management 2026-03-27 09:49:05 +00:00
Nikolay Govorov
16ca8d4b14 Fix cleanup connections in HttpClient 2026-03-27 09:49:03 +00:00
Karl Seguin
ea422075c7 Remove unused imports
And some smaller cleanups.
2026-03-27 12:45:26 +08:00
Navid EMAD
886aa3abba CDP: implement Page.addScriptToEvaluateOnNewDocument
Replace the hardcoded stub with a working implementation that stores
registered scripts and evaluates them in each new document.

Changes:
- Add ScriptOnNewDocument struct and storage list on BrowserContext
- Store scripts with unique identifiers when addScript is called
- Evaluate all registered scripts in pageNavigated, after the execution
  context is created but before frameNavigated/loadEventFired events
  are sent to the CDP client
- Add removeScriptToEvaluateOnNewDocument for cleanup
- Return unique identifiers per the CDP spec (was hardcoded to "1")

Scripts are evaluated with error suppression (warns on failure) to
avoid breaking navigation if a script has issues.

This unblocks CDP clients that rely on auto-injected scripts (polyfills,
monitoring, test helpers) persisting across navigations. Previously
clients had to manually re-inject after every Page.navigate.
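
A minimal sketch of the storage and evaluation behaviour listed above. This is
illustrative Python, not the BrowserContext code: `ScriptRegistry` and the
`evaluate` and `warn` callbacks are hypothetical, and the counter-based
identifier scheme is an assumption.

```python
from typing import Callable, Dict


class ScriptRegistry:
    """Stores scripts registered via addScriptToEvaluateOnNewDocument."""

    def __init__(self) -> None:
        self._next_id = 1
        self._scripts: Dict[str, str] = {}

    def add(self, source: str) -> str:
        # Each registration gets its own identifier instead of a fixed "1".
        identifier = str(self._next_id)
        self._next_id += 1
        self._scripts[identifier] = source
        return identifier

    def remove(self, identifier: str) -> bool:
        return self._scripts.pop(identifier, None) is not None

    def evaluate_all(self, evaluate: Callable[[str], None],
                     warn: Callable[[str], None]) -> None:
        # A failing script is reported but must not break navigation or
        # prevent the remaining scripts from running.
        for identifier, source in self._scripts.items():
            try:
                evaluate(source)
            except Exception as err:
                warn(f"script {identifier} failed: {err}")
```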

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:48:07 +01:00
Adrià Arrufat
7e778a17d6 MCP/CDP: unify node registration
This fixes a bug in MCP where interactive elements were not assigned
a backendNodeId, preventing agents from clicking or filling them. Also
extracts link collection to a shared browser module.
2026-03-26 23:51:43 +09:00
Navid EMAD
c6b0c75106 Address review: use arena.dupeZ for URL copy, add try to testing.context()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 10:09:48 +01:00
Navid EMAD
93485c1ef3 CDP: implement Page.reload
Add `Page.reload` to the CDP Page domain dispatch. Reuses the existing
`page.navigate()` path with `NavigationKind.reload`, matching what
`Location.reload` already does for the JS `location.reload()` API.

Accepts the standard CDP params (`ignoreCache`, `scriptToEvaluateOnLoad`)
per the Chrome DevTools Protocol spec.

The current page URL is copied to the stack before `replacePage()` to
avoid a use-after-free when the old page's arena is freed.

This unblocks CDP clients (Puppeteer, capybara-lightpanda, etc.) that
call `Page.reload` and currently get `UnknownMethod`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 10:09:48 +01:00
Karl Seguin
cf641ed458 Merge pull request #1990 from lightpanda-io/remove_cdp_generic
Remove cdp generic
2026-03-26 07:49:13 +08:00
Karl Seguin
ca41bb5fa2 fix import casing 2026-03-25 17:54:24 +08:00
Karl Seguin
0dd0495ab8 Removes CDPT (generic CDP)
CDPT used to be a generic so that we could inject Browser, Session, Page and
Client. At some point, it [thankfully] became a generic only to inject Client.

This commit removes the generic and bakes the *Server.Client instance in CDP.
It uses a socketpair for testing.

BrowserContext is still generic, but that's generic for a very different reason
and, while I'd like to remove that generic too, it belongs in a different PR.
2026-03-25 17:43:30 +08:00
Adrià Arrufat
8e315e551a forms: extract form node registration logic 2026-03-25 09:30:06 +09:00
Adrià Arrufat
567cd97312 webapi.Element: centralize disabled state logic 2026-03-24 13:13:53 +09:00
Adrià Arrufat
260768463b Merge branch 'main' into osc/feat-mcp-detect-forms 2026-03-24 09:25:47 +09:00
Karl Seguin
5453630955 Merge pull request #1958 from lightpanda-io/runner
Extract Session.wait into a Runner
2026-03-24 07:28:18 +08:00
Pierre Tachoire
a94b0bec93 Merge pull request #1946 from lightpanda-io/cdp-response-body
Encode non-utf8 Network.getResponseBody in base64
2026-03-23 16:46:12 +01:00
Pierre Tachoire
797cae2ef8 encode captured response body during CDP call 2026-03-23 14:26:27 +01:00
Adrià Arrufat
c3a2318eca fix: pass allocator as first parameter in forms.zig 2026-03-23 15:27:49 +09:00
Adrià Arrufat
4f1b499d0f zig fmt 2026-03-23 13:52:28 +09:00
Karl Seguin
c9bc370d6a Extract Session.wait into a Runner
This is done for a couple reasons. The first is just to have things a little
more self-contained for eventually supporting more advanced "wait" logic, e.g.
waiting for a selector.

The other is to provide callers with more fine-grained control. Specifically
the ability to manually "tick", so that they can [presumably] do something
after every tick. This is needed by the test runner to support more advanced
cases (cases that need to test beyond 'load'), and it also improves
lp.waitForSelector (fixing a potential use-after-free).
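
The manual-tick idea can be sketched as a loop that checks a condition after
every tick. This is illustrative Python, not the Runner API; `run_until`, its
callbacks and the `max_ticks` bound are all hypothetical.

```python
from typing import Callable


def run_until(tick: Callable[[], None], check: Callable[[], bool],
              max_ticks: int = 100) -> bool:
    """Advance one tick at a time, checking the condition after each tick.

    Returns True as soon as check() passes, or False once max_ticks ticks
    have run without it passing.
    """
    for _ in range(max_ticks):
        tick()
        if check():
            return True
    return False
```

A selector wait fits this shape: `tick` advances the page and `check` runs the
selector query.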
2026-03-23 12:30:41 +08:00
Adrià Arrufat
4b29823a5b refactor: simplify form extraction and remove const casts 2026-03-23 13:24:21 +09:00
Karl Seguin
a69a22ccd7 Merge pull request #1948 from lightpanda-io/cdp-waitforselector
CDP: add waitForSelector to lp.actions
2026-03-23 10:09:09 +08:00
Adrià Arrufat
a6d2ec7610 refactor: share form node ID serialization between MCP and CDP 2026-03-23 10:18:24 +09:00
Karl Seguin
c1fc2b1301 Merge pull request #1949 from lightpanda-io/1800-fix-startup-frame-id
Fix Page.getFrameId on STARTUP when a browser context and a target exist
2026-03-22 07:14:33 +08:00
Matt Van Horn
78c6def2b1 mcp: add detectForms tool for structured form discovery
Add a detectForms MCP tool and lp.detectForms CDP command that return
structured form metadata from the current page. Each form includes its
action URL, HTTP method, and fields with names, types, required status,
values, select options, and backendNodeIds for use with the fill tool.

This lets AI agents discover and fill forms in a single step instead of
calling interactiveElements, filtering for form fields, and guessing
which fields belong to which form.

New files:
- src/browser/forms.zig: FormInfo/FormField structs, collectForms()
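
For readers unfamiliar with the shape of the result, here is a rough Python
sketch of the metadata described above. The real FormInfo/FormField structs
live in src/browser/forms.zig and may differ; the field names here and the
`fields_to_fill` helper are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FormField:
    name: str
    type: str
    required: bool
    backend_node_id: int  # what an agent would pass to the fill tool
    value: Optional[str] = None
    options: List[str] = field(default_factory=list)  # for <select> fields


@dataclass
class FormInfo:
    action: str
    method: str
    fields: List[FormField] = field(default_factory=list)


def fields_to_fill(form: FormInfo) -> List[int]:
    """Return backendNodeIds of required fields that have no value yet."""
    return [f.backend_node_id for f in form.fields
            if f.required and not f.value]
```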

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 08:40:50 -07:00
Pierre Tachoire
fbc71d6ff7 cdp: handle STARTUP session into Page.getFrameTree gracefully 2026-03-21 16:29:58 +01:00
Adrià Arrufat
e10ccd846d CDP: add waitForSelector to lp.actions
It refactors the MCP implementation so that it can be reused.
2026-03-22 00:09:02 +09:00
Pierre Tachoire
384b2f7614 cdp: call Page.getFrameTree on startup when possible 2026-03-21 16:07:48 +01:00
Pierre Tachoire
30f387d361 encode captured response depending on the content type 2026-03-21 14:11:06 +01:00
Pierre Tachoire
00d06dbe8c encode all captured responses body in base64 2026-03-21 13:29:58 +01:00