Commit Graph

114 Commits

Author SHA1 Message Date
Navid EMAD
7237c377d3 browser: send Referer on cross-page navigation requests
Anchor click, form submit, and `location.href = ...` assignments queue a
navigation through `Frame.scheduleNavigation`, which then tears down the
originating page and rebuilds the frame in `Session.processRootQueuedNavigation`
before `Frame.navigate` issues the HTTP request. The originator's URL was
discarded with the old arena, so the request went out without a Referer
header — even though the HTML "navigate" algorithm and Fetch §4.5 require
one. `Frame.headersForRequest` (#1449) handled subresource fetches but was
never called from the navigation path.

Capture the originating frame's URL into a new `referer` field on
`NavigateOpts` at scheduling time, dup'd into the `QueuedNavigation` arena
so it survives the page tear-down. `Frame.navigate` adds it as a
`Referer:` header alongside the existing per-request headers. Iframe
initial navigation (`Frame.zig:1282`) also sets `referer = parent.url`
since the parent frame outlives that direct `navigate` call. CDP
`Page.navigate` (`.reason = .address_bar`) and `Page.reload` continue to
omit Referer — matches Chrome.

Closes #2281
2026-04-28 03:21:12 +02:00
Navid EMAD
1b9e8ad46c page: drop POST method/body on redirect so reload doesn't replay it
If a POST navigation gets redirected (302/303), the page that actually
loads is fetched with GET — but Frame._navigated_options still carries
the original POST method, body, and Content-Type. Page.reload would
then re-POST the form data to the redirect target, which is both
incorrect and dangerous (re-submission of credentials, charges, etc.).

Reset method/body/header on _navigated_options inside
frameHeaderDoneCallback whenever response.redirectCount() > 0. The full
spec-correct version would distinguish 307/308 (preserve method) from
301/302/303 (convert POST→GET), but resubmitting form data is the more
dangerous failure mode — conservative reset matches Chrome's practical
behavior on reload.

Also collapse the prev_body/prev_header extraction in doReload to a
single tuple-destructured blk: block (no behavior change).

Tests: new cdp.frame: reload after POST→redirect drops the POST drives
POST /redirect_to_echo → 302 → /echo_method, then Page.reload, asserts
the second request is GET. /redirect_to_echo route added to
testing.zig. The existing reload-replays-POST test still passes (no
redirect, POST is still replayed).
2026-04-27 22:47:55 +02:00
Navid EMAD
5b1452f162 Merge remote-tracking branch 'origin/main' into fix-a6-page-reload-replay-post
# Conflicts:
#	src/cdp/domains/page.zig
2026-04-27 22:39:54 +02:00
Navid EMAD
00c42dec4e http: inherit request URL fragment across fragment-less redirect
Per RFC 7231 §7.1.2, when a 3xx response carries a Location header
without a fragment, the user agent must process the redirect as if
the value inherited the fragment of the request URL. URL.resolve
follows RFC 3986 §5.3 which drops the base fragment, so handleRedirect
now reattaches the original fragment when the resolved target has none.

Closes #2263
2026-04-27 08:04:47 +02:00
Navid EMAD
ea6b228f9d page: replay POST method/body/header on Page.reload
doReload built a NavigateOpts with only url + kind=.reload; method/body/header
defaulted to GET/null/null, so any prior POST navigation regressed to a GET
on reload. The HTML reload navigation re-fetches the document that produced
the current entry, and Chrome replays the same HTTP request that loaded the
page (including method, body, and Content-Type) — Lightpanda dropped all
three.

Retain the prior request body and content-type header in Frame.NavigatedOpts
(duped into the frame arena), and have doReload capture them into the CDP
command's arena before replacePage() frees the old frame. The reload's
frame.navigate call carries the replayed method/body/header so the request
the page was loaded with is the request that runs again.

Closes #2258
2026-04-27 06:28:58 +02:00
Halil Durak
7aca1732fe Merge pull request #2098 from lightpanda-io/nikneym/cli-parser 2026-04-23 11:17:44 +03:00
Karl Seguin
550fb58f3f Introduce Page (container)
Follow up to https://github.com/lightpanda-io/browser/pull/2200

This change is actually pretty mundane, but a bunch of files that used to
take a *Session (e.g. every WebAPI releaseRef and deinit) now take a *Page.

This aims to separate the 2 lifetimes currently managed by Session by moving
the "Page" lifetime to a dedicated container: Page. Ultimately, the goal is to
remove the 1-page-per-session limit of the current design. Not to explicitly
support multiple pages per session (though, that's more possible now), but
in order to better emulate Chrome where, during a navigation event, the old and
new page both exist.
2026-04-23 15:48:13 +08:00
Halil Durak
156cf9b5a4 testing.zig: init directly on .serve 2026-04-22 16:07:04 +03:00
Karl Seguin
2275416505 Page -> Frame
This is to pave the way for introducing a new "Page" container, which will take
over the page lifecycle currently burdening Session. The ultimate goal of that
is to allow the Session to have multiple pages (mostly for better transitions
between pages), which is hard to do now since the Session has so much state.

This rename was aggressive, e.g. currentPage() -> currentFrame() so that, when
the new Page container is added, you won't see "currentPage()" and wonder:

  "Does 'currentPage' mean the new Page container, or the Frame (which
  used to be called Page)".
2026-04-22 08:42:18 +08:00
Karl Seguin
782ed50c83 @import(".....string.zig").String => lp.String
Similar change to https://github.com/lightpanda-io/browser/pull/2194 but for
String.
2026-04-20 15:29:37 +08:00
Karl Seguin
2d20e57f80 Change all @import("...../log.zig") to const log = lp.log;
@import("lightpanda") where needed.

Would also like to do this for String, Page, Session and js which all stand out
as types that are use across the codebase.

I know that a few devs are doing this in new work and I haven't heard anyone
voice an objection.
2026-04-20 12:40:04 +08:00
Karl Seguin
9b93ac690e Migrate more tests to new async (failing in CI)
Also improve test report failure on async failure.
2026-04-17 12:10:35 +08:00
Karl Seguin
aac0a6e6b6 Websocket fixes.
This commit fixes a few serious issues with the Websocket implementation.

1 - libcurl recursive api calls
Creating a Websocket instance from within a libcurl callback results in libcurl
failing with a RecursiveApiCall error. I fixed this more generally by adding a
`ready_queue` which connections can use when the `HttpClient` is performing
actions. Once `perform` ends, this new `ready_queue` is processed. There might
be a more holistic solution to this (we seem to run into RecursiveApiCall
everywhere), but since HttpClient is going through heavy changes, this seemed
like the smallest possible change to fix it.

2 - "load" blocking
Load and IdleNetwork notifications should not block on Websocket connections. To
solve this, `HttpClient` now ha `http_active` and `ws_active` to replace `active`.
Only `http_active` is used for things like "load" triggering.

3 - The above change made the Runner's job more complicated. It used to be
binary: you either have active connections or not. Now there are different types
of active connections. To keep it simple, and I think probably more correct,
the "done-ness" (based on the `wait` parameter) is now independent of active
(or not) network activity. If the page's `load_state == .complete`, then the
`wait == .done` is considered successful, whether or not we have active
connections.

4 - As a consequence of the above, and seemingly unrelated to all of these
changes, a number of html tests now use the "new" robust async framework. Most
of these tests were using the `testing.onload` (aka `testing.eventually`) which
had somewhat...unclear semantics. These tests passed more of a consequence of
how we processed a page and being very simple (e.g. just needing 1 micro or
macrotask tick). But `eventually` never worked for more complicated cases, and
the previous `testing.async` didn't work well. Now, the test runner waits for
.load (which, as per #3, can fire more aggressively), which caused many
`eventually` tests to fail. Moving these tests to the new `async` is more
robust and works with the new aggressive "load".
2026-04-17 11:20:27 +08:00
Karl Seguin
6bf35e1ed4 try to improve test ws shutdown, merge ws tests 2026-04-04 07:00:26 +08:00
Karl Seguin
14dcb7895a Give websockets their own connection pool, improve websocket message logging 2026-04-04 07:00:24 +08:00
Karl Seguin
5733c35a2d WebSocket WebAPI
Uses libcurl's websocket capabilities to add support for WebSocket.

Depends on https://github.com/lightpanda-io/zig-v8-fork/pull/167
Issue: https://github.com/lightpanda-io/browser/issues/1952

This is a WIP because it currently uses the same connection pool used for all
HTTP requests. It would be pretty easy for a page to starve the pool and block
any progress.

We previously stored the *Transfer inside of the easy's private data. We now
store the *Connection, and a Connection now has a `transport` field which is
a union for `http: *Transfer` or `websocket: *Websocket`.
2026-04-04 06:59:28 +08:00
Karl Seguin
ab6c63b24b Add --wait-selector, --wait-script and --wait-script-file options to fetch
These new optional parameter run AFTER --wait-until, allowing the (imo) useful
combination of `--wait-until load --wait-script "report.complete === true"`.
However, if `--wait-until` IS NOT specified but `--wait-selector/script` IS,
then there is no default wait and it'll just check the selector/script. If
neither `--wait-selector` or `--wait-script/--wait-script-file` are specified
 then  `--wait-until` continues to default to `done`.

These waiters were added to the Runner, and the existing Action.waitForSelector
now uses the runner's version. Selector querying has been split into distinct
parse and query functions, so that we can parse once, and query on every tick.

We could potentially optimize --wait-script to compile the script once and call
it on each tick, but we'd have to detect page navigation to recompile the script
in the new context. Something I'd rather optimize separately.
2026-03-31 12:30:46 +08:00
Karl Seguin
cf641ed458 Merge pull request #1990 from lightpanda-io/remove_cdp_generic
Remove cdp generic
2026-03-26 07:49:13 +08:00
Karl Seguin
11ed95290b Improve async tests
testing.async(...) is pretty lame. It works for simple cases, where the
microtask is very quickly resolved, but otherwise can't block the test from
exiting.

This adds an overload to testing.async and leverages the new Runner
https://github.com/lightpanda-io/browser/pull/1958 to "tick" until completion
(or timeout).

The overloaded version of testing.async() (called without a callback) will
increment a counter which is only decremented with the promise is resolved. The
test runner will now `tick` until the counter == 0.
2026-03-26 07:35:05 +08:00
Karl Seguin
0dd0495ab8 Removes CDPT (generic CDP)
CDPT used to be a generic so that we could inject Browser, Session, Page and
Client. At some point, it [thankfully] became a generic only to inject Client.

This commit removes the generic and bakes the *Server.Client instance in CDP.
It uses a socketpair for testing.

BrowserContext is still generic, but that's generic for a very different reason
and, while I'd like to remove that generic too, it belongs in a different PR.
2026-03-25 17:43:30 +08:00
Karl Seguin
c9bc370d6a Extract Session.wait into a Runner
This is done for a couple reasons. The first is just to have things a little
more self-contained for eventually supporting more advanced "wait" logic, e.g.
waiting for a selector.

The other is to provide callers with more fine-grained controlled. Specifically
the ability to manually "tick", so that they can [presumably] do something
after every tick. This is needed by the test runner to support more advanced
cases (cases that need to test beyond 'load') and it also improves (and fixes
potential use-after-free, the lp.waitForSelector)
2026-03-23 12:30:41 +08:00
Karl Seguin
a4cb5031d1 Tweak wait_until option
Small tweaks to https://github.com/lightpanda-io/browser/pull/1896

Improve the wait ergonomics with an Option with default parameter. Revert
page pointer logic to original (don't think that change was necessary).
2026-03-19 20:29:20 +08:00
shaewe180
09327c3897 feat: fetch add wait_until parameter for page loads options
Add `--wait_until` and `--wait_ms` CLI arguments to configure session wait behavior. Updates `Session.wait` to evaluate specific page load states (`load`, `domcontentloaded`, `networkidle`, `fixed`) before completing the wait loop.
2026-03-18 15:08:51 +08:00
Adrià Arrufat
b698e2d078 LogFilter: init with slice and silence tests 2026-03-17 13:42:35 +09:00
Adrià Arrufat
3d6d669a50 testing: add LogFilter utility for scoped log suppression 2026-03-12 13:56:53 +09:00
Karl Seguin
94ce5edd20 Frames on the same origin share v8 data
Depends on: https://github.com/lightpanda-io/zig-v8-fork/pull/153

In some ways this is an extension of
https://github.com/lightpanda-io/browser/pull/1635 but it has more implications
with respect to correctness.

A js.Context wraps a v8::Context. One of the important thing it adds is the
identity_map so that, given a Zig instance we always return the same v8::Object.

But imagine code running in a frame. This frame has its own Context, and thus
its own identity_map. What happens when that frame does:

```js
window.top.frame_loaded = true;
```

From Zig's point of view, `Window.getTop` will return the correct Zig instance.
It will return the *Window references by the "root" page. When that instance is
passed to the bridge, we'll look for the v8::Object in the Context's
`identity_map` but wont' find it. The mapping exists in the root context
`identity_map`, but not within this frame. So we create a new v8::Object and now
our 1 zig instance has N v8::Objects for every page/frame that tries to access
it.

This breaks cross-frame scripting which should work, at least to some degree,
even when frames are on the same origin.

This commit adds a `js.Origin` which contains the `identity_map`, along with our
other `v8::Global` storage. The `Env` now contains a `*js.Origin` lookup,
mapping an origin string (e.g. lightpanda.io:443) to an *Origin. When a Page's
URL is changed, we call `self.js.setOrigin(new_url)` which will then either get
or create an origin from the Env's origin lookup map.

js.Origin is reference counted so that it remains valid so long as at least 1
frame references them.

There's some special handling for null-origins (i.e. about:blank). At the root,
null origins get a distinct/isolated Origin. For a frame, the parent's origin
is used.

Above, we talked about `identity_map`, but a `js.Context` has 8 other fields
to track v8 values, e.g. `global_objects`, `global_functions`,
`global_values_temp`, etc. These all must be shared by frames on the same
origin. So all of these have also been moved to js.Origin. They've also been
merged so that we now have 3 fields: `identity_map`, `globals` and `temps`.

Finally, when the origin of a context is changed, we set the v8::Context's
SecurityToken (to that origin). This is a key part of how v8 allows cross-
context access.
2026-03-11 08:43:40 +08:00
Nikolay Govorov
687f577562 Move accept loop to common runtime 2026-03-10 03:00:50 +00:00
Nikolay Govorov
8e59ce9e9f Prepare global NetworkRuntime module 2026-03-10 03:00:47 +00:00
Karl Seguin
82e3f126ff Protect against transfer.abort() being called during callback
This was already handled in most cases, but not for a body-less response. It's
safe to call transfer.abort() during a callback, so long as the performing flag
is set to true. This was set during the normal libcurl callbacks, but for a
body-less response, we manually invoke the header_done_callback and were not
setting the performing flag.
2026-03-02 11:44:42 +08:00
Karl Seguin
081979be3b Initial support for frames
Missing:

- [ ] Navigation support within frames (in fact, as-is, any navigation done
      inside a frame, will almost certainly break things
- [ ] Correct CDP support. I don't know how frames are supposed to be exposed
      to CDP. Normal navigate events? Distinct CDP frame_ids?
- [ ] Cross-origin restrictions. The interaction between frames is supposed to
      change depending on whether or not they're on the same origin
- [ ] Potentially handling src-less frames incorrectly. Might not really matter

Adds basic frame support. Initially explored adding a BrowsingContext and
embedding it in Page, with the goal of also having it embedded in a to-be
created Frame. But it turns out that 98% of Page _was_ BrowsingContext and
introducing a BrowsingContext as the primary interaction unit broke pretty much
_every_ single WebAPI. So Page was expanded:

- Added `_parent: ?*Page`, which is `null` for "root" page.
- Added `frame: ?*IFrame`, which is `null` for the "root" page. This is the
  HTMLIFrameElement for frame-pages.
- Added a _type: enum{root, frame}, which is currently only used to improve
  the logs
- Added a frames: std.ArrayList(*Page). This is a list of frames for the page.
  Note that a "frame-page" can itself haven nested frames.

Besides the above, there were 3 "big" changes.

1 - Adding frames (dynamically, parsed) has to create a new page, start
    navigation, track it (in the frames list). Part of this was just
    piggybacking off of code that handles <script>

2 - The page "load" event blocks on the frame "load" event. This cascades.
    when a page triggers it's load, it can do:
```zig
      if (self._parent) |p| {
        p.iframeLoaded(self);
      }
```
   Pages need to keep track of how many iframes they're waiting to load. When
   all iframes (and all scripts) are loaded, it can then triggers its own load
   event.

3 - Our JS execution expects 1 primary entered context (the pages). But we now
    have multiple page contexts, and we need to be in the correct one based
    on where javascript is being executed. There is no more an default entered
    context. Creating a Local.Scope enters the context, and ls.deinit() exits
    the context.
2026-02-19 23:47:33 +08:00
Nikolay Govorov
9296c10ca4 Use per-cdp connection HttpClient 2026-02-18 09:22:26 +00:00
Nikolay Govorov
6ccd3f277b Fix race condition 2026-02-04 13:48:07 +00:00
Nikolay Govorov
a72782f91e Eliminates duplication in the creation of HTTP headers 2026-02-04 09:08:57 +00:00
Nikolay Govorov
f71aa1cad2 Centralizes configuration, eliminates unnecessary copying of config 2026-02-04 07:57:59 +00:00
Nikolay Govorov
fd8c488dbd Move Notification from App to BrowserContext 2026-02-04 07:33:45 +00:00
Karl Seguin
176d42f625 add 'arraybuffer' responseType to XHR 2026-02-02 07:45:21 +08:00
Karl Seguin
181f265de5 Rework Inspector usage
V8's inspector world is made up of 4 components: Inspector, Client, Channel and
Session. Currently, we treat all 4 components as a single unit which is tied to
the lifetime of CDP BrowserContext - or, loosely speaking, 1 "Inspector Unit"
per page / v8::Context.

According to https://web.archive.org/web/20210622022956/https://hyperandroid.com/2020/02/12/v8-inspector-from-an-embedder-standpoint/
and conversation with Gemini, it's more typical to have 1 inspector per isolate.
The general breakdown is the Inspector is the top-level manager, the Client is
our implementation which control how the Inspector works (its function we expose
that v8 calls into). These should be tied to the Isolate. Channels and Sessions
are more closely tied to Context, where the Channel is v8->zig and the Session
us zig->v8.

This PR does a few things
1 - It creates 1 Inspector and Client per Isolate (Env.js)
2 - It creates 1 Session/Channel per BrowserContext
3 - It merges v8::Session and v8::Channel into Inspector.Session
4 - It moves the Inspector instance directly into the Env
5 - BrowserContext interacts with the Inspector.Session, not the Inspector

4 is arguably unnecessary with respect to the main goal of this commit, but
the end-goal is to tighten the integration. Specifically, rather than CDP having
to inform the inspector that a context was created/destroyed, the Env which
manages Contexts directly (https://github.com/lightpanda-io/browser/pull/1432)
and which now has direct access to the Inspector, is now equipped to keep this
in sync.
2026-01-30 15:59:33 +08:00
Karl Seguin
54c45a0cfd Make js.Bridge aware of string.String for input parameters
Avoids having to allocate small strings when going from v8 -> Zig. Also
added a discriminatory type, string.Global which uses the arena, rather than
the call_arena, if an allocation _is_ necessary. (This is similar to a feature
we had before, but was lost in zigdom). Strings from v8 that need to be
persisted, can be allocated directly v8 -> arena, rather than v8 -> call_arena
-> arena.

I think there are a lot of places where we should use string.String - where
strings are expected to be short (e.g. attribute names). But started with just
document.querySelector and querySelectorAll.
2026-01-26 07:52:27 +08:00
Karl Seguin
62aa564df1 Remove Global v8::Local<V8::Context>
When we create a js.Context, we create the underlying v8.Context and store it
for the duration of the page lifetime. This works because we have a global
HandleScope - the v8.Context (which is really a v8::Local<v8::Context>) is that
to the global HandleScope, effectively making it a global.

If we want to remove our global HandleScope, then we can no longer pin the
v8.Context in our js.Context. Our js.Context now only holds a v8.Global of the
v8.Context (v8::Global<v8::Context).

This PR introduces a new type, js.Local, which takes over a lot of the
functionality previously found in either js.Caller or js.Context. The simplest
way to think about it is:

1 - For v8 -> zig calls, we create a js.Caller (as always)
2 - For zig -> v8 calls, we go through the js.Context (as always)
3 - The shared functionality, which works on a v8.Context, now belongs to js.Local

For #1 (v8 -> zig), creating a js.Local for a js.Caller is really simple and
centralized. v8 largely gives us everything we need from the
FunctionCallbackInfo or PropertyCallbackInfo.  For #2, it's messier, because we
can only create a local v8::Context if we have a HandleScope, which we may or
may not.

Unfortunately, in many cases, what to do becomes the responsibility of the caller
and much of the code has to become aware of this local-ness. What does it means
for our code? The impact is on WebAPIs that store .Global. Because the global
can't do anything. You always need to convert that .Global to a local
(e.g. js.Function.Global -> js.Function).

If you're 100% sure the WebAPI is only being invoked by a v8 callback, you can
use `page.js.local.?.toLocal(some_global).call(...)` to get the local value.

If you're 100% sure the WebAPI is only being invoked by Zig, you need to create
 `js.Local.Scope` to get access to a local:

```zig
var ls: js.Local.Scope = undefined;
page.js.localScope(&ls);
defer ls.deinit();
ls.toLocal(some_global).call(...)
// can also access `&ls.local` for APIs that require a *const js.Local
```
For functions that can be invoked by either V8 or Zig, you should generally push
the responsibility to the caller by accepting a `local: *const js.Local`. If the
caller is a v8 callback, it can pass `page.js.local.?`. If the caller is a Zig
callback, it can create a `Local.Scope`.

As an alternative, it is possible to simply pass the *Page, and check
`if page.js.local == null` and, if so, create a Local.Scope. But this should only
be done for performance reasons. We currently only do this in 1 place, and it's
because the Zig caller doesn't know whether a Local will actually be needed and
it's potentially called on every element creating from the parser.
2026-01-19 07:28:33 +08:00
Karl Seguin
8e14dacc32 Improve ergonomics of try catch (and Function's tryCall)
It now returns a Caught struct which contains all information. The Caught struct
can be logged directly, providing more consistent logs for caught errors.
2026-01-14 17:34:02 +08:00
Pierre Tachoire
adfcf7bb2c tests: re-enable metrics JSON output
METRICS=true zig build test
2026-01-08 13:07:27 +01:00
Karl Seguin
7b0e256408 copy history test from legacy 2026-01-05 10:12:41 +08:00
Karl Seguin
087086c308 remove some unused imports 2025-12-26 12:40:20 +08:00
Pierre Tachoire
8f2921f61f add test for big json number with fetch/xhr 2025-12-25 12:50:44 +01:00
Karl Seguin
d9c53a3def Page.scheduleNavigation for location changes 2025-12-22 12:19:08 +08:00
Karl Seguin
1639ff1b98 improve XMLHTTPRequest. Legacy xhr.html pass 2025-12-15 17:56:23 +08:00
Muki Kiboigo
ac85341cab add NavigationKind to navigate 2025-12-09 17:10:59 -08:00
Karl Seguin
0e1b966dce re-enable CDP dom domain 2025-12-09 13:04:01 +08:00
Karl Seguin
9132bc2375 re-enable CDP node registry 2025-12-09 11:50:33 +08:00
Karl Seguin
1164da5e7a copyright notices 2025-11-14 10:52:43 +08:00