Commit Graph

149 Commits

Author SHA1 Message Date
Karl Seguin
9d6dcb71ab Merge pull request #2521 from lightpanda-io/multi_remove_assert_attributes
Add more attributes to multi_remove assertion failure
2026-05-22 20:12:53 +08:00
Pierre Tachoire
8b2a79d93a Merge pull request #2515 from lightpanda-io/cdp_watchdog 2026-05-22 13:19:48 +02:00
Karl Seguin
38a4a334fd Add more attributes to multi_remove assertion failure
Saw this assertion catch for the first time today. Hoping the extra data will
help identify the issue. No URL or other identifiable data is logged.
2026-05-22 16:10:11 +08:00
Karl Seguin
abdfd443e1 On CDP client disconnect, terminate Env
Protects against a stuck worker. This works even if the worker isn't currently
in JS but then enters JS.
2026-05-22 07:13:46 +08:00
Karl Seguin
3fc75dfe85 Revert "Add watchdog to Network thread for abandoned & stuck workers"
This reverts commit 8da7657d4c.
2026-05-22 07:09:57 +08:00
Pierre Tachoire
bc43d64324 Merge pull request #2511 from navidemad/fix-sigterm-live-cdp-connection
network: terminate live CDP connections on shutdown
2026-05-21 19:02:47 +02:00
Navid EMAD
64a6e40121 network: advance shutdownCdpLinks iterator before dropCdp 2026-05-21 15:56:22 +02:00
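
A minimal sketch of the pattern named in 64a6e40121's subject line: when the drop callback unlinks the node being visited, the traversal has to read `next` before calling it. `Node`, `LinkList`, and `drop_all` are hypothetical names for illustration, not the project's types.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


class LinkList:
    """Tiny singly linked list; a hypothetical stand-in for the list of CDP links."""

    def __init__(self):
        self.head = None

    def push(self, node):
        # Prepend the node to the list.
        node.next = self.head
        self.head = node

    def remove(self, node):
        # Unlink the node, then clear its `next` so it no longer points into the list.
        if self.head is node:
            self.head = node.next
        else:
            prev = self.head
            while prev.next is not node:
                prev = prev.next
            prev.next = node.next
        node.next = None


def drop_all(links, drop):
    """Visit every node, allowing `drop` to unlink the node it is given."""
    node = links.head
    while node is not None:
        nxt = node.next  # advance first: `drop` may invalidate node.next
        drop(node)
        node = nxt
```

Reading `node.next` after `drop(node)` would end the loop after the first element in this sketch, because `remove` clears the pointer.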
Karl Seguin
8da7657d4c Add watchdog to Network thread for abandoned & stuck workers
If a worker is in some heavy JS e.g. `for(;;)` it will be stuck forever, even
if the peer closes the CDP connection.

With CDP reads now owned by the network thread, we now correctly detect the
disconnect and simply need to force the worker to shut down. To achieve this,
on socket close, the CdpLink held by the network is given a terminate_ms (five
seconds from now) and added to a linked list. On every wakeup, the network
thread can check the list + timestamp and, if necessary, call Isolate::Terminate
(which is safe to call on a different thread).
2026-05-21 19:22:12 +08:00
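
The deadline check described in 8da7657d4c (later reverted by 3fc75dfe85) can be sketched as follows. This is an illustration under assumptions: `Watchdog`, `ClosedLink`, and the `terminate` callable are hypothetical stand-ins for the CdpLink list and `Isolate::Terminate`, and a Python list replaces the linked list.

```python
from dataclasses import dataclass
from typing import Callable, List

GRACE_MS = 5_000  # "five seconds from now", per the commit message


@dataclass
class ClosedLink:
    """Hypothetical stand-in for a CdpLink whose socket has closed."""
    name: str
    terminate_ms: int               # absolute deadline in milliseconds
    terminate: Callable[[], None]   # stand-in for Isolate::Terminate


class Watchdog:
    def __init__(self) -> None:
        self.pending: List[ClosedLink] = []

    def on_socket_close(self, name, terminate, now_ms):
        # Record the link with a deadline GRACE_MS in the future.
        self.pending.append(ClosedLink(name, now_ms + GRACE_MS, terminate))

    def on_wakeup(self, now_ms):
        """Terminate every link whose deadline has passed; return their names."""
        fired, still_waiting = [], []
        for link in self.pending:
            if now_ms >= link.terminate_ms:
                link.terminate()
                fired.append(link.name)
            else:
                still_waiting.append(link)
        self.pending = still_waiting
        return fired
```

Passing `now_ms` in explicitly keeps the check deterministic; a real loop would read a monotonic clock on each wakeup.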
Navid EMAD
3bfc434b3b network: fix typo in shutdown comment (Idempotent) 2026-05-21 12:40:48 +02:00
Karl Seguin
972be65db7 Merge pull request #2505 from lightpanda-io/transfer_state
Cleanup Transfer flag
2026-05-21 17:42:00 +08:00
Karl Seguin
88b98e705f Capture disconnect/close in Worker
Currently, if a disconnect/close is captured in a worker during a syncRequest,
that specific request is terminated, but the error doesn't bubble up. The worker
remains alive and will subsequently block in a perform, with no connection alive
to wake it up.

In this commit, when disconnect/close is received, inbox.terminate is set to
true. This flag is checked (in syncRequest and http_client.tick) and
error.ClientDisconnected is returned.

(Also, on network shutdown, always broadcast the cdp_unregister since there's
no harm in sending an extra signal even if nothing was removed).
2026-05-21 10:38:52 +08:00
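
A rough sketch of the flag check described in 88b98e705f, assuming a simplified inbox where posting a disconnect/close sets `terminate` directly. `Inbox`, `sync_request`, `poll_once`, and `max_polls` are hypothetical names; only `inbox.terminate` and `ClientDisconnected` come from the commit message.

```python
class ClientDisconnected(Exception):
    """Stand-in for error.ClientDisconnected."""


class Inbox:
    def __init__(self):
        self.terminate = False
        self.messages = []

    def post(self, kind, payload=None):
        # A disconnect or close marks the inbox as terminated.
        if kind in ("disconnect", "close"):
            self.terminate = True
        self.messages.append((kind, payload))


def sync_request(inbox, poll_once, max_polls=1000):
    """Poll until a result arrives, bailing out if the client went away."""
    for _ in range(max_polls):
        if inbox.terminate:
            # Bubble the disconnect up instead of blocking in another poll.
            raise ClientDisconnected()
        result = poll_once()
        if result is not None:
            return result
    raise TimeoutError("no result within max_polls iterations")
```

The point of the check is that the error reaches the caller, so the worker can exit rather than park in a poll that nothing will ever wake.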
Karl Seguin
e4171bc694 Merge pull request #2501 from lightpanda-io/remove_reentrency_teardown_protection
Remove reentrancy teardown protection
2026-05-21 10:15:26 +08:00
Navid EMAD
bdf28c51cd network: terminate live CDP connections on shutdown
The CDP server ignored a single SIGTERM while a connection was live: the
process only exited if the socket was closed before the signal, or after a
third signal. A conventional one-shot graceful stop (SIGTERM then waitpid)
hung.

On shutdown the sighandler runs Network.stop (which sets `shutdown` and lets
the run loop exit) before Server.shutdown. A live CDP worker parks in
curl_multi_poll and is woken ONLY by the Network thread via
dropCdp -> handles.wakeup(). Once the run loop exits with links still live,
nothing can wake those workers, so Server.deinit()'s
`while (active_threads > 0)` loop spins forever.

Drop every still-live CDP link from the run loop when `shutdown` is set,
reusing the existing peer-EOF path: dropCdp(notify=true) pushes a .disconnect
into the worker's inbox and wakes it, so cdp.tick() returns false and the
worker exits before the loop breaks.
2026-05-20 21:01:51 +02:00
Navid EMAD
e1e49c8a2e Fix Network dropping CDP sockets from its poll set once a multi exists
preparePollFds cleared and rebuilt the curl portion of `pollfds` every
loop iteration, but sliced `pollfds[PSEUDO_POLLFDS..]` — all the way to
the end of the array. That range also covers the CDP socket region
`[cdp_start..]`, which prepareCdpPollFds owns and only rebuilds when
`cdp_dirty` is set (a steady-state optimization). So the @memset wiped
every live CDP socket fd to -1 on each iteration.

This only bites once Network owns a curl multi handle, which is created
solely by telemetry — and telemetry is disabled in Debug builds, which
is why it reproduced only in ReleaseFast/ReleaseSafe (and the nightly).
Regular HTTP/navigation runs on the worker's own handles, not Network's
multi, so it never triggered the path in Debug.

Once the CDP sockets are dropped from the poll set, the Network thread
stops reading client messages (#2508, hard stall after the first
command) and never observes peer EOF or `conn.shutdown`, so the worker
is never told to exit and SIGTERM is ignored after a connection (#2507).

Fix: slice only the curl region `[PSEUDO_POLLFDS..cdp_start]`.

Also harden the poll timeout: `curl_multi_timeout` returns -1 when curl
is idle, and `@min(250, -1)` is -1 (block forever), which starved
onTick (telemetry's periodic flush) and turned any missed wakeup into a
permanent hang rather than a <=250ms blip. Treat curl_timeout <= 0 as
"no deadline" and fall back to the 250ms cap.

Fixes #2507
Fixes #2508
2026-05-20 19:13:09 +02:00
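
The two fixes in e1e49c8a2e can be illustrated with a small model. This is a sketch, not the Zig code: the list layout, the value of `PSEUDO_POLLFDS`, and the function names are assumptions chosen to mirror the commit message.

```python
PSEUDO_POLLFDS = 2  # assumed number of leading pseudo entries


def clear_curl_region_buggy(pollfds, cdp_start):
    # Mirrors pollfds[PSEUDO_POLLFDS..]: runs to the end of the array,
    # so it also wipes the CDP region that starts at cdp_start.
    for i in range(PSEUDO_POLLFDS, len(pollfds)):
        pollfds[i] = -1


def clear_curl_region_fixed(pollfds, cdp_start):
    # Mirrors pollfds[PSEUDO_POLLFDS..cdp_start]: only the curl region.
    for i in range(PSEUDO_POLLFDS, cdp_start):
        pollfds[i] = -1


def poll_timeout_ms(curl_timeout, cap=250):
    # curl_timeout <= 0 is treated as "no deadline", as in the commit:
    # fall back to the cap instead of letting -1 mean "block forever".
    if curl_timeout <= 0:
        return cap
    return min(cap, curl_timeout)
```

With `pollfds = [10, 11, 20, 21, 30, 31]` and `cdp_start = 4`, the buggy version resets the CDP fds 30 and 31 while the fixed version leaves them in place.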
Karl Seguin
1cdd2bb324 Cleanup Transfer flag
Replaces 4 boolean flags with a state. Makes it easier to figure out what the
state of the transfer is, and removes the possibility of inconsistent flags,
e.g. queued + loop_owned.

loop_owned -> state != .created
_queued -> state == .queued
_perform -> state == .completing
aborted -> state == .aborted
2026-05-20 20:37:21 +08:00
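
The flag-to-state mapping in 1cdd2bb324 could look roughly like this. The sketch assumes only the four states named in the message; the real enum may have more, and the `Transfer` class here is a hypothetical stand-in.

```python
from enum import Enum, auto


class State(Enum):
    # Only the states named in the commit message; others may exist.
    CREATED = auto()
    QUEUED = auto()
    COMPLETING = auto()
    ABORTED = auto()


class Transfer:
    """Hypothetical stand-in: the old flags become read-only views of one state."""

    def __init__(self):
        self.state = State.CREATED

    @property
    def loop_owned(self):   # loop_owned -> state != .created
        return self.state is not State.CREATED

    @property
    def queued(self):       # _queued -> state == .queued
        return self.state is State.QUEUED

    @property
    def performing(self):   # _perform -> state == .completing
        return self.state is State.COMPLETING

    @property
    def aborted(self):      # aborted -> state == .aborted
        return self.state is State.ABORTED
```

Because every predicate is derived from a single field, no two of `queued`, `performing`, and `aborted` can be true at the same time.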
Karl Seguin
a9cf87e0b0 Remove reentrancy teardown protection
This largely reverts 92607ad765 (captured in PR:
https://github.com/lightpanda-io/browser/pull/2398).

https://github.com/lightpanda-io/browser/pull/2495 introduces protection against
executing arbitrary CDP commands during JavaScript callbacks. Claude initially
made the case for keeping the existing code as a safety net, but sycophanted
when I pushed back.

My reason for removing it is that it isn't a low-maintenance guard. It's a flag
that serves a real purpose (ensuring 1 JS script is finished before executing
another one), that has been extended to solve these issues. It needs to be set
(and reverted) at every callsite that makes a blocking call, and it needs to be
checked (recursively across all frames) in any place that can teardown the page/
frame.

Claude called the allowlist "load-bearing in a non-obvious way", but I think
it's purpose-built specifically for this case. Extended the comment atop
`allowDuringSyncWait` so that future-selves remember this.
2026-05-20 15:08:18 +08:00
Karl Seguin
97c8ca3832 when work is done, don't keep polling, return to process it 2026-05-19 22:39:48 +08:00
Karl Seguin
875c147783 Main/Network reads CDP socket
Previously, the CDP socket was added to the worker's multi and fully owned
by the worker. While this is simple, it introduced some issues:

1 - Cannot detect a disconnected client during JS processing ( for(;;) )

2 - A blocked worker can cause back-pressure that blocks the client. This can
    cause a deadlock if the worker is blocked waiting for a CDP message

In addition to these 2 problems, there was 1 other serious CDP-related issue:
arbitrary CDP messages could be processed during a JavaScript callback. For
example, a Worker calls importScripts while request interception is enabled;
this requires us to tick the HttpClient waiting for the interception response.
But a client could send Target.closeTarget, which we'd process and delete the
frame... all while importScripts is still blocked. Assuming importScripts unblocks,
everything is a big UAF since the frame (and its workers) were cleared by
closeTarget.

The CDP socket is now read from the network (main) thread and an OTP-style
mailbox is used. The network thread posts messages to the Worker's inbox and
signals it to wake up. This solves #1 and #2. It doesn't directly solve the
reentrancy issue, but it provides the foundation. Specifically, it introduces
a queue of CDP messages and more control over when/how that queue is
processed. At "safe points" (Runner.tick, HttpClient.tick), any message can
be processed. But, when inside a JavaScript callback, we can process only
non-destructive/non-mutating messages. Specifically, we can process only messages
related to request interception.
2026-05-19 20:52:21 +08:00
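
The mailbox behaviour described in 875c147783 can be sketched as a queue with two drain modes. This is an illustration under assumptions: `Mailbox` and its methods are hypothetical names, and the allowlist below uses CDP Fetch-domain methods as a guess at what "related to request interception" covers.

```python
from collections import deque

# Assumed allowlist: CDP Fetch-domain methods used to answer an interception.
INTERCEPTION_METHODS = {
    "Fetch.continueRequest",
    "Fetch.fulfillRequest",
    "Fetch.failRequest",
}


class Mailbox:
    def __init__(self):
        self._queue = deque()

    def post(self, method):
        # Called by the network thread in the real design; single-threaded here.
        self._queue.append(method)

    def drain_safe_point(self):
        """At a safe point, every queued message may be processed."""
        drained = list(self._queue)
        self._queue.clear()
        return drained

    def drain_during_callback(self):
        """Inside a JS callback, take only interception messages; defer the rest."""
        allowed, deferred = [], deque()
        for method in self._queue:
            if method in INTERCEPTION_METHODS:
                allowed.append(method)
            else:
                deferred.append(method)
        self._queue = deferred
        return allowed
```

In this model a `Target.closeTarget` posted while a callback is blocked stays queued, in order, until the next safe point.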
Karl Seguin
8ef6084fdb Re-organization CDP connection
network/WsConnection.zig was poorly named. It didn't represent a generic WS
connection, but rather a CDP-specific connection. This splits the generic WS
logic into network/WS.zig and the CDP-specific details in cdp/Connection.zig.

Some of the connection management in the Server has also been simplified.
2026-05-19 10:08:22 +08:00
Karl Seguin
a5162bea8f Cleanup HttpClient.Transfer
This is just moving fields around. The end result is that there's a
`transfer.req` and a `transfer.res`.

On the Request side, we used to have a nested `params: RequestParam` resulting
in a lot of `transfer.req.params.url`. This is now `transfer.req.url`. On the
Response side, we had the exact opposite: response fields splattered directly
in the transfer, `transfer.response_header`. This is now `transfer.res.header`.

There is now an HttpClient.Response, which is the actual final response (which
could be for a transfer or something else, e.g. the cache). And an
HttpClient.Transfer.Response which captures the inflight response data (and is
one of the polymorphic variants of the HttpClient.Response). Probably still not
ideal, but I'm not sure how to make it cleaner, and even if this is just an
intermediary step, I consider it a small win.
2026-05-15 12:55:47 +08:00
Muki Kiboigo
940976b6a7 properly disable cache on Network.setCacheDisabled 2026-05-14 09:03:51 -07:00
Muki Kiboigo
ac863c7e2b add Network.requestServedFromCache 2026-05-13 21:47:47 -07:00
Muki Kiboigo
4a45b4d866 fix crash on robots.txt request fulfilled immediately 2026-05-12 21:50:05 -07:00
Karl Seguin
50b126b402 fix cachelayer hit path 2026-05-13 07:14:44 +08:00
Karl Seguin
5e0976bbd6 fix use-after-free on robotslayer shutdown 2026-05-12 19:26:24 +08:00
Karl Seguin
82a4fc752b HttpClient Improvements
1 - Track owner of a request (for simpler / more accurate abort (TBD))

2 - Create Transfer upfront, make everything work on Transfer (not Request)
    This helps remove ambiguity about cleanup and simplifies layers. For example
    Robots request is just another normal request, not a special case. This gives
    everything a stable address (the *Transfer which can be looked up by id)
2026-05-12 19:26:24 +08:00
Muki Kiboigo
14e1f1bcf6 add clear fn to Cache and FsCache 2026-05-07 09:00:33 -07:00
Nikolay Govorov
9a312a4177 Refactor server/client/cdp structure 2026-05-04 16:41:22 +01:00
Adrià Arrufat
eab9ae0243 RobotsLayer: use managed ArrayList 2026-05-04 08:01:22 +02:00
Patrick Wyatt
47d96ab8ad Display actual port when binding --port 0
This change causes lightpanda to display the actual port number (instead of 0)
when binding a dynamic port (--port 0), which makes automating based on
scraping lightpanda output simple.
2026-04-29 21:44:41 -07:00
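
For context on 47d96ab8ad, this is how a dynamically assigned port can be read back after binding to port 0, using Python's standard `socket` module. It is a generic sketch, not lightpanda's code; `bind_dynamic` is a hypothetical helper.

```python
import socket


def bind_dynamic(host="127.0.0.1"):
    """Bind to port 0 and return the socket plus the port the OS picked."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))       # port 0 asks the OS for any free port
    sock.listen()
    actual_port = sock.getsockname()[1]
    return sock, actual_port
```

A server that prints `actual_port` rather than the requested `0` lets a wrapper script parse the output and connect without guessing.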
Muki Kiboigo
e8c9acd310 fix request arena leak on CacheLayer hit 2026-04-28 09:48:23 -07:00
Muki Kiboigo
1ab445843c better arena management in Robots Layer and Context 2026-04-28 07:01:43 -07:00
Muki Kiboigo
1370f6805b add a note about cdp callback cb 2026-04-28 07:01:42 -07:00
Muki Kiboigo
3fe774fbfb pass error all the way up to Layer chain to clean 2026-04-28 07:01:42 -07:00
Muki Kiboigo
4de1dc5424 properly call error callback in InterceptionLayer 2026-04-28 07:01:42 -07:00
Muki Kiboigo
83b047e66a assert that intercepted isn't 0 before decrementing 2026-04-28 07:01:42 -07:00
Muki Kiboigo
c719a522b8 use lightpanda module log in layers 2026-04-28 07:01:42 -07:00
Muki Kiboigo
152a792c18 use Request Arena in RobotsLayer 2026-04-28 07:01:41 -07:00
Muki Kiboigo
e56036fb50 use Request Arena in CacheLayer 2026-04-28 07:01:41 -07:00
Muki Kiboigo
fc702794c2 use Request Arena in WebBotAuthLayer 2026-04-28 07:01:41 -07:00
Muki Kiboigo
d14b75d93b use Request arena in InterceptionLayer 2026-04-28 07:01:41 -07:00
Muki Kiboigo
bb9e238f6c Requests now use arenas from the arena pool 2026-04-28 07:01:41 -07:00
Muki Kiboigo
175c2cc288 ensure robots params have arena and request id 2026-04-28 07:01:41 -07:00
Muki Kiboigo
87eec578aa use arena pool in InterceptionLayer 2026-04-28 07:01:41 -07:00
Muki Kiboigo
ca08f0c56d remove blocking from RequestParams 2026-04-28 07:01:40 -07:00
Muki Kiboigo
3db3281e8e working authentication with InterceptionLayer 2026-04-28 07:01:40 -07:00
Muki Kiboigo
d0b421b085 partial auth challenge support 2026-04-28 07:01:40 -07:00
Muki Kiboigo
dddd0dfb90 fix request id mismatch on cdp 2026-04-28 07:01:40 -07:00
Muki Kiboigo
0d50f706db more fixing of hanging in cdp interception 2026-04-28 07:01:40 -07:00
Muki Kiboigo
9c826159a0 crude InterceptionLayer 2026-04-28 07:01:40 -07:00