This fixes two things. First, it better tracks the calling context and the
target context, so that if a parent frame does a document.write within a child
frame, any JavaScript is executed in the child frame's context (previously, it
would be executed in the parent's context).
Second, it ensures that, on document.write, we force-execute any pending
scripts. This is related to https://github.com/lightpanda-io/browser/pull/2542
but applies specifically to document.write.
While both Window and WorkerGlobalState expose the same "Performance" object,
some getters are window specific (e.g. eventCounts). I thought the same object
would always have the same shape and, when not possible, they'd be different
types (e.g. Location vs WorkerLocation). But that isn't always the case. So
.exposed = .window (or .worker, defaulting to .both) can not be defined on the
bridge mapping.
If a worker is in some heavy JS e.g. `for(;;)` it will be stuck forever, even
if the peer closes the CDP connection.
With CDP reads now owned by the network thread, we now correctly detect the
disconnect and simply need to force the worker to shutdown. To achieve this,
on socket close, the CdpLink held by the network is given a terminate_ms (five
seconds from now) and added to a linked list. On every wakeup, the network
thread can check the list + timestamp and, if necessary, call Isolate::Terminate
(which is safe to call on a different thread).
syncRequest's `terminated` early return (added in 88b98e70) exits before request() takes ownership of req.headers, so the curl_slist leaked after a latched disconnect (any nested fetch / importScripts / external stylesheet). Free it in the early return, mirroring request()'s own failure paths.
Also adds a behavioral syncRequest-after-disconnect test. The leak itself isn't assertable here (curl_slist is C-allocated through the tracking allocator, outside the per-test leak check), so it's verified by the ownership contract rather than the suite.
Drives a .disconnect through the worker's inbox and asserts a second tick still returns error.ClientDisconnected once the inbox is empty. Fails against the pre-latch HttpClient (the worker would re-park in perform with no producer to wake it); passes with the latch.
This includes EventCounts and PerformanceObserver. This change is like any
other WebAPI on Worker change (e.g. Frame -> js.Execution), but we need the
performance observer hooks in addition to that. Thankfully, every Frame/Worker
only has 1 Performance instance, so we can move the observers and delivery
from the Frame to the Performance instance so that both Frame and WGS gain the
behavior.
syncRequest creates a context that lives on the function's stack. This "works"
because the transfer is only expected to live during the call to syncRequest
(this is the entire premise of syncRequest). But if self.tick returns an error,
the function returns, potentially with transfer.req.ctx still pointing to the
stack value.
This change captures a tick error and, if the sync request is still in_progress,
aborts the transfer. There's maybe a world where the error is recoverable and
the request could continue, but that seems error prone. AND, the most likely
non-transfer error that tick would return are pretty "serious", e.g. OOM or
client disconnected.
Our navigate on about:blank short-circuits most of the complex logic and loads
the content then and there, not asynchronously. This results in notifications
for the frame navigation/creation that happen immediately. With the current
code ordering though, the frame isn't entered in child_frames until AFTER
navigation, resulting in these events trigger on a frame which can't be found
via Session.findFrameByFrameId
This code adds the frame to child_frames BEFORE navigate (and removes it on
error).
Note: about:blank iframe navigation is more common than you probably think. As
soon as a frame is [dynamically] created, it navigates to about:blank, e.g.:
let iframe = document.createElement('iframe');
// IMPLICIT navigation to about:blank happens here
// and this will be another navigation event
iframe.src = "keemun.php";
Currently, if a disconnect/close is captured in a worker during a syncRequest,
that specific request is terminated, but the error doesn't bubble up. The worker
remains alive and will subsequently block in a perform, with no connection alive
to wake it up.
In this commit, when disconnect/close is received, inbox.terminate is set to
true. This flag is checked (in syncRequest and http_client.tick) and
error.ClientDisconnected is returned.
(Also, on network shutdown, always broadcast the cdp_unregister since there's
no harm in sending an extra signal even if nothing was removed).
The CDP server ignored a single SIGTERM while a connection was live: the
process only exited if the socket was closed before the signal, or after a
third signal. A conventional one-shot graceful stop (SIGTERM then waitpid)
hung.
On shutdown the sighandler runs Network.stop (which sets `shutdown` and lets
the run loop exit) before Server.shutdown. A live CDP worker parks in
curl_multi_poll and is woken ONLY by the Network thread via
dropCdp -> handles.wakeup(). Once the run loop exits with links still live,
nothing can wake those workers, so Server.deinit()'s
`while (active_threads > 0)` loop spins forever.
Drop every still-live CDP link from the run loop when `shutdown` is set,
reusing the existing peer-EOF path: dropCdp(notify=true) pushes a .disconnect
into the worker's inbox and wakes it, so cdp.tick() returns false and the
worker exits before the loop breaks.
preparePollFds cleared and rebuilt the curl portion of `pollfds` every
loop iteration, but sliced `pollfds[PSEUDO_POLLFDS..]` — all the way to
the end of the array. That range also covers the CDP socket region
`[cdp_start..]`, which prepareCdpPollFds owns and only rebuilds when
`cdp_dirty` is set (a steady-state optimization). So the @memset wiped
every live CDP socket fd to -1 on each iteration.
This only bites once Network owns a curl multi handle, which is created
solely by telemetry — and telemetry is disabled in Debug builds, which
is why it reproduced only in ReleaseFast/ReleaseSafe (and the nightly).
Regular HTTP/navigation runs on the worker's own handles, not Network's
multi, so it never triggered the path in Debug.
Once the CDP sockets are dropped from the poll set, the Network thread
stops reading client messages (#2508, hard stall after the first
command) and never observes peer EOF or `conn.shutdown`, so the worker
is never told to exit and SIGTERM is ignored after a connection (#2507).
Fix: slice only the curl region `[PSEUDO_POLLFDS..cdp_start]`.
Also harden the poll timeout: `curl_multi_timeout` returns -1 when curl
is idle, and `@min(250, -1)` is -1 (block forever), which starved
onTick (telemetry's periodic flush) and turned any missed wakeup into a
permanent hang rather than a <=250ms blip. Treat curl_timeout <= 0 as
"no deadline" and fall back to the 250ms cap.
Fixes#2507Fixes#2508
Replaces 4 boolean flags with a state. Makes it easier to figure out what the
state of the transfer is, and removes the possibility of inconsistent flags
.e.g queued + loop_owned.
loop_owned -> state != .created
_queued -> state == .queued
_perform -> state == .completing
aborted -> state == .aborted
This largely reverts 92607ad765 (captured in PR:
https://github.com/lightpanda-io/browser/pull/2398).
https://github.com/lightpanda-io/browser/pull/2495 introduces protection against
execution arbitrary CDP command during JavaScript callbacks. Claude initially
made the case for keeping the existing code as a safety net, but sycophanted
when I pushed by.
My reason for removing it is that it isn't a low-maintenance guard. It's a flag
that serves a real purpose (ensuring 1 JS script is finished before executing
another one), that has been expended to solve these issues. It needs to be set
(and reverted) at every callsite that makes a blocking call, and it needs to be
checked (recursively across all frames) in any place that can teardown the page/
frame.
Claude called the allowlist "load-bearing in a non-obvious way", but I think
it's purpose built specifically for this case. Extended the comment atop
`allowDuringSyncWait` so that future-selves remember this.
Removes an assertion that can break with custom element callbacks.
https://github.com/lightpanda-io/browser/pull/2429 does not solve this issue
since it isn't a result of when reactions are executed, but just that they
happen.
(I should note, that I'm not 100% sure the above statement is correct. It's
possible that our CE reactions are (in some cases at least) too micro. Maybe
some operations, like setInnerHtml should operate within a more atomic
frameworks, vs an CE-reaction per internal step. But that's a pretty big change)
This adds help documentation for the --json flag. This is the only thing that
must be kept from this commit.
It uses our testing.zig to streamline the json testing (instead of string
probing). It removes the JsonEnvelope in favor of an anonymous struct (though,
I'm fine with adding it back in if it's needed to resolve something ambiguous).
Finally, I removed the last unit test as, at that point, it's really just
testing Zig's JSON stringifier (could arguably make the same case for the other
two, but there's some logic there about how nulls/empty might be handled).
Trim down a lot of the comments. Inline remarks in some cases rather than a
large function header.
Removing the `errdefer headers.deinit();` is unfortunate but currently
necessary to avoid a potential double-free if the request gets far enough that
http_client frees it and still returns an error. This is a known issue that
needs to be fixed separately and that impacts multiple call-sites. My "fix"
introduces a possible (very small) memory leak versus a possible crash.