Reserves the CLI flag and LP.configureLoading externalStylesheets field
so drivers can adopt the API before the fetch implementation lands in a
follow-up that depends on #2303.
The bool is intentionally unread in this PR. Mirrors the existing
--disable-subframes / --disable-workers plumbing; the CDP field extends
LP.configureLoading alongside subFrame and worker without breaking
existing callers.
Refs #2343
The `fetch` command is a practical way to render pages without keeping a
long-running browser instance around.
However, it hides all details of the fetch, most importantly the HTTP status code.
This is a significant limitation when using `lightpanda fetch` in a pipeline.
This introduces a `--json` option to provide a structured output that
contains:
* url
* HTTP status code
* response headers
* rendered content as controlled by the `--dump` option
The proposal is to always output the same JSON format, even when
`--dump` is not used with an option.
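As a minimal sketch of how a pipeline might consume this output, the
snippet below parses one JSON document and fails on a non-2xx status. The
key names (`url`, `status`, `headers`, `content`) are assumptions made for
illustration; the list above names the fields but does not fix their keys.

```python
import json


def check_fetch_result(raw: str) -> dict:
    """Parse one JSON document and raise if the HTTP status is not 2xx."""
    result = json.loads(raw)
    status = result["status"]  # assumed key name
    if not 200 <= status < 300:
        raise RuntimeError(f"{result['url']} returned HTTP {status}")
    return result


# Hand-written samples standing in for the output of `lightpanda fetch --json`;
# they are not captured from a real run.
sample_ok = json.dumps({
    "url": "https://example.com/",
    "status": 200,
    "headers": {"content-type": "text/html"},
    "content": "<html></html>",
})
sample_missing = json.dumps({
    "url": "https://example.com/missing",
    "status": 404,
    "headers": {},
    "content": "",
})
```

With a check like this, a pipeline step can stop on a 404 instead of
silently processing an error page.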
`cli.zig` is now aware of the `help` command in all situations and creates it itself. Instead of using errors, it initializes the `Command` union with the `help` branch active.
Adds two ways to opt out of dedicated Web Worker loading entirely. The
Worker constructor still returns a Worker object so calling pages don't
throw, but no script fetch is initiated and the worker scope's eval
never runs (postMessage from the page is queued indefinitely with no
handler to drain it).
* CDP method LP.configureLoading { worker: bool } -- per-session
toggleable at runtime, alongside the existing { subFrame: bool }.
Both fields are now optional so callers can flip one without
resetting the other to its default. Backwards-compatible.
* CLI flag --disable-workers -- process-wide default applying to every
session and to the fetch subcommand. Operators can flip it on without
any driver changes. Mirrors --disable-subframes (#2401) exactly.
## Motivation
Reliably-reproducible SIGABRT in Worker.loadInitialScript whenever a
page constructs a Web Worker AND lightpanda is launched with
--http_proxy. Crash signature:
$msg="V8 fatal callback" location=v8::Context::Exit()
message="Cannot exit non-entered context"
Stack:
_browser.webapi.Worker.loadInitialScript
_browser.webapi.Worker.httpDoneCallback
_network.layer.InterceptionLayer.InterceptContext.doneCallback
_browser.HttpClient.processMessages
_browser.HttpClient.perform
_browser.HttpClient.tick
The Zig-side Enter/Exit pair around the worker's eval doesn't match
v8's entered_contexts stack invariant under that timing -- something
upstream of the loadInitialScript Exit leaves an extra Enter on the
stack, so v8's Utils::ApiCheck (`isolate->context() == *env`) fires
and the process aborts.
Reproducible against any Shopify storefront PDP (e.g.
https://weareallbirds.myshopify.com/products/mens-wool-runners) when
served through any HTTP proxy -- the proxy just adds enough latency
to surface the race; the same code path runs without --http_proxy
but the timing window is too tight to reliably hit. The Allbirds
trigger script is the Shopify web-pixel-extension worker, but ANY
Worker the page constructs hits the same code path.
The proper fix needs the v8 entered-contexts invariant to be
restored end-to-end through the worker eval. That's a deeper dig
into how Worker.loadInitialScript / WorkerGlobalScope.importScript /
ls.local.runMacrotasks compose with v8's microtask queues across
multiple contexts; I tried three intermediate fixes (deferring
loadInitialScript via the frame scheduler when other scripts are
mid-eval, replacing the post-eval cross-context runMacrotasks with
worker-only PerformCheckpoint, and removing runMacrotasks entirely)
and none stopped the crash. The abort fires from inside the
synchronous tick path before the post-eval microtask handling
runs, which means the leak happens during Script::Run itself and
needs more targeted investigation.
This PR is the workaround so users hitting the SIGABRT on
storefront / analytics-heavy pages have a clean opt-in escape today.
For our use case (product catalog extraction) Workers carry no
extraction signal -- web-pixel sandboxes, analytics SDKs, marketing
tag pixels, etc. -- so disabling them removes a fragile code path
without any downside.
## Implementation
`Session.worker_loading_enabled: bool = true` -- default matches
existing behavior.
`Worker.init` short-circuits AFTER constructing the Worker /
WorkerGlobalScope / arena bookkeeping (so the JS `new Worker(url)`
expression doesn't throw):
    if (!session.worker_loading_enabled) {
        log.debug(.browser, "worker disabled", .{ .url = resolved_url });
        return self;
    }
Two ways to flip the flag, mirroring the --disable-subframes pattern:
1. LP.configureLoading { worker: bool } -- both subFrame and worker
are now optional fields in the params struct, so existing callers
passing only { subFrame } continue to work unchanged.
2. --disable-workers CLI flag -- added to CommonOptions (so it
applies to serve, fetch, mcp). New Config.disableWorkers() getter;
Session.init reads it as the initial value.
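The optional-field behavior can be modeled in a few lines. This is a
Python sketch of the semantics only, not the Zig code in lp.zig:
`SessionLoading`, `configure_loading` and `sub_frame_loading_enabled` are
hypothetical names, and the subframe default of true is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionLoading:
    # The worker default is the one stated above; the subframe default
    # is assumed to be enabled as well.
    sub_frame_loading_enabled: bool = True
    worker_loading_enabled: bool = True


def configure_loading(session: SessionLoading, params: dict) -> None:
    """Apply LP.configureLoading params; absent fields leave state untouched."""
    sub_frame: Optional[bool] = params.get("subFrame")
    worker: Optional[bool] = params.get("worker")
    if sub_frame is not None:
        session.sub_frame_loading_enabled = sub_frame
    if worker is not None:
        session.worker_loading_enabled = worker


session = SessionLoading()
configure_loading(session, {"subFrame": False})  # older caller: subFrame only
configure_loading(session, {"worker": False})    # new caller: worker only
```

Because a missing field means "leave as is", the second call does not
reset subFrame to its default.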
Total diff: +88 / -3 across 4 files (src/Config.zig,
src/browser/Session.zig, src/browser/webapi/Worker.zig,
src/cdp/domains/lp.zig).
## Verification
Reproducer pattern (puppeteer-core 24.42.0 + tiny CONNECT-tunnel
proxy on 127.0.0.1:9999, scripts in cdp-repros/):
serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999
serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999 --disable-workers
Driving https://weareallbirds.myshopify.com/products/mens-wool-runners:
baseline (no --disable-workers): 5/5 SIGABRT in
Worker.loadInitialScript with the v8 fatal callback above.
with --disable-workers: 10/10 successful, returns full
HTML (~1MB), no crash.
Test suite:
make test -> 637 of 637 tests passed (was 636/636 + new
cdp.lp: configureLoading toggles subFrame and worker
independently regression test).
zig fmt --check ./*.zig ./**/*.zig -> clean.
## Notes
* The CDP method is the same domain (LP.configureLoading) and same
shape as --disable-subframes' driver-side opt-in, so existing
Playwright / puppeteer integrations that already toggle
subframes don't need a separate code path -- one CDP call can
flip both.
* worker_loading_enabled = false does NOT remove Worker from the
global namespace (so feature-detection like
`if (typeof Worker !== 'undefined')` still reports true). It just
makes constructed workers no-op. Pages that postMessage to a worker
and wait for a response will hang on that promise forever (or
until the page is torn down). For our extraction use case that's
fine -- we control the worklist timeout anyway -- but it's worth
noting if upstream wants to surface the disabled state more
strongly (e.g. throw from postMessage, or remove the global
entirely behind an even-stricter flag).
* Once the underlying v8 entered-contexts invariant is restored in
Worker.loadInitialScript, this flag becomes a perf / sandboxing
tool rather than a correctness workaround. Worth keeping anyway:
blocking analytics / pixel workers is a reasonable thing to want.
## Related
* #2400 -- the iframe analog to this issue (subframe nav invalidates
executionContextId); same workaround pattern.
* #2401 -- introduced --disable-subframes / LP.configureLoading
{ subFrame } that this PR mirrors exactly for workers.
`lightpanda <subcommand> help`, `lightpanda <subcommand> --help`
now print only the relevant subcommand options plus common options,
instead of the full text.
`lightpanda help <subcommand>` is also supported
(and that's what is used internally).
* Prefer `--inject-*` prefix.
* Support injecting multiple scripts (also allows using both variants together).
* Instead of executing scripts in the JS context, insert them into `<head>` so the dump output is correct.
Complementary to LP.setSubframeLoading (preceding commit): exposes
the same iframe-skip behavior as a CLI option that applies to all
sessions in the process. Useful for:
* the 'fetch' subcommand (no CDP driver to call LP.setSubframeLoading)
* 'serve' deployments where the operator wants iframes off by
default for every connecting client (the LP method can still
re-enable per-session if needed)
* Playwright's chromium.connectOverCDP, which can't reliably issue
custom CDP methods on Lightpanda today: BrowserContext.newCDPSession
and Browser.newBrowserCDPSession both attach a new CRSession that
collides with the STARTUP-session reuse from #2399, triggering a
Playwright internal assertion. With --disable-subframes set on the
server, Playwright doesn't need to issue any custom CDP -- every
session inherits subframes-off and the executionContextId churn
from #2400 never trips.
Verified:
serve --disable-subframes + plain puppeteer-core goto
[ok] goto status=200 elapsed=6354ms frameAttached=0
fetch --disable-subframes --dump html https://www.allbirds.com/...
exit=0
html bytes: 1021562
title: <title>Allbirds Wool Runners, Men's | ...</title>
iframe count in dumped html: 2 (still in DOM, just not loaded)
521/521 unit tests pass.
The main.zig path for `fetch` now captures the *Browser so that
browser.env.terminate() can be called. This is a bit more complex than the serve
path because the Browser owns the Isolate and can't be moved from one thread to
another.
With main having access to the browser, two things are now possible:
1 - We can support a --terminate-ms flag (https://github.com/lightpanda-io/browser/issues/2206)
2 - ctrl-c can correctly stop blocked JavaScript processes
1 is implemented via setitimer to set a timer for SIGALRM, avoiding the need to
add another "watcher" thread, or putting a timer in Network.run.
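The timer approach can be illustrated with Python's `signal.setitimer`,
which wraps the same POSIX call. This is a sketch of the mechanism only,
not the main.zig code: `run_with_deadline` and `Terminated` are
hypothetical names, and raising an exception stands in for calling
browser.env.terminate(). It is Unix-only and must run on the main thread.

```python
import signal


class Terminated(Exception):
    """Raised from the SIGALRM handler when the deadline expires."""


def run_with_deadline(fn, terminate_ms: int):
    """Run fn(), interrupting it via SIGALRM after terminate_ms milliseconds."""

    def on_alarm(signum, frame):
        # Stand-in for asking the JS engine to terminate execution.
        raise Terminated()

    previous = signal.signal(signal.SIGALRM, on_alarm)
    # One-shot real-time timer; no watcher thread is needed.
    signal.setitimer(signal.ITIMER_REAL, terminate_ms / 1000.0)
    try:
        return fn()
    finally:
        # Disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def spin_forever():
    # Plays the role of blocked JavaScript that never yields.
    while True:
        pass
```

A call such as `run_with_deadline(spin_forever, 50)` raises `Terminated`
after roughly 50 ms, while a function that finishes in time returns
normally and the timer is disarmed.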
This API isn't supported by Firefox (yet), so it isn't a huge priority, but I
did notice that it's used across many Google properties. It uses the same value
as Sec-Ch-Ua (https://github.com/lightpanda-io/browser/pull/2100) to provide
consistent data.
Smaller changes:
1 - Allow `OffscreenCanvas` to be used with Worker (noticed this error too)
2 - Don't log JS execution errors at "error" level for "load" and
"DOMContentLoaded". "error" should be reserved for things we can fix and
should never be triggered by a bug in JavaScript.
* Options with optional types are initialized to null by default
* Options with boolean types cannot be optional (nullable)
* Options with boolean types are initialized to false by default
* Introduce shortcuts for options, which can be provided via a single dash
* Introduce validators; custom parsing logic can be inserted for niche cases
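The rules above can be sketched as a small model. This is a Python
illustration of the rules, not the cli.zig implementation: `Option`,
`parse` and `parse_port` are hypothetical names, and the `-p` shortcut is
invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Option:
    name: str                      # long form, passed as --name
    kind: type                     # bool, int or str
    optional: bool = False
    short: Optional[str] = None    # shortcut, passed as -x
    validator: Optional[Callable[[str], Any]] = None

    def __post_init__(self):
        if self.kind is bool and self.optional:
            raise ValueError("boolean options cannot be optional")


def parse(options, argv):
    values, lookup = {}, {}
    for opt in options:
        lookup["--" + opt.name] = opt
        if opt.short is not None:
            lookup["-" + opt.short] = opt
        if opt.kind is bool:
            values[opt.name] = False   # booleans default to false
        elif opt.optional:
            values[opt.name] = None    # optionals default to null
    args = iter(argv)
    for arg in args:
        opt = lookup.get(arg)
        if opt is None:
            raise ValueError(f"unknown option: {arg}")
        if opt.kind is bool:
            values[opt.name] = True
            continue
        raw = next(args, None)
        if raw is None:
            raise ValueError(f"missing value for {arg}")
        # A validator, when present, replaces the default conversion.
        convert = opt.validator or opt.kind
        values[opt.name] = convert(raw)
    return values


def parse_port(raw: str) -> int:
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


OPTIONS = [
    Option("disable-workers", bool),
    Option("port", int, optional=True, short="p", validator=parse_port),
]
```

With these definitions, `parse(OPTIONS, [])` leaves `disable-workers` at
false and `port` at None, and `-p 9222` is accepted as a shortcut for
`--port 9222`.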
This adds an app.storage, which is a union over the configurable storage engines
(currently only sqlite).
It is _not_ being used anywhere right now. The goal is to get feedback on
the implementation and then move cache to it.
This doesn't expose a generic query API. The goal is that the storage will
expose high level methods, e.g. `cacheGet(req: CacheGetRequest)` and every
storage engine will translate the `storage.CacheGetRequest` as needed.
A thin wrapper around the SQLite C API is included, e.g. exec(SQL, .{args}) plus
`rows` and `row` fetchers. A connection pool is included. By default, an
in-memory DB is currently created, and a `migrations` table with an id of `1`
is created/inserted. I don't imagine needing fancy migrations.
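To make the intended shape concrete, here is a minimal sketch using
Python's `sqlite3` module rather than the Zig wrapper. The `cache` table,
its columns, `cache_put` and the `url` field on `CacheGetRequest` are all
assumptions for illustration; only the in-memory default, the
`migrations` row with id `1`, and the idea of a high-level `cacheGet`
come from the description above. The connection pool is not modeled.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheGetRequest:
    url: str  # hypothetical field


class SqliteStorage:
    def __init__(self, path: str = ":memory:"):
        # In-memory database by default.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "create table if not exists migrations (id integer primary key)"
        )
        self.conn.execute("insert or ignore into migrations (id) values (1)")
        # Hypothetical schema, only here so cache_get has something to query.
        self.conn.execute(
            "create table if not exists cache "
            "(url text primary key, body blob not null)"
        )
        self.conn.commit()

    def applied_migrations(self) -> list:
        rows = self.conn.execute("select id from migrations order by id")
        return [row[0] for row in rows]

    def cache_put(self, url: str, body: bytes) -> None:
        self.conn.execute(
            "insert or replace into cache (url, body) values (?, ?)",
            (url, body),
        )
        self.conn.commit()

    def cache_get(self, req: CacheGetRequest) -> Optional[bytes]:
        # The engine translates the high-level request into its own query.
        row = self.conn.execute(
            "select body from cache where url = ?", (req.url,)
        ).fetchone()
        return None if row is None else row[0]


storage = SqliteStorage()
```

Callers only see `cache_get` and `cache_put`; the SQL stays inside the
engine, which is what would let another engine translate the same
request differently.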