browser

mirror of https://github.com/lightpanda-io/browser.git synced 2026-06-11 09:35:59 -04:00

Author	SHA1	Message	Date
Karl Seguin	3ff503bb8b	Add "networkalmostidle" to --wait-until parameter Just config hookup, as the `Frame._notified_network_almost_idle` already exists (and is used by CDP).	2026-06-03 11:50:48 +08:00
Karl Seguin	37a846d91d	remove unused imports	2026-06-01 22:42:39 +08:00
Pierre Tachoire	af6175cc56	rewrite help in a more compact way Inspired by go help output	2026-05-20 16:05:17 +02:00
Francis Bouvier	dffa961f45	Remove options from main help	2026-05-20 16:05:13 +02:00
Pierre Tachoire	2ad2c9d878	Merge pull request #2487 from navidemad/feat/external-stylesheets-flag Add --enable-external-stylesheets flag with fetch + parse	2026-05-20 13:41:59 +02:00
Karl Seguin	b6fd09c5ab	Merge pull request #2502 from lightpanda-io/max-cdp-conn by using httpClient, fetch generates a call to Config.maxConnections	2026-05-20 16:28:56 +08:00
Pierre Tachoire	6eb25d5c44	by using httpClient, fetch generates a call to Config.maxConnections	2026-05-20 10:14:17 +02:00
Navid EMAD	6ed41ea346	Add --enable-external-stylesheets flag (no-op surface) Reserves the CLI flag and LP.configureLoading externalStylesheets field so drivers can adopt the API before the fetch implementation lands in a follow-up that depends on #2303. The bool is intentionally unread in this PR. Mirrors the existing --disable-subframes / --disable-workers plumbing; the CDP field extends LP.configureLoading alongside subFrame and worker without breaking existing callers. Refs #2343	2026-05-19 15:50:11 +02:00
Marc Helbling	a89a28a4a2	feat: add --json to fetch command The `fetch` command is very practical to render pages without needing to have a long running browser instance. It is however masking all details on the fetch, most importantly the HTTP status code. This is a big caveat when leveraging `lightpanda fetch` in a pipeline. This introduces a `--json` option to provide a structured output that contains: * url * HTTP status code * response headers * rendered content as controlled by the `--dump` option The proposal is to always output the same JSON format even when not using `--dump` with an option.	2026-05-19 12:08:23 +02:00
Pierre Tachoire	23a3d5476b	Merge pull request #2458 from lightpanda-io/nikneym/cli-help-rework `help`: rework `help` command	2026-05-18 11:54:29 +02:00
Halil Durak	b2d8c2b834	`help.zon`: introduce `help.zon` Separates `help` explanation from configuration.	2026-05-15 18:31:09 +03:00
Halil Durak	f361f12316	`cli.zig`: change the way the `help` command and sub-commands are detected `cli.zig` is now aware of the `help` command in all situations and creates it by itself. Instead of using errors, it initializes the `Command` union with the `help` branch active.	2026-05-15 18:31:09 +03:00
Halil Durak	9c40cd9fb2	send `Accept` header when navigating	2026-05-15 18:30:18 +03:00
Scott Taylor	b2998470c2	Add --disable-workers + LP.configureLoading worker opt-out Adds two ways to opt out of dedicated Web Worker loading entirely. The Worker constructor still returns a Worker object so calling pages don't throw, but no script fetch is initiated and the worker scope's eval never runs (postMessage from the page is queued indefinitely with no handler to drain it). * CDP method LP.configureLoading { worker: bool } -- per-session toggleable at runtime, alongside the existing { subFrame: bool }. Both fields are now optional so callers can flip one without resetting the other to its default. Backwards-compatible. * CLI flag --disable-workers -- process-wide default applying to every session and to the fetch subcommand. Operators can flip it on without any driver changes. Mirrors --disable-subframes (#2401) exactly. ## Motivation Reliably-reproducible SIGABRT in Worker.loadInitialScript whenever a page constructs a Web Worker AND lightpanda is launched with --http_proxy. Crash signature: $msg="V8 fatal callback" location=v8::Context::Exit() message="Cannot exit non-entered context" Stack: _browser.webapi.Worker.loadInitialScript _browser.webapi.Worker.httpDoneCallback _network.layer.InterceptionLayer.InterceptContext.doneCallback _browser.HttpClient.processMessages _browser.HttpClient.perform _browser.HttpClient.tick The Zig-side Enter/Exit pair around the worker's eval doesn't match v8's entered_contexts stack invariant under that timing -- something upstream of the loadInitialScript Exit leaves an extra Enter on the stack, so v8's Utils::ApiCheck (`isolate->context() == env`) fires and the process aborts. Reproducible against any Shopify storefront PDP (e.g. https://weareallbirds.myshopify.com/products/mens-wool-runners) when served through any HTTP proxy -- the proxy just adds enough latency to surface the race; the same code path runs without --http_proxy but the timing window is too tight to reliably hit. 
The Allbirds trigger script is the Shopify web-pixel-extension worker, but ANY Worker the page constructs hits the same code path. The proper fix needs the v8 entered-contexts invariant to be restored end-to-end through the worker eval. That's a deeper dig into how Worker.loadInitialScript / WorkerGlobalScope.importScript / ls.local.runMacrotasks compose with v8's microtask queues across multiple contexts; I tried three intermediate fixes (deferring loadInitialScript via the frame scheduler when other scripts are mid-eval, replacing the post-eval cross-context runMacrotasks with worker-only PerformCheckpoint, and removing runMacrotasks entirely) and none stopped the crash. The bug is fired from inside the synchronous tick path before the post-eval microtask handling runs, which means the leak happens during Script::Run itself and needs more targeted investigation. This PR is the workaround so users hitting the SIGABRT on storefront / analytics-heavy pages have a clean opt-in escape today. For our use case (product catalog extraction) Workers carry no extraction signal -- web-pixel sandboxes, analytics SDKs, marketing tag pixels, etc. -- so disabling them removes a fragile code path without any downside. ## Implementation `Session.worker_loading_enabled: bool = true` -- default matches existing behavior. `Worker.init` short-circuits AFTER constructing the Worker / WorkerGlobalScope / arena bookkeeping (so the JS `new Worker(url)` expression doesn't throw): if (!session.worker_loading_enabled) { log.debug(.browser, "worker disabled", .{ .url = resolved_url }); return self; } Two ways to flip the flag, mirroring the --disable-subframes pattern: 1. LP.configureLoading { worker: bool } -- both subFrame and worker are now optional fields in the params struct, so existing callers passing only { subFrame } continue to work unchanged. 2. --disable-workers CLI flag -- added to CommonOptions (so it applies to serve, fetch, mcp). 
New Config.disableWorkers() getter; Session.init reads it as the initial value. Total diff: +88 / -3 across 4 files (src/Config.zig, src/browser/Session.zig, src/browser/webapi/Worker.zig, src/cdp/domains/lp.zig). ## Verification Reproducer pattern (puppeteer-core 24.42.0 + tiny CONNECT-tunnel proxy on 127.0.0.1:9999, scripts in cdp-repros/): serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999 serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999 --disable-workers Driving https://weareallbirds.myshopify.com/products/mens-wool-runners: baseline (no --disable-workers): 5/5 SIGABRT in Worker.loadInitialScript with the v8 fatal callback above. with --disable-workers: 10/10 successful, returns full HTML (~1MB), no crash. Test suite: make test -> 637 of 637 tests passed (was 636/636 + new cdp.lp: configureLoading toggles subFrame and worker independently regression test). zig fmt --check ./.zig ./*/.zig -> clean. ## Notes * The CDP method is the same domain (LP.configureLoading) and same shape as --disable-subframes' driver-side opt-in, so existing Playwright / puppeteer integrations that already toggle subframes don't need a separate code path -- one CDP call can flip both. * worker_loading_enabled = false does NOT remove Worker from the global namespace (so feature-detection like `if (typeof Worker !== 'undefined')` still reports true). It just makes constructed workers no-op. Pages that postMessage to a worker and wait for a response will hang on that promise forever (or until the page is torn down). For our extraction use case that's fine -- we control the worklist timeout anyway -- but it's worth noting if upstream wants to surface the disabled state more strongly (e.g. throw from postMessage, or remove the global entirely behind an even-stricter flag). * Once the underlying v8 entered-contexts invariant is restored in Worker.loadInitialScript, this flag becomes a perf / sandboxing tool rather than a correctness workaround. 
Worth keeping anyway: blocking analytics / pixel workers is a reasonable thing to want. ## Related * #2400 -- the iframe analog to this issue (subframe nav invalidates executionContextId); same workaround pattern. * #2401 -- introduced --disable-subframes / LP.configureLoading { subFrame } that this PR mirrors exactly for workers.	2026-05-12 23:46:45 -04:00
Karl Seguin	e9b2aa4946	Merge pull request #2426 from lightpanda-io/feat/cdp-disable-iframes Feat/cdp disable iframes	2026-05-12 13:16:16 +08:00
Karl Seguin	cdfabf7953	Minor tweaks Remove repeating description/comment of flag. Change CDP lp command to be a general configuration endpoint.	2026-05-12 12:56:33 +08:00
Francis Bouvier	31ef5246bc	Add per-subcommand help via `help` or `--help` argument `lightpanda <subcommand> help`, `lightpanda <subcommand> --help` now print only the relevant subcommand options plus common options, instead of the full text. `lightpanda help <subcommand>` is also supported (and that's what is used internally).	2026-05-11 22:16:52 +02:00
Halil Durak	19401dc950	`Config`: update `--inject-script` documentation	2026-05-11 15:15:36 +03:00
Halil Durak	60d721caa2	`Config`: increase read file size for `--inject-script-file`	2026-05-11 15:15:36 +03:00
Halil Durak	246f91d1f8	`--inject-script`: prefer splice bytes into `<head>` directly	2026-05-11 15:15:35 +03:00
Halil Durak	c566d0c41c	introduce `--inject-script` and `--inject-script-file` * Prefer `--inject-` prefix. Support injecting multiple scripts (also allows using both variants together). * Instead of executing scripts in JS context, actually insert them to `<head>` for correct dump output.	2026-05-11 15:15:35 +03:00
Halil Durak	39f12a5669	`fetch`: add support for `--script` option Allows passing a JS file as an arg to be executed alongside other scripts.	2026-05-11 15:15:35 +03:00
Scott Taylor	b272b0e33c	Add --disable-subframes CLI flag Complementary to LP.setSubframeLoading (preceding commit): exposes the same iframe-skip behavior as a CLI option that applies to all sessions in the process. Useful for: * the 'fetch' subcommand (no CDP driver to call LP.setSubframeLoading) * 'serve' deployments where the operator wants iframes off by default for every connecting client (the LP method can still re-enable per-session if needed) * Playwright's chromium.connectOverCDP, which can't reliably issue custom CDP methods on Lightpanda today: BrowserContext.newCDPSession and Browser.newBrowserCDPSession both attach a new CRSession that collides with the STARTUP-session reuse from #2399, triggering a Playwright internal assertion. With --disable-subframes set on the server, Playwright doesn't need to issue any custom CDP -- every session inherits subframes-off and the executionContextId churn from #2400 never trips. Verified: serve --disable-subframes + plain puppeteer-core goto [ok] goto status=200 elapsed=6354ms frameAttached=0 fetch --disable-subframes --dump html https://www.allbirds.com/... exit=0 html bytes: 1021562 title: <title>Allbirds Wool Runners, Men's | ...</title> iframe count in dumped html: 2 (still in DOM, just not loaded) 521/521 unit tests pass.	2026-05-08 17:12:54 -04:00
Halil Durak	c5b16cb18e	`Config`: add a custom validator for `--log-level`	2026-04-28 12:37:16 +03:00
Karl Seguin	ae8013f967	Merge pull request #2275 from lightpanda-io/nikneym/cli-variants `cli`: introduce `variants` + fix `--wait-script-file` regression	2026-04-28 07:32:35 +08:00
Karl Seguin	f515233b52	Merge pull request #2248 from lightpanda-io/blackhole_storage Add and default to Blackhole storage	2026-04-27 21:48:54 +08:00
Halil Durak	968cf5f9e5	`Config`: `log-filter-scopes` -> `--log-filter-scopes`	2026-04-27 16:06:29 +03:00
Halil Durak	f4b220fb5f	`Config`: fix `--wait-script-file` regression	2026-04-27 16:06:12 +03:00
Karl Seguin	7819ee50fa	Add and default to Blackhole storage Tweak sqlite build to omit more features, hoping to shrink the size when sqlite is used.	2026-04-26 09:04:33 +08:00
Karl Seguin	12c2efb811	Adds --terminate-ms command line argument + ctrl-c improvements in fetch The main.zig path for `fetch` now captures the *Browser so that browser.env.terminate() can be called. This is a bit more complex than the serve path because the Browser owns the Isolate and can't be moved from one thread to another. With main having access to the browser, two things are now possible: 1 - We can support a --terminate-ms flag (https://github.com/lightpanda-io/browser/issues/2206) 2 - ctrl-c can correctly stop blocked JavaScript processes. The first is implemented via setitimer, which sets a timer for SIGALRM, avoiding the need to add another "watcher" thread or to put a timer in Network.run.	2026-04-25 12:34:06 +08:00
Nikolay Govorov	999f57b729	Remove timeout flag	2026-04-24 12:40:25 +01:00
Nikolay Govorov	c7d004fefb	Setup timeout via tcp keepalive	2026-04-24 12:40:21 +01:00
Nikolay Govorov	c964604c7a	Fix canada.ca problem	2026-04-23 12:15:57 +01:00
Karl Seguin	859a41ab4e	Adds navigator.userAgentData This API isn't supported by Firefox (yet), so it isn't a huge priority, but I did notice that it's used across many Google properties. It uses the same value as Sec-Ch-Ua (https://github.com/lightpanda-io/browser/pull/2100) to provide consistent data. Smaller changes: 1 - Allow `OffscreenCanvas` to be used with Worker (noticed this error too) 2 - Don't log JS execution errors at "error" level for "load" and "DOMContentLoaded". "error" should be reserved for things we can fix and should never be triggered by a bug in JavaScript.	2026-04-23 16:24:42 +08:00
Halil Durak	7035317e3e	`Config`: rebase to main	2026-04-22 16:14:28 +03:00
Halil Durak	f389e1945f	rebase to main	2026-04-22 16:08:14 +03:00
Halil Durak	1da11d8da8	`Config`: revert to `--strip-mode`	2026-04-22 16:08:13 +03:00
Halil Durak	012fe40bb5	`cli`: remove `aliases` and `shortcuts`	2026-04-22 16:08:12 +03:00
Halil Durak	a3af914cda	`Config`: adaptation to new cookie arguments	2026-04-22 16:08:12 +03:00
Halil Durak	29d8e0c9b7	`Config`: bring back `validateUserAgent`	2026-04-22 16:08:12 +03:00
Halil Durak	5e0c046e96	rebase and remove commented out code	2026-04-22 16:08:11 +03:00
Halil Durak	721b959dbf	`Config`: always return a valid mode for `--dump`	2026-04-22 16:07:05 +03:00
Halil Durak	74a518c56f	`cli`: reintroduce command sniffing	2026-04-22 16:07:04 +03:00
Halil Durak	f5b9bdb51b	rebase and backport new feature from main	2026-04-22 16:07:04 +03:00
Halil Durak	85e624356b	`Commands`: more aliases	2026-04-22 16:07:03 +03:00
Halil Durak	886448aaa3	`Commands`: add shortcuts for `host` and `port` of `serve`	2026-04-22 16:07:02 +03:00
Halil Durak	10914d6288	`cli`: fix `--log-filter-scopes` regression	2026-04-22 16:07:02 +03:00
Halil Durak	56b6fbe011	`cli`: many improvements * Options with optional types are introduced to null by default * Options with boolean types cannot be optional (nullable) * Options with boolean types are introduced with false by default * introduce shortcuts for options that can be provided via single dash * introduce validators; custom parsing logic can be inserted for niche cases	2026-04-22 16:07:01 +03:00
Halil Durak	81b89e67b7	`Config`: adapt to new CLI builder	2026-04-22 16:07:01 +03:00
Karl Seguin	6d0003ad2b	Sqlite This adds an app.storage which is a union around a configurable storage engine (currently, only sqlite). It is _not_ being used anywhere right now. The goal is to get feedback on the implementation and then move cache to it. This doesn't expose a generic query API. The goal is that the storage will expose high level methods, e.g. `cacheGet(req: CacheGetRequest)` and every storage engine will translate the `storage.CacheGetRequest` as needed. A thin wrapper around the Sqlite C api is included, e.g. exec(SQL, .{args}) and a `rows` and `row` fetcher. A connection pool is included. By default, an in-memory DB is currently created. And a `migrations` table with an id of `1` is created/inserted. I don't imagine needing fancy migrations.	2026-04-20 17:13:06 +08:00

115 Commits