Commit Graph

115 Commits

Author SHA1 Message Date
Karl Seguin
3ff503bb8b Add "networkalmostidle" to --wait-until parameter
Just config hookup, as the `Frame._notified_network_almost_idle` already exists
(and is used by CDP).
2026-06-03 11:50:48 +08:00
Karl Seguin
37a846d91d remove unused imports 2026-06-01 22:42:39 +08:00
Pierre Tachoire
af6175cc56 rewrite help in a more compact way
Inspired by go help output
2026-05-20 16:05:17 +02:00
Francis Bouvier
dffa961f45 Remove options from main help 2026-05-20 16:05:13 +02:00
Pierre Tachoire
2ad2c9d878 Merge pull request #2487 from navidemad/feat/external-stylesheets-flag
Add --enable-external-stylesheets flag with fetch + parse
2026-05-20 13:41:59 +02:00
Karl Seguin
b6fd09c5ab Merge pull request #2502 from lightpanda-io/max-cdp-conn
by using httpClient, fetch generates a call to Config.maxConnections
2026-05-20 16:28:56 +08:00
Pierre Tachoire
6eb25d5c44 by using httpClient, fetch generates a call to Config.maxConnections 2026-05-20 10:14:17 +02:00
Navid EMAD
6ed41ea346 Add --enable-external-stylesheets flag (no-op surface)
Reserves the CLI flag and LP.configureLoading externalStylesheets field
so drivers can adopt the API before the fetch implementation lands in a
follow-up that depends on #2303.

The bool is intentionally unread in this PR. Mirrors the existing
--disable-subframes / --disable-workers plumbing; the CDP field extends
LP.configureLoading alongside subFrame and worker without breaking
existing callers.

Refs #2343
2026-05-19 15:50:11 +02:00
Marc Helbling
a89a28a4a2 feat: add --json to fetch command
The `fetch` command is very practical to render pages without needing to
have a long running browser instance.
It is however masking all details on the fetch, most importantly the HTTP status code.
This is a big caveat when leveraging `lightpanda fetch` in a pipeline.

This introduces a `--json` option to provide a structured output that
contains:
* url
* HTTP status code
* response headers
* rendered content as controlled by the `--dump` option

The proposal is to always output the same JSON format even when not
using `--dump` with an option.
2026-05-19 12:08:23 +02:00
Pierre Tachoire
23a3d5476b Merge pull request #2458 from lightpanda-io/nikneym/cli-help-rework
`help`: rework `help` command
2026-05-18 11:54:29 +02:00
Halil Durak
b2d8c2b834 help.zon: introduce help.zon
Separates `help` explanation from configuration.
2026-05-15 18:31:09 +03:00
Halil Durak
f361f12316 cli.zig: change the way help command and sub-command detected
`cli.zig` is now aware of `help` command at all situations and creates it by itself. Instead of using errors, it initializes `Command` union where `help` branch is active.
2026-05-15 18:31:09 +03:00
Halil Durak
9c40cd9fb2 send Accept header when navigating 2026-05-15 18:30:18 +03:00
Scott Taylor
b2998470c2 Add --disable-workers + LP.configureLoading worker opt-out
Adds two ways to opt out of dedicated Web Worker loading entirely. The
Worker constructor still returns a Worker object so calling pages don't
throw, but no script fetch is initiated and the worker scope's eval
never runs (postMessage from the page is queued indefinitely with no
handler to drain it).

* CDP method LP.configureLoading { worker: bool } -- per-session
  toggleable at runtime, alongside the existing { subFrame: bool }.
  Both fields are now optional so callers can flip one without
  resetting the other to its default. Backwards-compatible.
* CLI flag --disable-workers -- process-wide default applying to every
  session and to the fetch subcommand. Operators can flip it on without
  any driver changes. Mirrors --disable-subframes (#2401) exactly.

## Motivation

Reliably-reproducible SIGABRT in Worker.loadInitialScript whenever a
page constructs a Web Worker AND lightpanda is launched with
--http_proxy. Crash signature:

    $msg="V8 fatal callback" location=v8::Context::Exit()
    message="Cannot exit non-entered context"
    Stack:
      _browser.webapi.Worker.loadInitialScript
      _browser.webapi.Worker.httpDoneCallback
      _network.layer.InterceptionLayer.InterceptContext.doneCallback
      _browser.HttpClient.processMessages
      _browser.HttpClient.perform
      _browser.HttpClient.tick

The Zig-side Enter/Exit pair around the worker's eval doesn't match
v8's entered_contexts stack invariant under that timing -- something
upstream of the loadInitialScript Exit leaves an extra Enter on the
stack, so v8's Utils::ApiCheck (`isolate->context() == *env`) fires
and the process aborts.

Reproducible against any Shopify storefront PDP (e.g.
https://weareallbirds.myshopify.com/products/mens-wool-runners) when
served through any HTTP proxy -- the proxy just adds enough latency
to surface the race; the same code path runs without --http_proxy
but the timing window is too tight to reliably hit. The Allbirds
trigger script is the Shopify web-pixel-extension worker, but ANY
Worker the page constructs hits the same code path.

The proper fix needs the v8 entered-contexts invariant to be
restored end-to-end through the worker eval. That's a deeper dig
into how Worker.loadInitialScript / WorkerGlobalScope.importScript /
ls.local.runMacrotasks compose with v8's microtask queues across
multiple contexts; I tried three intermediate fixes (deferring
loadInitialScript via the frame scheduler when other scripts are
mid-eval, replacing the post-eval cross-context runMacrotasks with
worker-only PerformCheckpoint, and removing runMacrotasks entirely)
and none stopped the crash. The bug is fired from inside the
synchronous tick path before the post-eval microtask handling
runs, which means the leak happens during Script::Run itself and
needs more targeted investigation.

This PR is the workaround so users hitting the SIGABRT on
storefront / analytics-heavy pages have a clean opt-in escape today.
For our use case (product catalog extraction) Workers carry no
extraction signal -- web-pixel sandboxes, analytics SDKs, marketing
tag pixels, etc. -- so disabling them removes a fragile code path
without any downside.

## Implementation

`Session.worker_loading_enabled: bool = true` -- default matches
existing behavior.

`Worker.init` short-circuits AFTER constructing the Worker /
WorkerGlobalScope / arena bookkeeping (so the JS `new Worker(url)`
expression doesn't throw):

    if (!session.worker_loading_enabled) {
        log.debug(.browser, "worker disabled", .{ .url = resolved_url });
        return self;
    }

Two ways to flip the flag, mirroring the --disable-subframes pattern:

1. LP.configureLoading { worker: bool } -- both subFrame and worker
   are now optional fields in the params struct, so existing callers
   passing only { subFrame } continue to work unchanged.
2. --disable-workers CLI flag -- added to CommonOptions (so it
   applies to serve, fetch, mcp). New Config.disableWorkers() getter;
   Session.init reads it as the initial value.

Total diff: +88 / -3 across 4 files (src/Config.zig,
src/browser/Session.zig, src/browser/webapi/Worker.zig,
src/cdp/domains/lp.zig).

## Verification

Reproducer pattern (puppeteer-core 24.42.0 + tiny CONNECT-tunnel
proxy on 127.0.0.1:9999, scripts in cdp-repros/):

  serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999
  serve --host 127.0.0.1 --port 9222 --http_proxy http://127.0.0.1:9999 --disable-workers

Driving https://weareallbirds.myshopify.com/products/mens-wool-runners:

  baseline (no --disable-workers): 5/5 SIGABRT in
    Worker.loadInitialScript with the v8 fatal callback above.

  with --disable-workers:           10/10 successful, returns full
    HTML (~1MB), no crash.

Test suite:

  make test  -> 637 of 637 tests passed (was 636/636 + new
    cdp.lp: configureLoading toggles subFrame and worker
    independently regression test).

  zig fmt --check ./*.zig ./**/*.zig  -> clean.

## Notes

* The CDP method is the same domain (LP.configureLoading) and same
  shape as --disable-subframes' driver-side opt-in, so existing
  Playwright / puppeteer integrations that already toggle
  subframes don't need a separate code path -- one CDP call can
  flip both.

* worker_loading_enabled = false does NOT remove Worker from the
  global namespace (so feature-detection like
  `if (typeof Worker !== 'undefined')` still reports true). It just
  makes constructed workers no-op. Pages that postMessage to a worker
  and wait for a response will hang on that promise forever (or
  until the page is torn down). For our extraction use case that's
  fine -- we control the worklist timeout anyway -- but it's worth
  noting if upstream wants to surface the disabled state more
  strongly (e.g. throw from postMessage, or remove the global
  entirely behind an even-stricter flag).

* Once the underlying v8 entered-contexts invariant is restored in
  Worker.loadInitialScript, this flag becomes a perf / sandboxing
  tool rather than a correctness workaround. Worth keeping anyway:
  blocking analytics / pixel workers is a reasonable thing to want.

## Related

* #2400 -- the iframe analog to this issue (subframe nav invalidates
  executionContextId); same workaround pattern.
* #2401 -- introduced --disable-subframes / LP.configureLoading
  { subFrame } that this PR mirrors exactly for workers.
2026-05-12 23:46:45 -04:00
Karl Seguin
e9b2aa4946 Merge pull request #2426 from lightpanda-io/feat/cdp-disable-iframes
Feat/cdp disable iframes
2026-05-12 13:16:16 +08:00
Karl Seguin
cdfabf7953 Minor tweaks
Remove repeating description/comment of flag.

Change CDP lp command to be a general configuration endpoint.
2026-05-12 12:56:33 +08:00
Francis Bouvier
31ef5246bc Add per-subcommand help via help or --help argument
`lightpanda <subcommand> help`, `lightpanda <subcommand> --help`
now print only the relevant subcommand options plus common options,
instead of the full text.

`lightpanda help <subcommand>` is also supported
(and that's what use internally).
2026-05-11 22:16:52 +02:00
Halil Durak
19401dc950 Config: update --inject-script documentation 2026-05-11 15:15:36 +03:00
Halil Durak
60d721caa2 Config: increase read file size for --inject-script-file 2026-05-11 15:15:36 +03:00
Halil Durak
246f91d1f8 --inject-script: prefer splice bytes into <head> directly 2026-05-11 15:15:35 +03:00
Halil Durak
c566d0c41c introduce --inject-script and --inject-script-file
* Prefer `--inject-*` prefix.
* Support injecting multiple scripts (also allows using both variants together).
* Instead of executing scripts in JS context, actually insert them to `<head>` for correct dump output.
2026-05-11 15:15:35 +03:00
Halil Durak
39f12a5669 fetch: add support for --script option
Allows passing a JS file as an arg to be executed alongside other scripts.
2026-05-11 15:15:35 +03:00
Scott Taylor
b272b0e33c Add --disable-subframes CLI flag
Complementary to LP.setSubframeLoading (preceding commit): exposes
the same iframe-skip behavior as a CLI option that applies to all
sessions in the process. Useful for:

  * the 'fetch' subcommand (no CDP driver to call LP.setSubframeLoading)
  * 'serve' deployments where the operator wants iframes off by
    default for every connecting client (the LP method can still
    re-enable per-session if needed)
  * Playwright's chromium.connectOverCDP, which can't reliably issue
    custom CDP methods on Lightpanda today: BrowserContext.newCDPSession
    and Browser.newBrowserCDPSession both attach a new CRSession that
    collides with the STARTUP-session reuse from #2399, triggering a
    Playwright internal assertion. With --disable-subframes set on the
    server, Playwright doesn't need to issue any custom CDP \u2014 every
    session inherits subframes-off and the executionContextId churn
    from #2400 never trips.

Verified:

  serve --disable-subframes + plain puppeteer-core goto
    [ok] goto status=200 elapsed=6354ms frameAttached=0

  fetch --disable-subframes --dump html https://www.allbirds.com/...
    exit=0
    html bytes: 1021562
    title: <title>Allbirds Wool Runners, Men's | ...</title>
    iframe count in dumped html: 2  (still in DOM, just not loaded)

521/521 unit tests pass.
2026-05-08 17:12:54 -04:00
Halil Durak
c5b16cb18e Config: add a custom validator for --log-level 2026-04-28 12:37:16 +03:00
Karl Seguin
ae8013f967 Merge pull request #2275 from lightpanda-io/nikneym/cli-variants
`cli`: introduce `variants` + fix `--wait-script-file` regression
2026-04-28 07:32:35 +08:00
Karl Seguin
f515233b52 Merge pull request #2248 from lightpanda-io/blackhole_storage
Add and default to Blackhole storage
2026-04-27 21:48:54 +08:00
Halil Durak
968cf5f9e5 Config: log-filter-scopes -> --log-filter-scopes 2026-04-27 16:06:29 +03:00
Halil Durak
f4b220fb5f Config: fix --wait-script-file regression 2026-04-27 16:06:12 +03:00
Karl Seguin
7819ee50fa Add and default to Blackhole storage
Tweak sqlite build to omit more features, hoping to shrink the size when sqlite
is used.
2026-04-26 09:04:33 +08:00
Karl Seguin
12c2efb811 Adds --terminate-ms command line argument + ctrl-c improvements in fetch
The main.zig path for `fetch` now captures the *Browser so that
browser.env.terminate() can be called. This is a bit more complex than the serve
path because the Browser owns the Isolate and can't be moved from one thread to
another.

With main having access to the browser, two things are now possible:
1 - We can support a --terminate-ms flag (https://github.com/lightpanda-io/browser/issues/2206)
2 - ctrl-c can correctly stop blocked JavaScript processes

1 is implemented via setitimer to set a timer for SIGALRM, avoiding the need to
add another "watcher" thread, or putting a timer in Network.run.
2026-04-25 12:34:06 +08:00
Nikolay Govorov
999f57b729 Remove timeout flag 2026-04-24 12:40:25 +01:00
Nikolay Govorov
c7d004fefb Setup timeout via tcp keepalive 2026-04-24 12:40:21 +01:00
Nikolay Govorov
c964604c7a Fix canada.ca problem 2026-04-23 12:15:57 +01:00
Karl Seguin
859a41ab4e Adds navigator.userAgentData
This API isn't supported by FireFox (yet), so it isn't a huge priority, but I
did notice that its used across many Google properties. It uses the same value
as Sec-Ch-Ua (https://github.com/lightpanda-io/browser/pull/2100) to provide
consistent data.

Smaller changes:
1 - Allow `OffscreenCanvas` to be used with Worker (noticed this error too)
2 - Don't like JS execution errors at "error" level for "load" and
    "DOMContentLoaded". "error" should be reserved for things we can fix and
    should never be triggered from invalid a bug in JavaScript.
2026-04-23 16:24:42 +08:00
Halil Durak
7035317e3e Config: rebase to main 2026-04-22 16:14:28 +03:00
Halil Durak
f389e1945f rebase to main 2026-04-22 16:08:14 +03:00
Halil Durak
1da11d8da8 Config: revert to --strip-mode 2026-04-22 16:08:13 +03:00
Halil Durak
012fe40bb5 cli: remove aliases and shortcuts 2026-04-22 16:08:12 +03:00
Halil Durak
a3af914cda Config: adaptation to new cookie arguments 2026-04-22 16:08:12 +03:00
Halil Durak
29d8e0c9b7 Config: bring back validateUserAgent 2026-04-22 16:08:12 +03:00
Halil Durak
5e0c046e96 rebase and remove commented out code 2026-04-22 16:08:11 +03:00
Halil Durak
721b959dbf Config: always return a valid mode for --dump 2026-04-22 16:07:05 +03:00
Halil Durak
74a518c56f cli: reintroduce command sniffing 2026-04-22 16:07:04 +03:00
Halil Durak
f5b9bdb51b rebase and backport new feature from main 2026-04-22 16:07:04 +03:00
Halil Durak
85e624356b Commands: more aliases 2026-04-22 16:07:03 +03:00
Halil Durak
886448aaa3 Commands: add shortcuts for host and port of serve 2026-04-22 16:07:02 +03:00
Halil Durak
10914d6288 cli: fix --log-filter-scopes regression 2026-04-22 16:07:02 +03:00
Halil Durak
56b6fbe011 cli: many improvements
* Options with optional types are introduced to null by default
* Options with boolean types cannot be optional (nullable)
* Options with boolean types are introduced with false by default
* introduce shortcuts for options that can be provided via single dash
* introduce validators; custom parsing logic can be inserted for niche cases
2026-04-22 16:07:01 +03:00
Halil Durak
81b89e67b7 Config: adapt to new CLI builder 2026-04-22 16:07:01 +03:00
Karl Seguin
6d0003ad2b Sqlite
This adds an app.storage which is a union around configurable storage engine
(currently, only sqlite).

It is _not_ being used anywhere right now. The goal is to get feedback on
the implementation and then move cache to it.

This doesn't expose a generic query API. The goal is that the storage will
expose high level methods, e.g. `cacheGet(req: CacheGetRequest)` and every
storage engine will translate the `storage.CacheGetRequest` as needed.

A thin wrapper around the Sqlite C api is included, e.g. exec(SQL, .{args}) a
`rows` and `row` fetcher. A connection pool is included. By default, an
in-memory DB is currently created. And a `migrations` table with an id of `1`
is created/inserted. I don't imagine needing fancy migratations.
2026-04-20 17:13:06 +08:00