Commit Graph

6344 Commits

Author SHA1 Message Date
Pierre Tachoire
2ad2c9d878 Merge pull request #2487 from navidemad/feat/external-stylesheets-flag
Add --enable-external-stylesheets flag with fetch + parse
2026-05-20 13:41:59 +02:00
Karl Seguin
b6fd09c5ab Merge pull request #2502 from lightpanda-io/max-cdp-conn
by using httpClient, fetch generates a call to Config.maxConnections
2026-05-20 16:28:56 +08:00
Pierre Tachoire
639cb14cb3 Merge pull request #2494 from marchelbling/feat-fetch-json-option
feat: add --json to fetch command
2026-05-20 10:17:21 +02:00
Pierre Tachoire
6eb25d5c44 by using httpClient, fetch generates a call to Config.maxConnections 2026-05-20 10:14:17 +02:00
Karl Seguin
a314984b2e Merge pull request #2495 from lightpanda-io/cdp_inbox
Main/Network reads CDP socket
2026-05-20 14:35:29 +08:00
Pierre Tachoire
29d5a7ae9e Merge pull request #2497 from lightpanda-io/ci-revert-cdp-logs
ci: remove cdp logs from end to end tests
2026-05-20 08:30:12 +02:00
Karl Seguin
e7b16983bb Add help documentation + small tweaks
This adds help documentation for the --json flag. This is the only thing that
must be kept from this commit.

It uses our testing.zig to streamline the json testing (instead of string
probing). It removes the JsonEnvelope in favor of an anonymous struct (though,
I'm fine with adding it back in if it's needed to resolve something ambiguous).

Finally, I removed the last unit test as, at that point, it's really just
testing Zig's JSON stringifier (could arguably make the same case for the other
two, but there's some logic there about how nulls/empty might be handled).
2026-05-20 10:44:41 +08:00
Karl Seguin
2fd2be0e51 Small largely stylistic tweaks
Trim down a lot of the comments. Inline remarks in some cases rather than a
large function header.

Removing the `errdefer headers.deinit();` is unfortunate but currently
necessary to avoid a potential double-free if the request gets far enough that
http_client frees it and still returns an error. This is a known issue that
needs to be fixed separately and that impacts multiple call-sites. My "fix"
introduces a possible (very small) memory leak versus a possible crash.
2026-05-20 10:31:28 +08:00
Karl Seguin
61386059c1 promote invalid CDP message from warn to err 2026-05-20 07:19:57 +08:00
Karl Seguin
037db695ff Merge pull request #2492 from lightpanda-io/cdp_connection
Re-organization CDP connection
2026-05-20 06:45:30 +08:00
Marc Helbling
fec2bbda7b review 2026-05-19 18:13:34 +02:00
Pierre Tachoire
a15c04de4b ci: remove cdp logs from end to end tests 2026-05-19 17:32:13 +02:00
Karl Seguin
345cc9c6c0 zig fmt 2026-05-19 23:10:23 +08:00
Karl Seguin
97c8ca3832 when work is done, don't keep polling, return to process it 2026-05-19 22:39:48 +08:00
Navid EMAD
32dbd716b1 Apply fragment parse-mode to DOMParser
Closes the DOMParser gap left as a follow-up in the previous review-fix
commit. DOMParser.parseFromString built its target Document via the
frame's parser without touching `_parse_mode`, so `Build.created` →
`linkAddedCallback` → `loadExternalStylesheet` saw `_parse_mode ==
.document` and fetched/registered sheets on the LIVE frame document for
every stylesheet link in the parsed string.

Bracket both the text/html and XML branches with the same fragment
parse-mode `parseHtmlAsChildren` uses. The existing gate in
`loadExternalStylesheet` already short-circuits on .fragment, so no
change is needed there. Side benefits: parser-emitted scripts in
DOMParser content stop reaching `scriptAddedCallback` against the live
frame, default-script injection skips DOMParser content, and mutation
observers on the live document no longer fan out for parsed nodes —
all of which match what DOMParser should do per spec.

Regression test extended to cover the DOMParser path alongside the
existing innerHTML case.

Refs #2343
2026-05-19 15:51:35 +02:00
Navid EMAD
f05efd6719 Harden external stylesheet path per code review
Addresses 8 findings from ultrareview on the external stylesheet feature:

* UAF on CDP teardown during syncRequest. `loadExternalStylesheet`
  pumps the CDP socket inline, so a `Target.closeTarget` arriving
  mid-fetch could drive `Session.removePage` and free the frame
  while we still held `self`. Set `_script_manager.base.is_evaluating`
  around the call — the same bracket every other syncRequest caller
  uses, which is what `Session.removePage`'s reentrancy guard checks.

* Disconnect leak. `link.remove()` left the sheet on
  `document.styleSheets` and in the cascade forever; the disconnect
  walker had a `<style>` branch but no `<link>` mirror. Common SPA
  theme-switch pattern (append new sheet, remove old) was broken.
  Added the parallel `else if` branch.

* Fragment-parsed links. `Build.created` fires for parser-instantiated
  elements before attachment, including innerHTML / outerHTML /
  insertAdjacentHTML / Range.createContextualFragment / <template>
  content. Without a guard those fetched against the live document
  and registered phantom sheets even when the fragment was never
  attached. Added `_parse_mode == .fragment` early-return mirroring
  the existing `nodeIsReady` short-circuit. DOMParser is a separate
  case (parses with `.document` into a different Document) and is
  left as a known follow-up.

* Missing Referer. Every other resource-fetch path
  (ScriptManagerBase, XHR, Fetch, WorkerGlobalScope) routes through
  `Frame.headersForRequest` to attach the cached `Referer` header.
  Many CDNs gate stylesheet delivery on Referer; without it requests
  returned 403/302 and the CSS silently failed. Added the call.

* Header OOM leak. `headers.add` between `newHeaders()` and
  `syncRequest` (which takes ownership) leaked the initial 3-entry
  slist on OOM. Added `errdefer headers.deinit()` mirroring
  RobotsLayer.zig:121-122.

* `_href` mutated before parse could fail. On parse error the cached
  sheet was left with the new URL but old rules dropped — violated
  the "previous sheet intact on failure" invariant the PR description
  promises. Moved the `_href` assignment to after `replaceSync`
  succeeds. Full atomicity would require a scratch-list pattern in
  `CSSStyleSheet.replaceSync` itself; documented as a known limit.

* `_sheet` cached before registration could OOM. If `sheets.add`
  failed, `link._sheet` pointed at an unregistered sheet and every
  future re-fetch short-circuited via the `orelse` branch, leaving
  the sheet permanently unreachable through `document.styleSheets`.
  Assign `link._sheet` only after `sheets.add` succeeds.

* Stale CLI help text claimed `--enable-external-stylesheets` was a
  no-op surface. Removed the obsolete sentence.

New regression tests cover fragment-parse skip and disconnect
removal+re-add. Full suite 694/694 pass.

Refs #2343
2026-05-19 15:51:34 +02:00
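The `_sheet` item in f05efd6719 comes down to ordering: cache the pointer only after the fallible registration step has succeeded. A minimal Python sketch of that ordering follows, not the Zig code from the commit; `Registry`, `RegistryFull`, `Link`, and `attach_sheet` are hypothetical stand-ins, and the capacity limit simulates the OOM case.

```python
class RegistryFull(Exception):
    """Stands in for the allocation failure described in the commit."""


class Registry:
    """Hypothetical stand-in for document.styleSheets with a capacity limit."""

    def __init__(self, capacity):
        self.items = []
        self.capacity = capacity

    def add(self, sheet):
        if len(self.items) >= self.capacity:
            raise RegistryFull()
        self.items.append(sheet)


class Link:
    def __init__(self):
        self.sheet = None  # plays the role of link._sheet


def attach_sheet(link, registry, sheet):
    # Register first: if this raises, link.sheet is still None, so a later
    # re-fetch takes the "first load" path again instead of reusing a
    # sheet that was never registered.
    registry.add(sheet)
    link.sheet = sheet
```

With the two statements in `attach_sheet` swapped, a failed `add` would leave `link.sheet` pointing at a sheet the registry never saw, which is the failure mode the commit describes.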
Navid EMAD
4592812027 Reuse cached sheet on link href change
Caught in code review: `loadExternalStylesheet` created a fresh
`CSSStyleSheet` and appended to `document.styleSheets` on every call, so
mutating `link.href` on a connected stylesheet element accumulated stale
sheets — the old rules kept cascading because the previous sheet was
never removed.

Cache the sheet on `Link._sheet` (mirroring `Style._sheet`) and reuse it
via `replaceSync` on re-fetch. First load creates + registers as before;
subsequent loads swap content in place, keeping `document.styleSheets`
length stable.

On fetch failure the cached sheet is untouched — matches browser
behavior where a broken href doesn't invalidate the previously loaded
sheet until the link itself is removed.

Refs #2343
2026-05-19 15:50:11 +02:00
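The cache-and-replace behavior described in 4592812027 can be sketched roughly as below. This is an illustrative Python model, not the Zig implementation: `Sheet`, `Link`, and the `fetch` callable (assumed to return `None` on failure) are hypothetical stand-ins for `CSSStyleSheet`, the link element, and the HTTP path.

```python
class Sheet:
    """Stand-in for CSSStyleSheet; only tracks the last text it was given."""

    def __init__(self):
        self.text = ""

    def replace_sync(self, text):
        self.text = text


class Link:
    def __init__(self, href):
        self.href = href
        self.sheet = None  # plays the role of Link._sheet


def load_external_stylesheet(style_sheets, link, fetch):
    body = fetch(link.href)  # hypothetical fetcher: None means the fetch failed
    if body is None:
        return False  # the cached sheet is left as it was
    if link.sheet is None:
        # First load: create the sheet, register it, then cache it.
        sheet = Sheet()
        sheet.replace_sync(body)
        style_sheets.append(sheet)
        link.sheet = sheet
    else:
        # Later loads: swap the content in place, so the list length is stable.
        link.sheet.replace_sync(body)
    return True
```

Changing `link.href` and calling the function again keeps `style_sheets` at length one, and a failed fetch leaves the previously loaded text in place.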
Navid EMAD
3e409d49e9 Implement external stylesheet fetch + parse
Wires up --enable-external-stylesheets / LP.configureLoading.externalStylesheets
from the prior surface-only commit. When the flag is set, parser- and
JS-created <link rel=stylesheet> elements now synchronously fetch and parse
their href, register a CSSStyleSheet on document.styleSheets, and feed
StyleManager so checkVisibility() reflects external rules. Flag stays
default-off — scrapers that don't need accurate visibility pay nothing.

Frame.loadExternalStylesheet mirrors ScriptManager.addFromElement: same
HttpClient.syncRequest path, same arena ownership, same per-frame
notification + cookie wiring. Body is routed through CSSStyleSheet.replaceSync,
which already parses, populates cssRules, and calls sheetModified() — no
StyleManager changes needed. 2 MiB hard cap on a single sheet body, status
non-2xx and oversize both fire `error` on the link.

Link.Build.created is added so static head <link> elements reach
linkAddedCallback at all — void elements never trigger nodeComplete, which
is why static `<link>` had no observable effect before. Mirrors Image.

HttpClient.Request.ResourceType gains a `.stylesheet` variant so CDP Network
events report the right type; cdp.fetch.zig switches updated.

Refs #2343
2026-05-19 15:50:11 +02:00
Navid EMAD
6ed41ea346 Add --enable-external-stylesheets flag (no-op surface)
Reserves the CLI flag and LP.configureLoading externalStylesheets field
so drivers can adopt the API before the fetch implementation lands in a
follow-up that depends on #2303.

The bool is intentionally unread in this PR. Mirrors the existing
--disable-subframes / --disable-workers plumbing; the CDP field extends
LP.configureLoading alongside subFrame and worker without breaking
existing callers.

Refs #2343
2026-05-19 15:50:11 +02:00
Karl Seguin
ed05a6b14f test thread safety
LogFilter isn't thread safe, so setting it in a test where the log filter is
read from another thread triggers TSAN. LogFilter.deinit now waits until
the server has no active threads.
2026-05-19 21:26:53 +08:00
Karl Seguin
875c147783 Main/Network reads CDP socket
Previously, the CDP socket was added to the worker's multi and fully owned
by the worker. While this is simple, it introduced some issues:

1 - Cannot detect a disconnected client during JS processing ( for(;;) )

2 - A blocked worker can cause back-pressure that blocks the client. This can
    cause a deadlock if the worker is blocked waiting for a CDP message

In addition to these 2 problems, there was 1 other serious CDP-related issue:
arbitrary CDP messages could be processed during a JavaScript callback. For
example, a Worker calls importScripts while request interception is enabled;
this requires us to tick the HttpClient while waiting for the interception
response. But a client could send Target.closeTarget, which we'd process and
delete the frame, all while importScripts is still blocked. Once importScripts
unblocks, everything is a big UAF since the frame (and its workers) were
cleared by closeTarget.

The CDP socket is now read from the network (main) thread and an OTP-style
mailbox is used. The network thread posts messages to the Worker's inbox and
signals it to wake up. This solves #1 and #2. It doesn't directly solve the
reentrancy issue, but it provides the foundation. Specifically, it introduces
a queue of CDP messages and more control over when/how that queue is
processed. At "safe points" (Runner.tick, HttpClient.tick), any message can
be processed. But, when inside a JavaScript callback, we can process only
non-destructive/non-mutating messages. Specifically, we can process only
messages related to request interception.
2026-05-19 20:52:21 +08:00
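The mailbox idea in 875c147783 can be sketched in Python as follows. This is a sketch under stated assumptions, not the Zig implementation: the `Inbox` class, its method names, and the choice of `Fetch.*` methods as the interception-related set are hypothetical, and the wake-up signal to the worker is left out.

```python
import queue
from dataclasses import dataclass

# Assumption: which methods count as "related to request interception".
INTERCEPTION_METHODS = {
    "Fetch.continueRequest",
    "Fetch.failRequest",
    "Fetch.fulfillRequest",
}


@dataclass
class Message:
    method: str


class Inbox:
    """Mailbox filled by the network thread and drained by the worker."""

    def __init__(self):
        self._incoming = queue.SimpleQueue()  # thread-safe hand-off
        self._pending = []  # arrival order, touched only by the worker

    def post(self, msg):
        # Called from the network thread after reading the CDP socket.
        self._incoming.put(msg)

    def drain(self, safe_point):
        # Called from the worker. Move everything posted so far into _pending.
        while True:
            try:
                self._pending.append(self._incoming.get_nowait())
            except queue.Empty:
                break
        if safe_point:
            # At a safe point every queued message may be processed.
            ready, self._pending = self._pending, []
            return ready
        # Inside a JavaScript callback: hand out interception messages only
        # and keep the rest queued, in order, for the next safe point.
        ready = [m for m in self._pending if m.method in INTERCEPTION_METHODS]
        self._pending = [
            m for m in self._pending if m.method not in INTERCEPTION_METHODS
        ]
        return ready
```

In this model a `Target.closeTarget` posted while a callback is blocked stays in the queue until the next `drain(safe_point=True)`, so the frame cannot be freed underneath the callback.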
Karl Seguin
e61eddf956 Merge pull request #2493 from lightpanda-io/nikneym/fix-injection-through-authority
`URL.zig`: fix NUL/CR/LF/TAB character injection through authority
2026-05-19 19:12:16 +08:00
Halil Durak
64a3f3edd7 URL.zig: update tests 2026-05-19 13:55:34 +03:00
Marc Helbling
a89a28a4a2 feat: add --json to fetch command
The `fetch` command is very practical for rendering pages without needing
a long-running browser instance.
It does, however, mask all details of the fetch, most importantly the HTTP status code.
This is a big caveat when leveraging `lightpanda fetch` in a pipeline.

This introduces a `--json` option to provide a structured output that
contains:
* url
* HTTP status code
* response headers
* rendered content as controlled by the `--dump` option

The proposal is to always output the same JSON format even when not
using `--dump` with an option.
2026-05-19 12:08:23 +02:00
Halil Durak
6bc4ebdfed URL.zig: fix NUL/CR/LF/TAB character injection through authority 2026-05-19 12:29:39 +03:00
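As an illustration of the class of bug named in 6bc4ebdfed, the Python sketch below checks the authority component of a URL for NUL, CR, LF, and TAB. It is not the logic in `URL.zig`: whether the actual fix rejects or strips these characters is not stated in the log, so rejecting is an assumption here, and both function names are hypothetical.

```python
FORBIDDEN = frozenset("\x00\r\n\t")


def extract_authority(url):
    """Return the text between '://' and the first '/', '?' or '#'."""
    _scheme, sep, rest = url.partition("://")
    if not sep:
        return None  # no authority component at all
    end = len(rest)
    for delimiter in "/?#":
        index = rest.find(delimiter)
        if index != -1:
            end = min(end, index)
    return rest[:end]


def authority_is_clean(url):
    """True when the URL has an authority free of NUL, CR, LF and TAB."""
    authority = extract_authority(url)
    return authority is not None and not (FORBIDDEN & set(authority))
```

A CR/LF pair inside the authority is what would let a crafted URL smuggle an extra header line into a request built from it, which is why the check is applied before the value is used.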
Karl Seguin
fd0831fe93 Merge pull request #2469 from lightpanda-io/nikneym/samesite-strict-cookie-vulnerability
`Cookie`: honor SameSite=Strict on cross-site navigation
2026-05-19 16:20:08 +08:00
Halil Durak
f17a260d93 prefer initiator_url to calculate SameSite correctly when navigating
changes after rebase
2026-05-19 10:53:25 +03:00
Halil Durak
a8029c079e Cookie.zig: add a test for SameSite=Strict on cross-site navigation 2026-05-19 10:53:24 +03:00
Karl Seguin
8ef6084fdb Re-organization CDP connection
network/WsConnection.zig was poorly named. It didn't represent a generic WS
connection, but rather a CDP-specific connection. This splits the generic WS
logic into network/WS.zig and the CDP-specific details in cdp/Connection.zig.

Some of the connection management in the Server has also been simplified.
2026-05-19 10:08:22 +08:00
Halil Durak
bdd456f76c Merge pull request #2491 from willmafh/improve-code-readability
more clean validateCookieString function to improve code readability
2026-05-18 17:53:45 +03:00
willmafh
2f66edc9b9 more clean validateCookieString function to improve code readability 2026-05-18 22:29:01 +08:00
Karl Seguin
b83cd9262b Merge pull request #2490 from lightpanda-io/blocking_read_failure_handling
On blocking read failure, break from loop
2026-05-18 21:19:40 +08:00
Karl Seguin
49aa0ad1a9 On blocking read failure, break from loop
Blocking read failure almost certainly means a disconnected client. As-is,
that's an endless loop. Instead, fail the request.
2026-05-18 19:44:25 +08:00
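The control-flow change in 49aa0ad1a9 can be pictured with a small Python loop. This is only a sketch of the pattern: `pump_until`, `RequestFailed`, and the use of `OSError` as the read-failure signal are assumptions for illustration, not names from the codebase.

```python
class RequestFailed(Exception):
    """Raised when the connection goes away while a request is waiting."""


def pump_until(done, blocking_read):
    """Call blocking_read() until done() is true; return the number of reads."""
    reads = 0
    while not done():
        try:
            blocking_read()
        except OSError as err:
            # A failed blocking read almost certainly means the peer is gone.
            # Retrying would spin forever, so fail the request instead.
            raise RequestFailed("blocking read failed") from err
        reads += 1
    return reads
```

If the error were swallowed and the loop simply continued, `done()` would never become true for a disconnected client, which is the endless loop the commit removes.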
Pierre Tachoire
23a3d5476b Merge pull request #2458 from lightpanda-io/nikneym/cli-help-rework
`help`: rework `help` command
2026-05-18 11:54:29 +02:00
Pierre Tachoire
8b098a3c97 Merge pull request #2488 from lightpanda-io/ci-mcp-smoke-jq-tighten 2026-05-17 12:50:23 +02:00
Adrià Arrufat
8981a6245c ci: tighten mcp-smoke jq assertions
Replace `grep '"id":N' | jq -e ...` with `jq -ec 'select(.id == N) | ...'`.
The grep form also matched `"id":10`, `"id":11`, ... and any tool description
containing that substring; numeric `select` is type-correct. `jq -e` still
fails the job when `select` produces no output (exit 4), so the smoke
semantics are preserved.

Also add `jq --version` up front so the job fails fast and loud if the
`ubuntu-latest` image ever stops shipping jq.
2026-05-17 10:43:03 +02:00
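The difference between the two matching styles in 8981a6245c can be shown without jq. The Python sketch below models the substring match and the parsed, numeric comparison; the sample response lines and both function names are made up for illustration.

```python
import json

# Made-up JSON-RPC style responses, one per line, as a smoke test might see.
LINES = [
    '{"id":1,"result":{"ok":true}}',
    '{"id":10,"result":{"ok":false}}',
]


def grep_style(lines, n):
    """Substring match, like grep '"id":N': also hits ids that start with N."""
    needle = f'"id":{n}'
    return [line for line in lines if needle in line]


def select_style(lines, n):
    """Parse each line and compare the id as a value, like select(.id == N)."""
    return [obj for obj in map(json.loads, lines) if obj.get("id") == n]
```

For `n = 1`, `grep_style` returns both lines because `"id":10` contains `"id":1`, while `select_style` returns only the first response.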
Pierre Tachoire
803e4303c2 Merge pull request #2481 from navidemad/ci-mcp-smoke
ci: smoke test the MCP stdio server
2026-05-17 10:39:18 +02:00
Pierre Tachoire
4e80db6cf0 Merge pull request #2483 from navidemad/dockerfile-pipefail-hygiene
Dockerfile: fix curl|sh pipefail; trim builder stage
2026-05-16 19:21:30 +02:00
Pierre Tachoire
a3944a3b40 Merge pull request #2484 from lightpanda-io/e2e_kill_between_steps
Force kill lightpanda between steps to prevent "port already in-use" …
2026-05-16 18:51:36 +02:00
Karl Seguin
ab63cfbf39 Merge pull request #2478 from navidemad/fix-c10-inline-media-evaluation
css: evaluate @media and matchMedia against viewport
2026-05-16 21:42:56 +08:00
Karl Seguin
d870972ceb Small tweaks to @media
- Depth counter when recursing
- Better comment support
- Small perf tweak (e.g. lowercase once into stack buffer before multiple
  compares)
- Few more test cases
2026-05-16 20:52:11 +08:00
Karl Seguin
21e74b46ea Merge pull request #2486 from willmafh/typo-fix
typo fix
2026-05-16 20:39:36 +08:00
willmafh
c52356b6d7 chore: lowercase demo word 2026-05-16 20:07:32 +08:00
willmafh
c1e64232e5 chore: typo fix 2026-05-16 20:05:52 +08:00
Karl Seguin
7f8cb145e6 Merge pull request #2485 from lightpanda-io/nikneym/timers-hash
`Timers`: prefer integer-optimized hashing
2026-05-16 16:52:53 +08:00
Halil Durak
33d594be43 Timers: prefer integer-optimized hashing 2026-05-16 10:19:33 +03:00
Karl Seguin
d926291241 Merge pull request #2467 from lightpanda-io/http_transfer
Cleanup HttpClient.Transfer
2026-05-16 08:52:12 +08:00
Karl Seguin
0b358fd410 Merge pull request #2474 from staylor/fix/2472-frame-id-reset
Fix #2472: scope frame ID generator to Browser, not Session
2026-05-16 08:46:27 +08:00
Karl Seguin
94e8b06583 Merge pull request #2482 from navidemad/make-v8-path
make: forward optional V8_PATH to zig build
2026-05-16 08:41:05 +08:00
Karl Seguin
a5c1068b85 Force kill lightpanda between steps to prevent "port already in-use" error in CI 2026-05-16 08:39:53 +08:00