Commit Graph

6690 Commits

Author SHA1 Message Date
Karl Seguin
037db695ff Merge pull request #2492 from lightpanda-io/cdp_connection
Re-organization CDP connection
2026-05-20 06:45:30 +08:00
Navid EMAD
814ca8ab3f accessibility: unify query+tree writers, route objectId via dom.getNode
Fold QueryWriter into Writer behind an Opts.filter. Tree mode is unchanged
(filter=null); query mode walks the full subtree (including AX-ignored
nodes per the queryAXTree spec) and emits the flat-match shape. Shared
resolveRole helper handles label-promotion for both paths so the two
can't drift.

Drop the "objectId not yet supported" carve-out: queryAXTree now reuses
dom.getNode, which already resolves nodeId/backendNodeId/objectId.
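
A rough Python sketch of the shape described above (the project itself is Zig). `Node`, `Opts`, `write`, and `resolve_role` are hypothetical stand-ins, not the repository's API, and dropping ignored nodes together with their subtree in tree mode is a simplification. It only illustrates one walker serving both modes, with a shared role helper:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    role: str
    name: str = ""
    ignored: bool = False
    children: list["Node"] = field(default_factory=list)


@dataclass
class Opts:
    # None selects tree mode; a predicate selects query mode.
    filter: Optional[Callable[[Node], bool]] = None


def resolve_role(node: Node) -> str:
    # Shared by both modes so the two outputs cannot drift apart.
    return node.role.lower()


def write(node: Node, opts: Opts):
    if opts.filter is None:
        # Tree mode: nested output, ignored nodes skipped (simplified).
        return {
            "role": resolve_role(node),
            "children": [write(c, opts) for c in node.children if not c.ignored],
        }
    # Query mode: walk the full subtree, ignored nodes included, flat matches.
    matches = []
    if opts.filter(node):
        matches.append({"role": resolve_role(node), "name": node.name})
    for child in node.children:
        matches.extend(write(child, opts))
    return matches
```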
2026-05-19 20:43:54 +02:00
Marc Helbling
fec2bbda7b review 2026-05-19 18:13:34 +02:00
Pierre Tachoire
a15c04de4b ci: remove cdp logs from end to end tests 2026-05-19 17:32:13 +02:00
Navid EMAD
bb8d0593b2 StyleManager: author display rule must beat UA [hidden] / display:none
Move the UA `matchesUaDisplayNoneRule` short-circuit from BEFORE the
author rule walk in `isElementHidden` to AFTER it, gated on
`display_priority == 0` (no author display rule matched). Per CSS
Cascade §6.1 any normal-origin author rule beats UA origin regardless
of specificity, so `.x { display: flex }` on `<div class="x" hidden>`
must report visible. The author rule machinery
(`addRawRule`/`isRelevant`/`checkRules` + `id_rules`/`class_rules`/
`tag_rules`/`other_rules`) already produces the right answer; only the
check order in `isElementHidden` was wrong.
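
The corrected check order can be sketched in Python as follows. This is an illustration only: `is_element_hidden` and its two parameters are hypothetical simplifications of the Zig code, with `author_display=None` standing in for `display_priority == 0`.

```python
from typing import Optional


def is_element_hidden(author_display: Optional[str], ua_display_none: bool) -> bool:
    """author_display: winning author `display` value, or None if no author
    display rule matched. ua_display_none: whether a UA rule such as
    [hidden] would hide the element."""
    if author_display is not None:
        # Any normal-origin author rule beats the UA origin.
        return author_display == "none"
    # Consult the UA rule only when no author display rule matched.
    return ua_display_none
```

With this order, `<div class="x" hidden>` styled by `.x { display: flex }` corresponds to `is_element_hidden("flex", True)`, which reports the element as visible.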

Refs #2496
2026-05-19 17:18:20 +02:00
Karl Seguin
345cc9c6c0 zig fmt 2026-05-19 23:10:23 +08:00
Karl Seguin
97c8ca3832 when work is done, don't keep polling, return to process it 2026-05-19 22:39:48 +08:00
Navid EMAD
32dbd716b1 Apply fragment parse-mode to DOMParser
Closes the DOMParser gap left as a follow-up in the previous review-fix
commit. DOMParser.parseFromString built its target Document via the
frame's parser without touching `_parse_mode`, so `Build.created` →
`linkAddedCallback` → `loadExternalStylesheet` saw `_parse_mode ==
.document` and fetched/registered sheets on the LIVE frame document for
every stylesheet link in the parsed string.

Bracket both the text/html and XML branches with the same fragment
parse-mode `parseHtmlAsChildren` uses. The existing gate in
`loadExternalStylesheet` already short-circuits on .fragment, so no
change is needed there. Side benefits: parser-emitted scripts in
DOMParser content stop reaching `scriptAddedCallback` against the live
frame, default-script injection skips DOMParser content, and mutation
observers on the live document no longer fan out for parsed nodes —
all of which match what DOMParser should do per spec.

Regression test extended to cover the DOMParser path alongside the
existing innerHTML case.
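
The bracketing pattern, sketched in Python under stated assumptions: `Frame`, `fragment_parse_mode`, and `dom_parser_parse` are hypothetical names that model only the save/set/restore of the parse mode and the existing gate in the stylesheet loader.

```python
from contextlib import contextmanager


class Frame:
    def __init__(self):
        self.parse_mode = "document"
        self.fetched = []

    def load_external_stylesheet(self, href):
        # Existing gate: nothing is fetched while parsing a fragment.
        if self.parse_mode == "fragment":
            return
        self.fetched.append(href)


@contextmanager
def fragment_parse_mode(frame):
    previous = frame.parse_mode
    frame.parse_mode = "fragment"
    try:
        yield
    finally:
        # Restore even if parsing raises.
        frame.parse_mode = previous


def dom_parser_parse(frame, stylesheet_links):
    # Stand-in for parsing: each <link> found triggers the loader callback.
    with fragment_parse_mode(frame):
        for href in stylesheet_links:
            frame.load_external_stylesheet(href)
```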

Refs #2343
2026-05-19 15:51:35 +02:00
Navid EMAD
f05efd6719 Harden external stylesheet path per code review
Addresses 8 findings from ultrareview on the external stylesheet feature:

* UAF on CDP teardown during syncRequest. `loadExternalStylesheet`
  pumps the CDP socket inline, so a `Target.closeTarget` arriving
  mid-fetch could drive `Session.removePage` and free the frame
  while we still held `self`. Set `_script_manager.base.is_evaluating`
  around the call — the same bracket every other syncRequest caller
  uses, which is what `Session.removePage`'s reentrancy guard checks.

* Disconnect leak. `link.remove()` left the sheet on
  `document.styleSheets` and in the cascade forever; the disconnect
  walker had a `<style>` branch but no `<link>` mirror. Common SPA
  theme-switch pattern (append new sheet, remove old) was broken.
  Added the parallel `else if` branch.

* Fragment-parsed links. `Build.created` fires for parser-instantiated
  elements before attachment, including innerHTML / outerHTML /
  insertAdjacentHTML / Range.createContextualFragment / <template>
  content. Without a guard those fetched against the live document
  and registered phantom sheets even when the fragment was never
  attached. Added `_parse_mode == .fragment` early-return mirroring
  the existing `nodeIsReady` short-circuit. DOMParser is a separate
  case (parses with `.document` into a different Document) and is
  left as a known follow-up.

* Missing Referer. Every other resource-fetch path
  (ScriptManagerBase, XHR, Fetch, WorkerGlobalScope) routes through
  `Frame.headersForRequest` to attach the cached `Referer` header.
  Many CDNs gate stylesheet delivery on Referer; without it requests
  returned 403/302 and the CSS silently failed. Added the call.

* Header OOM leak. `headers.add` between `newHeaders()` and
  `syncRequest` (which takes ownership) leaked the initial 3-entry
  slist on OOM. Added `errdefer headers.deinit()` mirroring
  RobotsLayer.zig:121-122.

* `_href` mutated before parse could fail. On parse error the cached
  sheet was left with the new URL but old rules dropped — violated
  the "previous sheet intact on failure" invariant the PR description
  promises. Moved the `_href` assignment to after `replaceSync`
  succeeds. Full atomicity would require a scratch-list pattern in
  `CSSStyleSheet.replaceSync` itself; documented as a known limit.

* `_sheet` cached before registration could OOM. If `sheets.add`
  failed, `link._sheet` pointed at an unregistered sheet and every
  future re-fetch short-circuited via the `orelse` branch, leaving
  the sheet permanently unreachable through `document.styleSheets`.
  Assign `link._sheet` only after `sheets.add` succeeds.

* Stale CLI help text claimed `--enable-external-stylesheets` was a
  no-op surface. Removed the obsolete sentence.

New regression tests cover fragment-parse skip and disconnect
removal+re-add. Full suite 694/694 pass.
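
Two of the findings above (`_href` and `_sheet` being set too early) share one idea: commit state on the link only after the fallible steps have succeeded. A minimal Python sketch of that ordering, where `Link`, `load`, `parse`, and `sheets_add` are hypothetical stand-ins and the non-atomic `replaceSync` limitation is not modelled:

```python
class Link:
    def __init__(self):
        self.sheet = None
        self.href = None


def load(link, sheets_add, parse, url, body):
    first_load = link.sheet is None
    sheet = {"rules": []} if first_load else link.sheet
    rules = parse(body)      # may raise; nothing on `link` has changed yet
    if first_load:
        sheets_add(sheet)    # may raise; link.sheet stays None, so a retry registers again
    sheet["rules"] = rules
    # Commit only after every fallible step has succeeded.
    link.sheet = sheet
    link.href = url
```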

Refs #2343
2026-05-19 15:51:34 +02:00
Navid EMAD
4592812027 Reuse cached sheet on link href change
Caught in code review: `loadExternalStylesheet` created a fresh
`CSSStyleSheet` and appended to `document.styleSheets` on every call, so
mutating `link.href` on a connected stylesheet element accumulated stale
sheets — the old rules kept cascading because the previous sheet was
never removed.

Cache the sheet on `Link._sheet` (mirroring `Style._sheet`) and reuse it
via `replaceSync` on re-fetch. First load creates + registers as before;
subsequent loads swap content in place, keeping `document.styleSheets`
length stable.

On fetch failure the cached sheet is untouched — matches browser
behavior where a broken href doesn't invalidate the previously loaded
sheet until the link itself is removed.
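
A small Python model of the caching behavior, for illustration only. `Document`, `Link.load`, and the `fetch` callback are hypothetical; assigning to `sheet["rules"]` plays the role of `replaceSync`.

```python
class Document:
    def __init__(self):
        self.style_sheets = []


class Link:
    def __init__(self, document):
        self.document = document
        self.sheet = None  # cached sheet, created on the first successful load

    def load(self, fetch, href):
        body = fetch(href)
        if body is None:
            return False  # fetch failed: the cached sheet is left untouched
        if self.sheet is None:
            # First load: create and register.
            self.sheet = {"rules": ""}
            self.document.style_sheets.append(self.sheet)
        # Later loads swap content in place instead of appending a new sheet.
        self.sheet["rules"] = body
        return True
```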

Refs #2343
2026-05-19 15:50:11 +02:00
Navid EMAD
3e409d49e9 Implement external stylesheet fetch + parse
Wires up --enable-external-stylesheets / LP.configureLoading.externalStylesheets
from the prior surface-only commit. When the flag is set, parser- and
JS-created <link rel=stylesheet> elements now synchronously fetch and parse
their href, register a CSSStyleSheet on document.styleSheets, and feed
StyleManager so checkVisibility() reflects external rules. Flag stays
default-off — scrapers that don't need accurate visibility pay nothing.

Frame.loadExternalStylesheet mirrors ScriptManager.addFromElement: same
HttpClient.syncRequest path, same arena ownership, same per-frame
notification + cookie wiring. Body is routed through CSSStyleSheet.replaceSync,
which already parses, populates cssRules, and calls sheetModified() — no
StyleManager changes needed. 2 MiB hard cap on a single sheet body, status
non-2xx and oversize both fire `error` on the link.

Link.Build.created is added so static head <link> elements reach
linkAddedCallback at all — void elements never trigger nodeComplete, which
is why static `<link>` had no observable effect before. Mirrors Image.

HttpClient.Request.ResourceType gains a `.stylesheet` variant so CDP Network
events report the right type; cdp.fetch.zig switches updated.
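
The acceptance rule for a fetched body (2xx status and at most 2 MiB) can be sketched in Python. `accept_stylesheet` is a hypothetical helper, and treating a body of exactly 2 MiB as acceptable is an assumption:

```python
MAX_SHEET_BYTES = 2 * 1024 * 1024  # 2 MiB hard cap on a single sheet body


def accept_stylesheet(status: int, body: bytes) -> bool:
    """Return True to parse the body, False to fire `error` on the link."""
    if not 200 <= status < 300:
        return False  # non-2xx status
    if len(body) > MAX_SHEET_BYTES:
        return False  # oversize body
    return True
```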

Refs #2343
2026-05-19 15:50:11 +02:00
Navid EMAD
6ed41ea346 Add --enable-external-stylesheets flag (no-op surface)
Reserves the CLI flag and LP.configureLoading externalStylesheets field
so drivers can adopt the API before the fetch implementation lands in a
follow-up that depends on #2303.

The bool is intentionally unread in this PR. Mirrors the existing
--disable-subframes / --disable-workers plumbing; the CDP field extends
LP.configureLoading alongside subFrame and worker without breaking
existing callers.

Refs #2343
2026-05-19 15:50:11 +02:00
Karl Seguin
ed05a6b14f test thread safety
LogFilter isn't thread safe, so setting it in a test where the log filter is
read from another thread triggers TSAN. LogFilter.deinit now waits until
the server has no active threads.
2026-05-19 21:26:53 +08:00
Karl Seguin
875c147783 Main/Network reads CDP socket
Previously, the CDP socket was added to the worker's multi and fully owned
by the worker. While this is simple, it introduced some issues:

1 - Cannot detect a disconnected client during JS processing ( for(;;) )

2 - A blocked worker can cause back-pressure that blocks the client. This can
    cause a deadlock if the worker is blocked waiting for a CDP message

In addition to these 2 problems, there was 1 other serious CDP-related issue:
arbitrary CDP messages could be processed during a JavaScript callback. For
example, a Worker calls importScripts while request interception is enabled;
this requires us to tick the HttpClient while waiting for the interception
response. But a client could send Target.closeTarget, which we'd process and
delete the frame, all while importScripts is still blocked. Assuming
importScripts then unblocks, everything is a big UAF since the frame (and its
workers) were cleared by closeTarget.

The CDP socket is now read from the network (main) thread and an OTP-style
mailbox is used. The network thread posts messages to the Worker's inbox and
signals it to wake up. This solves #1 and #2. It doesn't directly solve the
reentrancy issue, but it provides the foundation. Specifically, it introduces
a queue of CDP messages and more control over when/how that queue is
processed. At "safe points" (Runner.tick, HttpClient.tick), any message can
be processed. But when inside a JavaScript callback, we can process only
non-destructive/non-mutating messages. Specifically, we can process only
messages related to request interception.
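
A minimal Python sketch of the mailbox idea, not the Zig implementation. `Mailbox`, `post`, and `drain` are hypothetical names, and the specific `Fetch.*` methods in the allow-list are an assumption about what "related to request interception" covers:

```python
import threading
from collections import deque

# Assumed allow-list of messages that are safe inside a JavaScript callback.
INTERCEPTION_METHODS = {
    "Fetch.continueRequest",
    "Fetch.failRequest",
    "Fetch.fulfillRequest",
}


class Mailbox:
    def __init__(self):
        self._messages = deque()
        self._lock = threading.Lock()
        self.wakeup = threading.Event()

    def post(self, message):
        # Called from the network thread: enqueue, then signal the worker.
        with self._lock:
            self._messages.append(message)
        self.wakeup.set()

    def drain(self, in_js_callback):
        # Called from the worker: returns the messages that may run now.
        with self._lock:
            if in_js_callback:
                ready = [m for m in self._messages if m["method"] in INTERCEPTION_METHODS]
                held = [m for m in self._messages if m["method"] not in INTERCEPTION_METHODS]
            else:
                # Safe point: everything queued can be processed.
                ready, held = list(self._messages), []
            self._messages = deque(held)
            if not held:
                self.wakeup.clear()
        return ready
```

In this model a `Target.closeTarget` posted during a callback stays queued until the next safe point, which is the property the reentrancy fix builds on.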
2026-05-19 20:52:21 +08:00
Karl Seguin
e61eddf956 Merge pull request #2493 from lightpanda-io/nikneym/fix-injection-through-authority
`URL.zig`: fix NUL/CR/LF/TAB character injection through authority
2026-05-19 19:12:16 +08:00
Halil Durak
64a3f3edd7 URL.zig: update tests 2026-05-19 13:55:34 +03:00
Marc Helbling
a89a28a4a2 feat: add --json to fetch command
The `fetch` command is a practical way to render pages without needing
a long-running browser instance.
However, it hides all details of the fetch, most importantly the HTTP status code.
This is a significant limitation when using `lightpanda fetch` in a pipeline.

This introduces a `--json` option to provide a structured output that
contains:
* url
* HTTP status code
* response headers
* rendered content as controlled by the `--dump` option

The proposal is to always output the same JSON format even when not
using `--dump` with an option.
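
For illustration, a pipeline step consuming such output might look like the Python sketch below. The key names (`url`, `status`, `headers`, `content`) are assumptions: the commit lists the fields but not their exact spelling, and `fetch_succeeded` is a hypothetical helper.

```python
import json


def fetch_succeeded(report_json: str) -> bool:
    # Key names are assumed; adjust them to the actual --json output.
    report = json.loads(report_json)
    return 200 <= report["status"] < 300


example = json.dumps({
    "url": "https://example.com/",
    "status": 404,
    "headers": {"content-type": "text/html"},
    "content": None,
})
```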
2026-05-19 12:08:23 +02:00
Halil Durak
6bc4ebdfed URL.zig: fix NUL/CR/LF/TAB character injection through authority 2026-05-19 12:29:39 +03:00
Karl Seguin
fd0831fe93 Merge pull request #2469 from lightpanda-io/nikneym/samesite-strict-cookie-vulnerability
`Cookie`: honor SameSite=Strict on cross-site navigation
2026-05-19 16:20:08 +08:00
Halil Durak
f17a260d93 prefer initiator_url to calculate SameSite correctly when navigating
changes after rebase
2026-05-19 10:53:25 +03:00
Halil Durak
a8029c079e Cookie.zig: add a test for SameSite=Strict on cross-site navigation 2026-05-19 10:53:24 +03:00
Karl Seguin
8ef6084fdb Re-organization CDP connection
network/WsConnection.zig was poorly named. It didn't represent a generic WS
connection, but rather a CDP-specific connection. This splits the generic WS
logic into network/WS.zig and the CDP-specific details in cdp/Connection.zig.

Some of the connection management in the Server has also been simplified.
2026-05-19 10:08:22 +08:00
Halil Durak
bdd456f76c Merge pull request #2491 from willmafh/improve-code-readability
more clean validateCookieString function to improve code readability
2026-05-18 17:53:45 +03:00
willmafh
2f66edc9b9 more clean validateCookieString function to improve code readability 2026-05-18 22:29:01 +08:00
Karl Seguin
b83cd9262b Merge pull request #2490 from lightpanda-io/blocking_read_failure_handling
On blocking read failure, break from loop
2026-05-18 21:19:40 +08:00
Karl Seguin
49aa0ad1a9 On blocking read failure, break from loop
Blocking read failure almost certainly means a disconnected client. As-is, that's
an endless loop. Instead, fail the request.
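
The control-flow change, sketched in Python with hypothetical names (`read_message`, `read_blocking`): a failed read ends the loop and fails the request instead of being retried.

```python
def read_message(read_blocking):
    chunks = []
    while True:
        try:
            chunk = read_blocking()
        except OSError as err:
            # A failed blocking read most likely means the client is gone:
            # stop looping and fail the request.
            raise ConnectionError("client disconnected") from err
        if not chunk:
            break  # normal end of data
        chunks.append(chunk)
    return b"".join(chunks)
```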
2026-05-18 19:44:25 +08:00
Pierre Tachoire
23a3d5476b Merge pull request #2458 from lightpanda-io/nikneym/cli-help-rework
`help`: rework `help` command
2026-05-18 11:54:29 +02:00
Pierre Tachoire
8b098a3c97 Merge pull request #2488 from lightpanda-io/ci-mcp-smoke-jq-tighten 2026-05-17 12:50:23 +02:00
Adrià Arrufat
8981a6245c ci: tighten mcp-smoke jq assertions
Replace `grep '"id":N' | jq -e ...` with `jq -ec 'select(.id == N) | ...'`.
The grep form also matched `"id":10`, `"id":11`, ... and any tool description
containing that substring; numeric `select` is type-correct. `jq -e` still
fails the job when `select` produces no output (exit 4), so the smoke
semantics are preserved.

Also add `jq --version` up front so the job fails fast and loud if the
`ubuntu-latest` image ever stops shipping jq.
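
The false-positive described above is easy to reproduce. This Python sketch uses made-up response lines to compare substring matching (what `grep '"id":N'` does) with a numeric comparison (what `select(.id == N)` does):

```python
import json

# Hypothetical JSON-RPC responses, one per line.
lines = [
    '{"id":1,"result":"a"}',
    '{"id":10,"result":"b"}',
    '{"id":11,"result":"c"}',
]

# Substring match: also catches ids 10 and 11.
substring_matches = [line for line in lines if '"id":1' in line]

# Numeric match: compares the parsed id as a number.
numeric_matches = [line for line in lines if json.loads(line)["id"] == 1]
```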
2026-05-17 10:43:03 +02:00
Pierre Tachoire
803e4303c2 Merge pull request #2481 from navidemad/ci-mcp-smoke
ci: smoke test the MCP stdio server
2026-05-17 10:39:18 +02:00
Pierre Tachoire
4e80db6cf0 Merge pull request #2483 from navidemad/dockerfile-pipefail-hygiene
Dockerfile: fix curl|sh pipefail; trim builder stage
2026-05-16 19:21:30 +02:00
Pierre Tachoire
a3944a3b40 Merge pull request #2484 from lightpanda-io/e2e_kill_between_steps
Force kill lightpanda between steps to prevent "port already in-use" …
2026-05-16 18:51:36 +02:00
Karl Seguin
ab63cfbf39 Merge pull request #2478 from navidemad/fix-c10-inline-media-evaluation
css: evaluate @media and matchMedia against viewport
2026-05-16 21:42:56 +08:00
Karl Seguin
d870972ceb Small tweaks to @media
- Depth counter when recursing
- Better comment support
- Small perf tweak (e.g. lowercase once into stack buffer before multiple
  compares)
- Few more test cases
2026-05-16 20:52:11 +08:00
Karl Seguin
21e74b46ea Merge pull request #2486 from willmafh/typo-fix
typo fix
2026-05-16 20:39:36 +08:00
willmafh
c52356b6d7 chore: lowercase demo word 2026-05-16 20:07:32 +08:00
willmafh
c1e64232e5 chore: typo fix 2026-05-16 20:05:52 +08:00
Karl Seguin
7f8cb145e6 Merge pull request #2485 from lightpanda-io/nikneym/timers-hash
`Timers`: prefer integer-optimized hashing
2026-05-16 16:52:53 +08:00
Halil Durak
33d594be43 Timers: prefer integer-optimized hashing 2026-05-16 10:19:33 +03:00
Karl Seguin
d926291241 Merge pull request #2467 from lightpanda-io/http_transfer
Cleanup HttpClient.Transfer
2026-05-16 08:52:12 +08:00
Karl Seguin
0b358fd410 Merge pull request #2474 from staylor/fix/2472-frame-id-reset
Fix #2472: scope frame ID generator to Browser, not Session
2026-05-16 08:46:27 +08:00
Karl Seguin
94e8b06583 Merge pull request #2482 from navidemad/make-v8-path
make: forward optional V8_PATH to zig build
2026-05-16 08:41:05 +08:00
Karl Seguin
a5c1068b85 Force kill lightpanda between steps to prevent "port already in-use" error in CI 2026-05-16 08:39:53 +08:00
Navid EMAD
54e09a5ace make: rename V8_PATH to generic ZIGFLAGS
Per review feedback, generalise the optional pass-through so any
`-D...` build option can be forwarded, not just the prebuilt V8 path.
2026-05-16 02:27:52 +02:00
Karl Seguin
5550b61d2d Merge pull request #2480 from navidemad/make-clean
make: add clean target
2026-05-16 07:35:09 +08:00
Karl Seguin
732e19c7b6 add cargo clean to html5ever 2026-05-16 07:34:35 +08:00
Karl Seguin
d3f3e7f335 Merge pull request #2475 from navidemad/fix-a41-json-undefined
js: emit `null` when JSON-stringifying unserializable values
2026-05-16 07:24:14 +08:00
Karl Seguin
2163a2fd5a Merge pull request #2463 from lightpanda-io/nikneym/nav-accept-header
Send `Accept` header when navigating
2026-05-16 06:39:40 +08:00
Navid EMAD
fd0700a572 dockerfile: fix curl|sh pipefail; trim builder stage
- Download rustup to a file then execute, so a failed curl is not
  masked by sh's exit code under /bin/sh (no pipefail).
- Add --no-install-recommends and apt-list cleanup to both apt stages
  (stage 0 drops from 156 to 116 packages, 1144 MB to 605 MB).
- Add --retry 3 --retry-delay 2 to all 4 external downloads.
- Use git clone --depth 1 (28 MB to 9.6 MB working tree).
- Drop -v from tar for minisign and zig extractions (log noise only).

Final shipped image is unchanged; the wins live in the builder stage
and build-cache footprint.
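
Why the piped form hides failures can be checked with a short Python sketch, assuming a POSIX `sh` is available. `false` stands in for a failed `curl`, and `cat` for the `sh` that reads its output:

```python
import subprocess

# Pipeline: without pipefail the exit status is that of the last command,
# so the failure of `false` is masked.
piped = subprocess.run(["sh", "-c", "false | cat"]).returncode

# Separate steps joined with &&: the first failure stops the chain
# and is reported as the overall status.
sequential = subprocess.run(["sh", "-c", "false && cat /dev/null"]).returncode
```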
2026-05-15 23:45:06 +02:00
Navid EMAD
f08a1fef12 ci: smoke test the MCP stdio server
Sends initialize + notifications/initialized + tools/list over stdin
and asserts the JSON-RPC responses with jq. Catches regressions in
the agentic surface (./lightpanda mcp) without needing a node client.

Reuses the existing lightpanda-build-release artifact, so the new
job costs about a minute on top of zig-build-release.
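
The three messages and the response lookup can be sketched in Python. This is not the CI script, which uses jq: `smoke_input` and `response_for` are hypothetical helpers, and the `protocolVersion` and `clientInfo` values are placeholder assumptions.

```python
import json


def smoke_input() -> str:
    messages = [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                    "clientInfo": {"name": "smoke", "version": "0"}}},
        # Notifications carry no id and expect no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]
    # One JSON message per line, as fed to the server's stdin.
    return "".join(json.dumps(m) + "\n" for m in messages)


def response_for(output: str, request_id: int):
    # Find the response whose id equals request_id (numeric comparison).
    for line in output.splitlines():
        if line.strip():
            message = json.loads(line)
            if message.get("id") == request_id:
                return message
    return None
```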
2026-05-15 22:53:38 +02:00