Commit Graph

6157 Commits

Author SHA1 Message Date
Francis Bouvier
31ef5246bc Add per-subcommand help via help or --help argument
`lightpanda <subcommand> help`, `lightpanda <subcommand> --help`
now print only the relevant subcommand options plus common options,
instead of the full text.

`lightpanda help <subcommand>` is also supported
(and that's what we use internally).
2026-05-11 22:16:52 +02:00
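The commit says the `<subcommand> help` and `<subcommand> --help` forms are handled internally as `help <subcommand>`. Below is a minimal Python sketch of that normalization. It assumes the help token is the only argument after the subcommand. `normalize_help` is a hypothetical name, and lightpanda's actual Zig argument parser is not shown here.

```python
def normalize_help(argv):
    """Map every per-subcommand help spelling to ["help", <subcommand>].

    argv excludes the program name. Hypothetical helper, not lightpanda's code.
    """
    if len(argv) == 2 and argv[1] in ("help", "--help"):
        # `lightpanda <subcommand> help` / `lightpanda <subcommand> --help`
        return ["help", argv[0]]
    # `lightpanda help <subcommand>` is already canonical; other argv is untouched.
    return list(argv)


for argv in (["fetch", "--help"], ["fetch", "help"], ["help", "fetch"]):
    print(argv, "->", normalize_help(argv))
```

Routing all three spellings to one canonical form lets a single code path choose which subcommand options to print alongside the common ones.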
Karl Seguin
e56bd0862a Merge pull request #2419 from lightpanda-io/nix-flake-update
Update Nix Flake
2026-05-11 22:59:00 +08:00
Karl Seguin
3733abbf8a Merge pull request #2413 from lightpanda-io/small_dom_fixes
Various small DOM fixes, WPT driven
2026-05-11 21:58:01 +08:00
Muki Kiboigo
105f5028c8 update nix flake 2026-05-11 06:50:03 -07:00
Karl Seguin
b79870e07a Merge pull request #2414 from lightpanda-io/test_timeout_config
Allow HTML Tests to set a timeout
2026-05-11 21:49:57 +08:00
Karl Seguin
082994c331 Allow HTML Tests to set a timeout
Change worker timeout to 8 seconds. This test can be slow on slow CI machines
with TSAN enabled.
2026-05-11 21:30:25 +08:00
Karl Seguin
6fa1fe12a5 Merge pull request #2235 from lightpanda-io/nikneym/cli-script-eval
`fetch`: add support for `--inject-script` and `--inject-script-file` options
2026-05-11 21:24:50 +08:00
Karl Seguin
8bea867e5f Merge pull request #2406 from navidemad/parser-defer-rawtext-merge
parser: defer raw-text merge to bound memory growth
2026-05-11 21:05:39 +08:00
Karl Seguin
258003ca90 ArrayListUnmanaged (deprecated name) -> ArrayList 2026-05-11 20:42:11 +08:00
Navid EMAD
a470f6b686 parser: lift merge buffer onto Parser and lazy-buffer single-chunk runs
Addresses follow-up review from karlseguin on #2406.

The pending-text merge buffer is now a single ArrayList on the Parser,
reused across runs via clearRetainingCapacity. In the streaming-parser
case (Document.write), parser.arena is the page-lifetime frame.arena, so
the previous per-PendingText buf.deinit was a no-op and growth artifacts
accumulated. With one shared buffer, total dead memory is bounded to one
peak-run-sized allocation regardless of how many text runs the parse
contains.

Single-chunk text runs no longer touch the buffer. The first chunk lives
only on CData._data via createTextNode; the buffer is seeded from
text_node.getData().str() only when a second chunk arrives at the same
parent and last_child. flushPendingText is a no-op when the buffer is
empty. Restores the common-case allocation count to 1 (matching main),
vs 3 in the previous PR head.

Benchmark deltas (ReleaseFast, peak RSS, 5-run median):
- 10K-paragraph synthetic page: 39 MB -> 37 MB
- 20K single-chunk script synthetic: 56 MB -> 54 MB
- 100 x 48 KB multi-chunk scripts: within noise (~46 MB)
- apple.com US iPhone live page: within JS-driven noise (~92 MB)

Refs #2397
2026-05-11 14:36:40 +02:00
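The lazy buffering described in a470f6b686 can be modeled in a few lines. The following Python sketch is a simplified model, not the parser's Zig code. It assumes a run continues whenever the next chunk targets the same parent (the real check also compares last_child). `TextRunMerger`, `append_text`, and `flush` are hypothetical names. The first chunk is stored directly on the node, the shared buffer is seeded only when a second chunk arrives, and flushing an unbuffered run copies nothing.

```python
class TextRunMerger:
    """Hypothetical model of deferred raw-text merging (not lightpanda's code)."""

    def __init__(self):
        self.buf = bytearray()  # one buffer shared by all runs, like the Parser-owned ArrayList
        self.pending = None     # text node of the current run, if any
        self.finished = []      # text nodes whose runs have ended
        self.allocations = 0    # node-data allocations only; buffer growth is not counted

    def append_text(self, parent, chunk):
        p = self.pending
        if p is not None and p["parent"] is parent:
            if not self.buf:
                # Second chunk of the run: only now copy the first chunk into the buffer.
                self.buf += p["data"]
            self.buf += chunk
            return
        self.flush()
        # First chunk: store it directly on a new node (the createTextNode step).
        self.pending = {"parent": parent, "data": bytes(chunk)}
        self.allocations += 1

    def flush(self):
        p = self.pending
        if p is None:
            return
        if self.buf:
            # Multi-chunk run: one final copy of the merged bytes onto the node.
            p["data"] = bytes(self.buf)
            self.allocations += 1
            # The Zig code keeps capacity via clearRetainingCapacity; bytearray.clear()
            # makes no such guarantee, so this only models the "empty again" state.
            self.buf.clear()
        self.finished.append(p)
        self.pending = None
```

In this model a single-chunk run costs one allocation, matching the "allocation count to 1" claim for the common case. A multi-chunk run costs one more for the final copy.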
Karl Seguin
cfcfe4ee29 Merge pull request #2417 from lightpanda-io/nikneym/dummy-set-resource-timing-buffer-size
`Performance`: add dummy `setResourceTimingBufferSize`
2026-05-11 20:31:17 +08:00
Halil Durak
5faaf3dc15 --inject-script: testable injected scripts 2026-05-11 15:24:28 +03:00
Halil Durak
3c7c08f822 --inject-script: don't error out if script execution fails 2026-05-11 15:24:08 +03:00
Halil Durak
19401dc950 Config: update --inject-script documentation 2026-05-11 15:15:36 +03:00
Halil Durak
d97081d71f --inject-script: execute injected scripts when a <head> tag is encountered
This now relies on html5ever to find <head>; the spec forbids more than one <head> element, and html5ever enforces that.
2026-05-11 15:15:36 +03:00
Halil Durak
60d721caa2 Config: increase read file size for --inject-script-file 2026-05-11 15:15:36 +03:00
Halil Durak
246f91d1f8 --inject-script: prefer splicing bytes into <head> directly 2026-05-11 15:15:35 +03:00
Halil Durak
c566d0c41c introduce --inject-script and --inject-script-file
* Prefer `--inject-*` prefix.
* Support injecting multiple scripts (also allows using both variants together).
* Instead of executing scripts in JS context, actually insert them into `<head>` for correct dump output.
2026-05-11 15:15:35 +03:00
Halil Durak
39f12a5669 fetch: add support for --script option
Allows passing a JS file as an arg to be executed alongside other scripts.
2026-05-11 15:15:35 +03:00
Halil Durak
9d7eee211a Performance: fix zig fmt fail 2026-05-11 15:12:44 +03:00
Karl Seguin
08bc513fd9 Merge pull request #2416 from lightpanda-io/nikneym/link-crossorigin-getter-setter
`HTMLLinkElement`: `crossOrigin` -> `crossorigin` for attributes
2026-05-11 20:12:37 +08:00
Halil Durak
556cbc1c9f Update src/browser/webapi/Performance.zig
Co-authored-by: Karl Seguin <karlseguin@users.noreply.github.com>
2026-05-11 15:09:06 +03:00
Karl Seguin
5ba0928635 Merge pull request #2415 from lightpanda-io/nikneym/link-media-getter-setter
`HTMLLinkElement`: add `media` getter/setter
2026-05-11 20:00:09 +08:00
Halil Durak
bcc82bff4a Performance: add dummy setResourceTimingBufferSize 2026-05-11 14:47:40 +03:00
Halil Durak
4e4e68e51c HTMLLinkElement: update tests 2026-05-11 14:40:40 +03:00
Halil Durak
3c8c849947 HTMLLinkElement: crossOrigin -> crossorigin for attributes 2026-05-11 14:40:26 +03:00
Halil Durak
20c7bc14d2 HTMLLinkElement: update tests 2026-05-11 14:35:41 +03:00
Halil Durak
55a42fe5c6 HTMLLinkElement: add media getter/setter 2026-05-11 14:35:30 +03:00
Navid EMAD
60219e69e9 parser: address review findings for raw-text merge
- Flush pending text in _removeFromParentCallback and
  _reparentChildrenCallback. Without these, html5ever can detach or
  reparent the pending text node mid-parse and a later flush would write
  accumulated bytes onto a node no longer in the tree (or to the wrong
  parent).

- Streaming.done now nulls self.handle right after html5ever_finish,
  before flushPendingText. If the flush errors, the handle is already
  cleared, so dropping the Streaming can't double-free.

- Document.close uses a defer to clear _script_created_parser even when
  done() returns an error. Document.write's parser-panic path now
  attempts a final flush before dropping the streaming parser, so
  whatever bytes html5ever fed before the panic still land on their
  text node.

- raw_text_chunked.html: larger raw-text bodies and exact byte counts
  per element. Catches future deferred-merge regressions that drop or
  duplicate a chunk; the memory bound itself is verified out-of-band
  via the live reproducer in the PR description.

Refs #2397
2026-05-11 13:22:29 +02:00
Navid EMAD
15101f12e4 parser: defer raw-text merge to bound memory growth
Frame.appendNew did String.concat(arena, [existing, txt]) every time
html5ever flushed a script-data/rawtext chunk on a '<' token, allocating
O(N) on the page-lifetime arena per chunk. Total bytes ~= N^2/(2*c),
where N is the final text length and c the average chunk size. On
apple.com US iPhone pages a 347 KB inline JSON literal with embedded
HTML strings ballooned the parse to 3.5 GB peak RSS.

Move the merge into the parser. Same-parent text chunks accumulate in a
std.ArrayListUnmanaged on the per-parse arena; one String.dupe lands the
final value on the frame arena. Flush points are the natural ends of a
text run: a non-text child appended, a foster/before-sibling insertion,
the parent element popping, or the parse call returning.

Frame.appendNew now takes *Node directly; it had no non-parser callers.
Streaming.done returns !void to propagate the final flush.

Refs #2397
2026-05-11 13:22:29 +02:00
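The N^2/(2*c) estimate in 15101f12e4 follows from the fact that concatenation on every chunk re-copies the whole run so far. Summing c + 2c + ... + N gives roughly N^2/(2c). The following Python sketch compares that byte count with the deferred approach, which lands a single copy of length N on the long-lived arena. The function names are hypothetical, and scratch-buffer growth is deliberately excluded from the deferred count.

```python
def naive_concat_bytes(total_len, chunk_len):
    """Bytes allocated when each chunk re-copies the whole run (concat on every chunk)."""
    allocated = 0
    current = 0
    while current < total_len:
        current = min(current + chunk_len, total_len)
        allocated += current  # a fresh copy of existing + new chunk
    return allocated


def deferred_merge_bytes(total_len, chunk_len):
    """Bytes landing on the long-lived arena when the run is merged once at flush."""
    # chunk_len is unused: only the single final dupe is counted, not scratch growth.
    return total_len


N, c = 1000, 10
print(naive_concat_bytes(N, c), N * N // (2 * c), deferred_merge_bytes(N, c))
# prints: 50500 50000 1000
```

The naive total tracks N^2/(2c) closely, and the gap to the deferred total grows linearly with N/c.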
Pierre Tachoire
d2151b6ffd Merge pull request #2305 from navidemad/feat/xpath-1.0-evaluator
xpath: implement XPath 1.0 (Document.evaluate, XPathResult, DOM.performSearch)
2026-05-11 10:01:28 +02:00
Karl Seguin
c16c15bedf Various small DOM fixes, WPT driven
1. Implement document.adoptNode (we were removing from the existing document,
   but not adding to the new document)

2. Document.url should use the document's frame, falling back to the execution
   frame

3. Move HTMLDocument.location to Document.location

4. DOMImplementation.createDocument uses a more appropriate default namespace
   (xml -> null)

5. Map querySelector functions to DOMException-safe errors. The Selector returns
   specific errors, but for the DOM APIs (document.querySelector,
   df.querySelectorAll, elem.matches, etc.) these largely all map to
   SyntaxError.
2026-05-11 14:29:47 +08:00
Karl Seguin
efbf1db87c Merge pull request #2410 from lightpanda-io/fix_merge
Try to fix a bad merge
2026-05-11 11:37:28 +08:00
Karl Seguin
0bbddb3179 Try to fix a bad merge
https://github.com/lightpanda-io/browser/pull/2289
and
https://github.com/lightpanda-io/browser/pull/2297
2026-05-11 11:25:40 +08:00
Karl Seguin
1bfefa3d58 Merge pull request #2289 from navidemad/fix-b2-page-navigation-history
page: implement Page.getNavigationHistory and Page.navigateToHistoryEntry
2026-05-11 09:29:43 +08:00
Karl Seguin
92d617d649 Merge pull request #2404 from navidemad/fix-fetch-double-free-on-sync-error
Fix double-free in fetch when http_client.request fails synchronously
2026-05-10 12:03:07 +08:00
Karl Seguin
520d968840 Merge pull request #2398 from staylor/fix/worker-importscripts-segfault
Defer page teardown while worker scripts are evaluating
2026-05-10 11:08:49 +08:00
Karl Seguin
261059acbe Merge pull request #2393 from lightpanda-io/scheduler_timeslice
Add timeslice to scheduler
2026-05-10 10:33:21 +08:00
Scott Taylor
92607ad765 Defer page teardown while worker scripts are evaluating
Worker scripts can call importScripts(), which performs a synchronous
HTTP request via HttpClient.syncRequest. To stay responsive during a
long fetch, syncRequest pumps the CDP socket (cdp.blocking_read) while
waiting. If a CDP message such as Target.closeTarget arrives on that
socket mid-fetch, the previous code path tore down the page
immediately:

    Worker JS -> importScripts -> syncRequest -> blocking_read
      -> CDP dispatch -> Target.closeTarget
      -> Session.removePage -> Page.deinit -> Frame.deinit
      -> Worker.deinit (frees worker arena + identity_map)

When control unwound back into the worker's eval, the next operation
that hit ctx.identity.identity_map.getOrPut dereferenced the freed
metadata pointer and segfaulted (sometimes immediately, sometimes a
few connections later as the arena got recycled).

Reproducer: any URL that loads dedicated workers calling importScripts
during initial eval, driven via puppeteer-core's connectOverCDP. The
allbirds.com product page (which loads ~8 web-pixel workers each
calling importScripts) reliably triggered it within ~10 connections.

Session.removePage already deferred when the frame's own
ScriptManager.is_evaluating was set; that guard never tripped because
worker scripts don't go through the frame's ScriptManager. Fix:

  * Worker.loadInitialScript now sets the worker's own
    _worker_scope._script_manager.is_evaluating around the eval, with
    save/restore so nested worker evals compose correctly.

  * WorkerGlobalScope.importScript also sets its own
    _script_manager.is_evaluating around the syncRequest +
    runMacrotasks. The typical caller (Worker.loadInitialScript)
    already sets this around its outer eval, so the outer guard
    usually covers us; the inner mark is defense-in-depth for callers
    that reach importScripts() from a setTimeout / microtask outside
    the loadInitialScript scope.

  * New Frame.anyScriptEvaluating method walks the frame tree (frame
    ScriptManager + every worker's ScriptManager + child frames) and
    returns true if any is mid-eval. Session.removePage and
    CDP.disposeBrowserContext use this in place of the frame-only
    check, deferring teardown until all evals unwind. Final cleanup
    happens at CDP.deinit on connection close, matching the existing
    deferred-teardown contract.

Verified by running the puppeteer-core repro back-to-back against a
single Lightpanda serve instance; all returned 200 with the right title, no
UAF crashes (was previously crashing within 1-10 runs). All 521 unit
tests still pass.

Note: a separate, pre-existing latent V8 issue surfaces under stress
on this same code path. After many iterations a Runtime.evaluate
promise tracked by V8's inspector PromiseHandlerTracker is discarded
during garbage collection's first-pass weak callbacks; the discard
sends a failure response which triggers v8::String::NewFromOneByte,
hitting the debug-only assertion AllowHeapAllocation::IsAllowed() in
heap-allocator-inl.h:79 (no allocations allowed during weak callbacks).
This reproduces on a baseline build of this PR commit and on a
baseline build of just the original two-line is_evaluating fix;
i.e. it is not introduced by the deferral logic. The deferral makes
it more visible because inspector callbacks now live longer before
teardown, so they are more likely to be alive during a GC. Tracking
this as a follow-up; the fix here still resolves the UAF that was
crashing the server immediately.
2026-05-09 17:26:41 -04:00
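The fix in 92607ad765 combines two ideas: a save/restore flag around each evaluation, so nested evals compose, and a tree walk that checks the frame, its workers, and its child frames before teardown. The following Python sketch models both under simplifying assumptions. `evaluating`, `any_script_evaluating`, and `remove_page` are hypothetical stand-ins for the Zig `is_evaluating` handling, Frame.anyScriptEvaluating, and Session.removePage. Deferred cleanup is reduced to a flag.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class ScriptManager:
    is_evaluating: bool = False


@contextmanager
def evaluating(sm):
    prev = sm.is_evaluating  # save, so a nested eval restores the outer state
    sm.is_evaluating = True
    try:
        yield
    finally:
        sm.is_evaluating = prev


@dataclass
class Worker:
    script_manager: ScriptManager = field(default_factory=ScriptManager)


@dataclass
class Frame:
    script_manager: ScriptManager = field(default_factory=ScriptManager)
    workers: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def any_script_evaluating(self):
        # Frame's own scripts, then every worker, then child frames recursively.
        if self.script_manager.is_evaluating:
            return True
        if any(w.script_manager.is_evaluating for w in self.workers):
            return True
        return any(child.any_script_evaluating() for child in self.children)


class Session:
    def __init__(self, frame):
        self.frame = frame
        self.pending_removal = False

    def remove_page(self):
        if self.frame is not None and self.frame.any_script_evaluating():
            self.pending_removal = True  # defer: something is still mid-eval
            return False
        self.frame = None  # safe to tear down now
        return True
```

Restoring the previous value, rather than unconditionally clearing it, is what keeps an inner importScripts() from marking the worker idle while its outer eval is still running.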
Navid EMAD
d7e283fed9 Don't propagate http_client.request errors from Fetch.init
When http_client.request fails synchronously (e.g. RobotsLayer returning
RobotsBlocked because robots.txt is already cached), Client.request
invokes our error_callback before returning the error. httpErrorCallback
rejects the promise and releases response._arena. Letting the error
propagate from Fetch.init also fires the `errdefer response.deinit`,
double-freeing the arena and corrupting the arena pool — eventually
surfacing as a malloc abort during teardown.

Fixes #2403.
2026-05-09 14:57:08 +02:00
Navid EMAD
d8b9391e33 xpath: drop internal references from comments
Strip mentions of the private gem and its internal paths from xpath
module docstrings, the conformance test header, and the dom dispatch
heuristic. Comments now describe behavior directly without pointing at
sources public readers can't access.
2026-05-08 08:58:07 +02:00
Navid EMAD
0b0a34c4a2 cdp: match closed set of axis names in isXPathQuery
The previous `::` heuristic accepted any identifier-like character before
`::`, which misrouted CSS pseudo-elements (`a::before`, `div::after`) to
the XPath evaluator. Walk back the run of [a-zA-Z-] characters and look
the candidate up in a StaticStringMap of the 13 XPath 1.0 named axes,
so only real axis names match.
2026-05-08 08:44:32 +02:00
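The heuristic in 0b0a34c4a2 can be illustrated with a small Python sketch. It walks back over the [a-zA-Z-] run before each '::' and accepts the query only if that run is one of the 13 XPath 1.0 axis names. It models only the '::' check, not any other routing rules isXPathQuery may have. Scanning every '::' occurrence, rather than only the first, is an assumption. `has_xpath_axis` is a hypothetical name.

```python
XPATH_AXES = frozenset({
    "ancestor", "ancestor-or-self", "attribute", "child", "descendant",
    "descendant-or-self", "following", "following-sibling", "namespace",
    "parent", "preceding", "preceding-sibling", "self",
})


def _is_axis_char(ch):
    return ch == "-" or ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def has_xpath_axis(query):
    """True if some '::' in query is preceded by one of the 13 XPath 1.0 axis names."""
    start = 0
    while (i := query.find("::", start)) != -1:
        j = i
        while j > 0 and _is_axis_char(query[j - 1]):
            j -= 1  # walk back over the [a-zA-Z-] run
        if query[j:i] in XPATH_AXES:
            return True
        start = i + 2
    return False
```

A closed set rejects `a::before` and `div::after`, because `a` and `div` look like identifiers but are not axes. The earlier shape-only check could not rule them out.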
Karl Seguin
9830da04d8 Naming convention fixes
Exclude the xpath_perf benchmark from the test run, as it's quite verbose.
2026-05-08 08:44:31 +02:00
Navid EMAD
ce722c1f6e xpath: extend fast path to non-positional descendant queries
Generalizes 8733e33b's //tag[@id='x'] shape: tryFusedDescendantFastPath
handles any //tag[safe] or .//tag[safe] where the predicates are
non-positional boolean/node-set checks. Walks the search root's
descendants once in document order, applies node test + predicates
inline, no per-step materialization, no dedup.

5-9x faster on //div, //*, //*[@class='x'], //div[contains(...)]; ~25x on
(//div)[1] and count(//div), where the inner path has this shape.

Safety gate rejects predicates that could produce a number at the
top level (number, neg, arithmetic binop, numeric-returning fn-call)
and any predicate containing position()/last() anywhere. Conservative:
a nested sub-path's local positional predicate is rejected even though
it's scoped to its own axis.
2026-05-08 08:44:31 +02:00
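The safety gate in ce722c1f6e can be sketched over a toy expression tree. The following Python code rejects predicates whose top level could evaluate to a number (number literal, negation, arithmetic binop, or a number-returning XPath 1.0 function). It also rejects any predicate that mentions position() or last() anywhere, including inside nested sub-paths. The tuple AST shape and all function names are hypothetical and do not reflect the module's actual ast.zig types.

```python
# Hypothetical tuple AST: ("number", 1), ("literal", "x"), ("neg", e),
# ("binop", op, lhs, rhs), ("call", name, [args]),
# ("path", [steps]) with steps ("step", axis, node_test, [predicates]).
NUMERIC_OPS = {"+", "-", "*", "div", "mod"}
NUMERIC_FNS = {"count", "sum", "floor", "ceiling", "round", "number",
               "string-length", "position", "last"}


def _children(expr):
    for part in expr[1:]:
        if isinstance(part, tuple):
            yield part
        elif isinstance(part, list):
            yield from (p for p in part if isinstance(p, tuple))


def _mentions_position(expr):
    if expr[0] == "call" and expr[1] in ("position", "last"):
        return True
    return any(_mentions_position(child) for child in _children(expr))


def is_fast_path_safe(predicate):
    """Conservative gate: reject anything that might act as a positional predicate."""
    if _mentions_position(predicate):
        return False  # anywhere in the tree, including nested sub-paths
    kind = predicate[0]
    if kind in ("number", "neg"):
        return False
    if kind == "binop" and predicate[1] in NUMERIC_OPS:
        return False
    if kind == "call" and predicate[1] in NUMERIC_FNS:
        return False
    return True
```

The check is conservative: `span[last()=1]` nested inside a predicate is rejected even though its position() scope is local to the inner axis. That mirrors the trade-off the commit describes.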
Navid EMAD
c4c700f7ab xpath: id-lookup fast path + perf benchmark
evalPath recognizes //tag[@id='x'] and .//tag[@id='x'] (plus the
//*[@id='x'] wildcard) and serves them via frame.getElementByIdFromNode.
~100-150x speedup on ID lookups (3231us -> 22.6us for //*[@id='target']
in the new benchmark). Falls through to general path on any deviation
(extra step, extra predicate, non-eq, non-literal RHS).

Inherits the same duplicate-ID compromise selector/List.zig ships for
querySelector(All): the id-map stores only the first element per ID in
document order. Capybara/Selenium hot paths assume unique IDs.

tests/xpath/xpath_perf.html is the 13-query micro-benchmark used to
collect the numbers; batched console.warn output survives test runner
interleaving.
2026-05-08 08:44:31 +02:00
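Below is a rough Python sketch of the id-lookup shape recognition in c4c700f7ab. It accepts only `//tag[@id='x']`, `.//tag[@id='x']`, or `//*[@id='x']` and returns None for any deviation, such as an extra step, an extra predicate, or a non-equality comparison. For brevity it matches the raw string with a regex. The commit describes the check living in evalPath, presumably on the parsed expression, so whitespace handling here is stricter than the real code's. `match_id_fast_path` and `build_id_map` are hypothetical names. `build_id_map` mirrors the "first element per ID in document order" compromise.

```python
import re

_ID_SHAPE = re.compile(r"""^(\.?)//([A-Za-z][\w.-]*|\*)\[@id=(?:'([^']*)'|"([^"]*)")\]$""")


def match_id_fast_path(expr):
    """Return (relative, tag, id) for the supported shapes, else None (general path)."""
    m = _ID_SHAPE.match(expr)
    if m is None:
        return None
    relative = m.group(1) == "."
    tag = m.group(2)
    id_value = m.group(3) if m.group(3) is not None else m.group(4)
    return relative, tag, id_value


def build_id_map(elements):
    """elements: (id, element) pairs in document order; keeps only the first per id."""
    ids = {}
    for el_id, el in elements:
        ids.setdefault(el_id, el)
    return ids


print(match_id_fast_path("//*[@id='target']"))
print(match_id_fast_path("//div[@id='x'][1]"))
```

Falling through to the general evaluator on any mismatch keeps the fast path an optimization only. Correctness then depends solely on the id map, which is where the duplicate-ID compromise comes in.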
Navid EMAD
379664044e xpath: apply review correctness feedback
- Document.evaluate / XPathEvaluator.evaluate / XPathExpression.evaluate:
  result_type / requested_type now optional u16 defaulting to ANY_TYPE
  (matches WHATWG: `optional unsigned short type = 0`). context_node
  stays nullable with a fallback to the document — preserves the
  polyfill's behavior asserted by the `default_context` fixture
- ast.zig NodeTest: clarify that namespaced names (`prefix:*`,
  `prefix:local`) are stored verbatim and fall through to a literal
  match against the node name, consistent with the `namespace::` axis
  stub (decision #3). Adds a TODO in case the polyfill ever drops the
  stub
- Parser: cap recursive descent at depth 64 with new
  error.MaxDepthExceeded; depth tracked across parseExpr (parens,
  predicates, function args) and parseUnaryExpr (chained `-`). Two
  regression tests cover deep parenthesization and deep unary minus
2026-05-08 08:44:31 +02:00
Navid EMAD
94bcee6322 xpath: apply review style/convention feedback
- Rename Result.zig / Ast.zig / Functions.zig to snake_case (no
  top-level fields per Zig style guide)
- Restructure imports across xpath module: lib (std/lp) → relative
  (further → nearer) → aliases
- Move `frame` to last parameter on Evaluator.evaluate, searchAll,
  Functions.call, idFn (matches js bridge convention); call sites
  updated in webapi/XPath{Result,Expression}.zig and cdp/domains/dom.zig
- Local-pos style in XPathResult.iterateNext
2026-05-08 08:44:31 +02:00
Navid EMAD
e7c3e77c41 xpath: match CDATASection in text() node test
Per XPath 1.0 §5.7, the data model has no CDATASection node — CDATA
content is part of the text node value. The text() node test was only
matching DOM nodeType 3 (Text), silently excluding CDATA sections
(nodeType 4) parsed via DOMParser/XMLDocument and inline foreign
content like SVG with embedded scripts.
2026-05-08 08:44:31 +02:00
Navid EMAD
a4abbb6d13 xpath: cache attribute axis nodes via frame lookup
The attribute axis was calling Entry.toAttribute on every visit,
materializing fresh *Attribute structs (plus duped name/value strings)
into page-lifetime storage. Repeated XPath queries — the Capybara/
Selenium polling pattern this PR targets — accumulated unbounded
copies for the same DOM entries. Route through frame._attribute_lookup
so each Entry resolves to a single cached *Attribute, matching
List.getAttribute and NamedNodeMap.getAtIndex.
2026-05-08 08:44:30 +02:00
Navid EMAD
33714a4dfd cdp: tighten isXPathQuery '::' heuristic
A bare indexOf("::") matched CSS pseudo-elements (a::before) and
attribute values containing '::' ([data-x="x::y"]), misrouting them
to the XPath evaluator. Require an axis-name shape ([a-zA-Z-])
immediately before '::' so only real axis specifiers like
descendant::p are dispatched to XPath.
2026-05-08 08:44:30 +02:00