
4284 Commits

Author SHA1 Message Date
Karl Seguin
28a7e7fe45 Basic protocol support for websocket.
WebSocket clients can send a protocol which the server can agree to. This isn't
as fancy as it sounds: we just send a specific header on the WebSocket handshake
and then read the response header.
2026-04-13 11:21:59 +08:00
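A minimal Python sketch of the header exchange described in 28a7e7fe45. It assumes the standard RFC 6455 header name `Sec-WebSocket-Protocol`. The helper names `build_handshake` and `negotiated_protocol` are hypothetical, not Lightpanda's code. The client offers a comma-separated list, and the server answers with at most one of the offered values.

```python
import base64
import os


def build_handshake(host: str, path: str, protocols: list) -> str:
    """Build a client opening handshake (hypothetical helper)."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    if protocols:
        # Offer every requested subprotocol in one comma-separated header.
        lines.append("Sec-WebSocket-Protocol: " + ", ".join(protocols))
    return "\r\n".join(lines) + "\r\n\r\n"


def negotiated_protocol(response_headers: dict, offered: list) -> str:
    """Return the protocol the server agreed to, or "" if it picked none."""
    chosen = ""
    for name, value in response_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "sec-websocket-protocol":
            chosen = value.strip()
    if chosen and chosen not in offered:
        # RFC 6455: a value the client never offered fails the handshake.
        raise ValueError(f"server selected unrequested protocol {chosen!r}")
    return chosen
```

Returning `""` when the server picks nothing mirrors the JavaScript `WebSocket.protocol` property, which is the empty string in that case.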
Karl Seguin
63104a7f82 Re-enable debug allocator in debug
Disabled this when looking at memory profiles, and must have accidentally
committed it.
2026-04-11 12:24:19 +08:00
Pierre Tachoire
071e70e5cc Merge pull request #2133 from lightpanda-io/feat/cdp-json-endpoints
Feat/cdp json endpoints
2026-04-10 17:51:48 +02:00
Pierre Tachoire
88dcac642a Merge pull request #2131 from lightpanda-io/pushstate-pathname
update page URL and location on pushState/replaceState
2026-04-10 17:37:43 +02:00
Matt Van Horn
224a7333f2 fix: use fixed Lightpanda/1.0 for /json/version User-Agent
Replace dynamic build version string with stable Lightpanda/1.0
in the Browser and User-Agent fields of the /json/version response.
The dev version (1.0.0-dev.5492+...) is not useful for CDP clients.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:36:10 +02:00
Matt Van Horn
416984d32f fix: update integration test for enriched /json/version response
The integration test at "server: get /json/version" was hardcoding
the old response with Content-Length: 48. Updated to verify the
enriched fields structurally since the version string varies at
build time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:36:09 +02:00
Matt Van Horn
503ca4ce07 feat: enrich CDP /json/version and add /json/list endpoint
Add Browser, Protocol-Version, and User-Agent fields to the
/json/version CDP endpoint response. Previously it only returned
webSocketDebuggerUrl, while Chrome and other CDP browsers return
7+ fields that automation tools use for capability detection.

Also add /json/list and /json endpoints that return an empty JSON
array, matching the standard CDP endpoint layout that tools like
Puppeteer and chromedp expect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:36:05 +02:00
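A rough Python sketch of the discovery endpoints described in 503ca4ce07, combined with the fixed `Lightpanda/1.0` name from 224a7333f2. The `route` helper is hypothetical. The `"1.3"` protocol version is an assumption, borrowed from what Chrome reports, and is not taken from Lightpanda.

```python
import json

STABLE_NAME = "Lightpanda/1.0"  # stable name instead of the dev build version
PROTOCOL_VERSION = "1.3"        # assumption: the value Chrome reports


def json_version(ws_url: str) -> str:
    return json.dumps({
        "Browser": STABLE_NAME,
        "Protocol-Version": PROTOCOL_VERSION,
        "User-Agent": STABLE_NAME,
        "webSocketDebuggerUrl": ws_url,
    })


def route(path: str, ws_url: str):
    """Return (status, body) for the discovery endpoints (hypothetical router)."""
    if path == "/json/version":
        return 200, json_version(ws_url)
    if path in ("/json", "/json/list"):
        # No targets are listed, so both endpoints return an empty JSON array.
        return 200, "[]"
    return 404, ""
```

A test can parse the body and check for the expected keys instead of comparing raw bytes and a fixed Content-Length. This is the same approach 416984d32f takes for the integration test.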
Muki Kiboigo
d47e24ced0 add test for History URL updating 2026-04-10 07:38:21 -07:00
Adrià Arrufat
d6aea1187f Merge branch 'main' into fix/markdown-link-formatting 2026-04-10 16:26:14 +02:00
Muki Kiboigo
08cd9ca799 properly resolve URL before setting Location in History 2026-04-10 07:26:01 -07:00
Adrià Arrufat
c1a65160c1 markdown: test block link aria-label and title handling 2026-04-10 16:14:58 +02:00
Karl Seguin
24e17b6f21 Merge pull request #2130 from lightpanda-io/arena_pool_buckets
Add arena buckets to ArenaPool
2026-04-10 20:33:19 +08:00
Karl Seguin
771df02c49 Merge pull request #2129 from lightpanda-io/non-utf8-querystring-encoding
Non utf8 querystring encoding
2026-04-10 20:33:07 +08:00
Karl Seguin
ddf614a9d5 Add arena buckets to ArenaPool
ArenaPool previously maintained up to 512 16KB buckets. The 16KB retention is
small for things like XHR and scripts, but increasing it to something more
reasonable, like 128KB, would use up to 8x more memory.

This commit adds 4 buckets: 1KB, 4KB, 16KB and 128KB. Callers can request a
tiny, small, medium or large bucket. We end up with lower peak memory
and fewer allocations.

Furthermore, callers can request a specific size. This is particularly useful
for WebSocket or Blob where the size could vary greatly (so we'd likely default
to a large bucket), but that could needlessly use up a large arena.

The bucket sizes were derived from analyzing allocations. A significant number
of allocations were very small. Things like ScheduleCallback and
FinalizerCallback are always less than 1KB and can be generated in the thousands.
The 16KB retention was wasteful in these cases; it's better to have a large number
of 1KB pools, so that we can have a handful of very large buffers.
2026-04-10 19:09:18 +08:00
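A Python sketch of size-based bucket selection for ddf614a9d5, using the four sizes named in the commit. Python has no arenas, so a `bytearray` stands in for one. `ArenaPoolSketch` is hypothetical. Its per-bucket cap of 512 reuses the figure from the old single-bucket pool, and applying it to every bucket is an assumption.

```python
from enum import Enum


class Bucket(Enum):
    # The four sizes named in the commit message, in bytes.
    TINY = 1 * 1024
    SMALL = 4 * 1024
    MEDIUM = 16 * 1024
    LARGE = 128 * 1024


def bucket_for_size(size: int) -> Bucket:
    """Pick the smallest bucket whose retained size covers `size`."""
    for bucket in Bucket:  # Enum iterates in definition order: smallest first
        if size <= bucket.value:
            return bucket
    return Bucket.LARGE  # in this sketch, bigger requests use the largest bucket


class ArenaPoolSketch:
    """Hypothetical pool: one free list per bucket, with a retention cap."""

    def __init__(self, max_retained: int = 512):
        self.free = {bucket: [] for bucket in Bucket}
        self.max_retained = max_retained

    def acquire(self, bucket: Bucket) -> bytearray:
        free_list = self.free[bucket]
        return free_list.pop() if free_list else bytearray(bucket.value)

    def release(self, bucket: Bucket, arena: bytearray) -> None:
        # Keep the arena for reuse only while under the cap; otherwise drop it.
        if len(self.free[bucket]) < self.max_retained:
            self.free[bucket].append(arena)
```

Callers that know their size, such as a WebSocket message or a Blob, can call `bucket_for_size`. Everyone else asks for a named bucket directly.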
Karl Seguin
2cfa1ea035 Merge pull request #2116 from lightpanda-io/gc-snapshot
force an aggressive GC on v8 after snapshot creation
2026-04-10 18:48:08 +08:00
Karl Seguin
05229fdc53 Use the document's charset to determine if/how to encode querystring
Whenever we resolve a URL, say from `anchor.href`, we should consider the
document's charset when encoding the querystring. This probably isn't the
most important feature, but it makes tens of thousands of WPT cases pass, e.g.

/encoding/legacy-mb-tchinese/big5/big5-encode-href-errors-han.html?3001-4000 and
/encoding/legacy-mb-japanese/euc-jp/eucjp-encode-href-errors-han.html?17001-18000

DOM elements previously called `URL.resolveURL(...)`. They now call
`self.asNode().resolveURL(...)`, where `Node#resolveURL` will provide the
document's charset.
2026-04-10 16:47:42 +08:00
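A Python sketch of charset-aware query encoding for 05229fdc53. It approximates the WHATWG URL behavior in which characters that the document's encoding cannot represent become `&#NNNN;` references, which are then percent-encoded. Python codec names stand in for WHATWG encoding labels. The sketch encodes a single query value rather than a whole URL, and `encode_query_value` is a hypothetical name.

```python
from urllib.parse import quote


def encode_query_value(value: str, document_charset: str) -> str:
    """Percent-encode one query value using the document's charset."""
    # Encode to the document's charset; characters it cannot represent become
    # HTML numeric character references such as "&#26085;".
    # safe="" percent-encodes everything except unreserved ASCII, so the "&#"
    # and ";" of a reference come out as %26%23 and %3B.
    return quote(value, safe="", encoding=document_charset,
                 errors="xmlcharrefreplace")
```

For example, "é" becomes `%E9` in a Latin-1 document and `%C3%A9` in a UTF-8 one. That difference is why the resolver needs the document's charset.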
Karl Seguin
f7c1710c23 Expose correct charset
document.characterSet, document.charset and document.inputEncoding now expose
the correct charset.
2026-04-10 16:47:42 +08:00
Karl Seguin
828715b751 Improve TextDecoder to support all necessary encoding types
Uses the newly added encoding_rs to implement TextDecoder for all encodings.
Claude wrote 100% of the Rust binding.

Improves various WPT tests, e.g. /encoding/api-basics.any.html.
2026-04-10 16:47:41 +08:00
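The commit's binding is Rust over encoding_rs. The Python sketch below only illustrates two TextDecoder behaviors: label lookup, and `fatal` versus replacement decoding. `TextDecoderSketch` is hypothetical. Python codec names do not always match WHATWG labels. For instance, WHATWG maps "latin1" to windows-1252, whereas Python maps it to ISO-8859-1.

```python
import codecs


class TextDecoderSketch:
    """Hypothetical TextDecoder-like wrapper over Python codecs."""

    def __init__(self, label: str = "utf-8", fatal: bool = False):
        # Labels are matched case-insensitively after trimming whitespace;
        # an unknown label raises LookupError (a real TextDecoder throws RangeError).
        self.encoding = codecs.lookup(label.strip().lower()).name
        self.fatal = fatal

    def decode(self, data: bytes) -> str:
        # fatal=True raises on malformed input; otherwise malformed input is
        # replaced with U+FFFD.
        errors = "strict" if self.fatal else "replace"
        return data.decode(self.encoding, errors)
```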
Pierre Tachoire
d80e4227b4 force an aggressive GC on v8 after snapshot creation 2026-04-10 10:41:57 +02:00
Adrià Arrufat
070ee7df80 Merge branch 'main' into fix-telemetry-decoding 2026-04-10 09:42:21 +02:00
Pierre Tachoire
a4617390de Merge pull request #2104 from lightpanda-io/feat/add-ip-filter
Feat/add ip filter
2026-04-10 08:46:06 +02:00
Matt Van Horn
5cc49e79b8 fix: update block link test to match new link text format
The "browser.markdown: block link" test expected the old format
([](url)). Updated to expect [url](url) since block-content
anchors without aria-label/title now use the href as display text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 20:49:39 -07:00
Matt Van Horn
065e9383d0 fix: use proper link text in markdown dump for block-content anchors
When an anchor wraps block content (divs, images), the markdown dump
produced `([](url))` with empty display text. This is not valid
markdown and provides no useful information to LLMs consuming the
output.

Now uses the anchor's aria-label or title attribute as display text,
falling back to the href itself. Produces `[label](url)` instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:33:08 -07:00
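A Python sketch of the fallback order described in 065e9383d0: aria-label, then title, then the href. `block_link_markdown` is a hypothetical name. Treating a blank attribute as missing is an assumption, and the sketch does not escape brackets in the label.

```python
from typing import Optional


def block_link_markdown(href: str, aria_label: Optional[str] = None,
                        title: Optional[str] = None) -> str:
    """Render an anchor that wraps block content as a Markdown link."""
    # Prefer aria-label, then title, then fall back to the href itself.
    for candidate in (aria_label, title):
        if candidate and candidate.strip():
            return f"[{candidate.strip()}]({href})"
    return f"[{href}]({href})"
```

With no attributes set, this produces the `[url](url)` form that the updated test in 5cc49e79b8 expects.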
Karl Seguin
0b6f099f43 Merge pull request #2117 from lightpanda-io/startup_order_tweak
Initialize snapshot before network
2026-04-10 08:30:54 +08:00
Pierre Tachoire
91e366cb71 move memoryPressureNotification call into session.resetPage
Run the V8 GC via memoryPressureNotification directly in
session.resetPage, so that memory is freed right after resources are
removed.
2026-04-10 08:12:51 +08:00
Karl Seguin
2e8a7e59c4 Merge pull request #2105 from lightpanda-io/no-context-handling
Better handle v8 callback with no valid context
2026-04-10 08:11:36 +08:00
Karl Seguin
e8fbaeb2d9 Initialize snapshot before network
When the snapshot isn't baked into the binary, it requires more memory to
initialize and load. By changing the initialization order from network, isolate,
snapshot to isolate, snapshot, network, we can reduce the peak startup memory.
This is because the short-lived snapshot intermediary data and the long-lived
network data (certs) no longer overlap.
2026-04-10 07:08:13 +08:00
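A small Python model of why the reordering in e8fbaeb2d9 lowers the peak. The sizes are hypothetical, chosen only to illustrate the effect, and are not measurements. Each step keeps some long-lived memory, and the snapshot step also holds temporary data that is freed once it completes.

```python
def peak_memory(steps):
    """Peak resident memory over a sequence of startup steps."""
    live = 0
    peak = 0
    for _name, long_lived, temporary in steps:
        # While a step runs, its temporary and long-lived data are both resident.
        peak = max(peak, live + long_lived + temporary)
        live += long_lived  # temporary data is released when the step ends
    return peak


# Hypothetical sizes in MB: (name, long-lived, temporary).
network = ("network", 10, 0)    # certs stay for the process lifetime
isolate = ("isolate", 5, 0)
snapshot = ("snapshot", 8, 20)  # 20 MB of short-lived intermediary data

old_order = [network, isolate, snapshot]
new_order = [isolate, snapshot, network]
print(peak_memory(old_order), peak_memory(new_order))  # 43 33
```

In the old order, the certificates are already resident while the snapshot's temporary data peaks. In the new order, that temporary data is gone before the network is initialized.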
Karl Seguin
96736d440d Merge pull request #2112 from lightpanda-io/reduce_telemetry_size
Reduces the size of Telemetry.Lightpanda
2026-04-10 07:07:55 +08:00
Karl Seguin
cec9628e37 zig fmt 2026-04-10 06:51:50 +08:00
Karl Seguin
255fa247c3 Reduces the size of Telemetry.Lightpanda
32920 -> 8344

We buffer [1024]Telemetry.Event, and the event is unnecessarily large.
2026-04-10 06:51:50 +08:00
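A quick Python check of the numbers in 255fa247c3. It assumes the whole saving comes from the `[1024]Telemetry.Event` buffer and that the struct's other fields are unchanged.

```python
BUFFERED_EVENTS = 1024            # the [1024]Telemetry.Event buffer
OLD_SIZE, NEW_SIZE = 32920, 8344  # bytes, from the commit message

saved = OLD_SIZE - NEW_SIZE
per_event = saved // BUFFERED_EVENTS
print(saved, per_event)  # 24576 24
```

The saving divides evenly across the buffer. Under that assumption, each event shrank by 24 bytes.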
Karl Seguin
8eaeafe16c Fix a lot of typos.
I used https://github.com/crate-ci/typos, it worked well.

Also, make sure a CDP-initiated KeyboardEvent is freed when no element is in focus.
2026-04-10 06:51:10 +08:00
Karl Seguin
4dcb2c997e Better handle v8 callback with no valid context
In https://github.com/lightpanda-io/browser/pull/1885 we added a fallback to the
incumbent context when the current context had been released (by us, but not by
v8).

This now handles the case where there is no incumbent context. It's not clear
exactly why this can happen, but we do see it in some WPT tests (e.g.
/html/browsers/the-window-object/named-access-on-the-window-object/navigated-named-objects.window.html)
2026-04-10 06:50:47 +08:00
Adrià Arrufat
d19e62ec3c http: add default write callback to prevent stdout pollution 2026-04-09 22:03:09 +02:00
Karl Seguin
0253092f20 Improvements to IpFilters
The main change is how CidrV4 and CidrV6 are stored: their mask is pre-calculated
and their address is stored as an integer.

This allows significant simplification of matchesCidrV4 and matchesCidrV6.
2026-04-09 15:40:16 +08:00
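A Python sketch of the storage change in 0253092f20. The mask is computed once at parse time, and the masked network address is kept as a plain integer, so matching becomes one AND and one comparison. The single `Cidr` class is a hypothetical merge of CidrV4 and CidrV6. The `ipaddress` module is used only for parsing.

```python
import ipaddress


class Cidr:
    """Hypothetical CIDR with its mask pre-computed once, at parse time."""

    def __init__(self, text: str):
        addr_text, prefix_text = text.split("/")
        addr = ipaddress.ip_address(addr_text)
        self.version = addr.version
        bits = 32 if addr.version == 4 else 128
        all_ones = (1 << bits) - 1
        prefix = int(prefix_text)
        # e.g. /8 on IPv4 gives 0xFF000000; /0 gives 0 (matches everything).
        self.mask = (all_ones << (bits - prefix)) & all_ones
        # Store the network address already masked, as a plain integer.
        self.address = int(addr) & self.mask

    def matches(self, ip) -> bool:
        # One AND and one comparison; no per-call mask building.
        return ip.version == self.version and (int(ip) & self.mask) == self.address
```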
Karl Seguin
689fb908ac Merge pull request #2110 from lightpanda-io/test-cache-filter
cache: add log filter to garbage file test
2026-04-09 07:55:07 +08:00
Karl Seguin
795b0affe2 Merge pull request #2102 from lightpanda-io/non-utf8-encoding
Use encoding_rs on non-UTF-8 html to convert to utf-8
2026-04-09 07:25:48 +08:00
Adrià Arrufat
182447c907 cache: add log filter to garbage file test 2026-04-08 19:36:29 +02:00
Trevin Chow
574264aaff fix: update page URL and location on pushState/replaceState
history.pushState() and replaceState() updated the navigation entry
but did not update page.url or reinitialize window.location. This
caused location.pathname to return the old value after pushState,
breaking SPA routing detection in automation scripts.

Both methods now set page.url and re-init the Location object after
updating the navigation history.

Fixes #2081
Ref #2043
2026-04-08 08:34:08 -07:00
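A Python sketch of the behavior that 574264aaff and 08cd9ca799 describe. The new URL is resolved against the current one, and then the page URL is updated along with the history entry. `PageSketch` is hypothetical. It ignores state objects and the same-origin check that real browsers apply, and it derives `pathname` from the page URL instead of re-initializing a Location object.

```python
from urllib.parse import urljoin, urlsplit


class PageSketch:
    """Hypothetical page holding a URL, a history list and a derived location."""

    def __init__(self, url: str):
        self.url = url
        self.entries = [url]

    def _resolve(self, url):
        # A missing URL keeps the current one; a relative URL is resolved
        # against the current page URL before anything is stored.
        return self.url if url is None else urljoin(self.url, url)

    def push_state(self, url=None):
        self.url = self._resolve(url)
        self.entries.append(self.url)

    def replace_state(self, url=None):
        self.url = self._resolve(url)
        self.entries[-1] = self.url

    @property
    def pathname(self) -> str:
        # Derived from the page URL, so it cannot go stale after push/replaceState.
        return urlsplit(self.url).path
```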
Pierre Tachoire
6ef518438b fix custom cidrs mem leak 2026-04-08 15:09:01 +02:00
Pierre Tachoire
e57b5c645b remove deadcode libcurl.CurlOpenSocketFunction 2026-04-08 14:06:17 +02:00
Karl Seguin
077b8b1481 Merge pull request #2106 from lightpanda-io/simplify_nodelist_foreach
Simplifies NodeList.foreach
2026-04-08 19:51:15 +08:00
Karl Seguin
077263bae4 Simplifies NodeList.foreach
Removes 2 layers of indirection (including 1 allocation) that are unnecessary for
an internal call.
2026-04-08 18:32:47 +08:00
Karl Seguin
763927c352 Use encoding_rs on non-UTF-8 html to convert to utf-8
Using our existing MIME type detection, this uses encoding_rs to convert non-
UTF-8 content to UTF-8, which can then be passed to html5ever.

Issue: https://github.com/lightpanda-io/browser/issues/2089
2026-04-08 18:32:08 +08:00
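A Python sketch of transcoding to UTF-8 before parsing, as in 763927c352. It reads the charset only from a Content-Type value, whereas real detection also considers BOMs and `<meta>` tags. Python codecs stand in for encoding_rs, and both function names are hypothetical. Falling back to UTF-8 for unknown labels is an assumption.

```python
import codecs


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Extract the charset parameter from a Content-Type value."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    return default


def to_utf8(body: bytes, content_type: str) -> bytes:
    """Convert a response body to UTF-8 so a UTF-8-only parser can consume it."""
    try:
        codec = codecs.lookup(charset_from_content_type(content_type)).name
    except LookupError:
        codec = "utf-8"  # assumption: unknown labels fall back to UTF-8
    if codec == "utf-8":
        return body  # already UTF-8: hand it to the parser unchanged
    # Transcode; undecodable bytes become U+FFFD instead of failing the page.
    return body.decode(codec, errors="replace").encode("utf-8")
```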
Pierre Tachoire
f884b562ba user-agent override must not contain mozilla 2026-04-08 12:11:09 +02:00
Pierre Tachoire
efb2fa9c22 Send Sec-Ch-Ua http header 2026-04-08 12:11:09 +02:00
Pierre Tachoire
ae9b4d3fc6 stricter user-agent rule 2026-04-08 12:11:09 +02:00
Trevin Chow
f0aacad52e feat: add --user-agent flag for full User-Agent override
When --user-agent is set, the provided string replaces the entire
User-Agent header instead of appending to "Lightpanda/1.0".
The existing --user-agent-suffix behavior is unchanged.

Fixes #2029
2026-04-08 12:11:08 +02:00
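A Python sketch of how the full override from f0aacad52e could relate to the existing suffix and to the "must not contain mozilla" rule in f884b562ba. `build_user_agent` is hypothetical. Two details are assumptions: that the rule is a case-insensitive substring check, and that the suffix is joined to the base with a space.

```python
from typing import Optional

BASE_UA = "Lightpanda/1.0"


def build_user_agent(override: Optional[str] = None,
                     suffix: Optional[str] = None) -> str:
    if override is not None:
        # Assumption: "must not contain mozilla" is a case-insensitive check.
        if "mozilla" in override.lower():
            raise ValueError("--user-agent must not contain 'mozilla'")
        return override  # full replacement: nothing is appended
    if suffix:
        # Assumption: the suffix is joined to the base with a single space.
        return f"{BASE_UA} {suffix}"
    return BASE_UA
```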
Lucien Coffe
7f5abfc9cf fix: use dashes in CLI flag names for consistency
Rename --block_private_networks to --block-private-networks and
--block_cidrs to --block-cidrs to match the existing flag naming
convention (e.g. --http-proxy, --proxy-bearer-token).
2026-04-08 12:10:46 +02:00
Lucien Coffe
fb6c4e4978 feat: add allow-list exclusions to --block_cidrs
CIDRs prefixed with '-' are treated as allow rules that exempt matching
IPs from blocking. Allow rules take precedence over both
--block_private_networks and custom block CIDRs.

Example: --block_private_networks --block_cidrs -10.0.0.42/32
blocks all private ranges except 10.0.0.42.

Adds 3 new tests for allow-list behavior.
2026-04-08 12:10:46 +02:00
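A Python sketch of the '-' prefix rule from fb6c4e4978. Entries starting with '-' become allow rules, and any allow match wins before block rules are consulted. `parse_rules` and `is_blocked` are hypothetical names, and the sketch uses Python's `ipaddress` module for matching.

```python
import ipaddress


def parse_rules(spec: str):
    """Split a comma-separated CIDR list into (block, allow) network lists."""
    block, allow = [], []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("-"):
            allow.append(ipaddress.ip_network(item[1:]))
        else:
            block.append(ipaddress.ip_network(item))
    return block, allow


def is_blocked(ip, block, allow) -> bool:
    # Allow rules take precedence over every block rule.
    if any(ip in net for net in allow):
        return False
    return any(ip in net for net in block)
```

To mimic `--block_private_networks`, the private ranges would simply be appended to `block`, and the allow list would still override them.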
Lucien Coffe
f5cfc4d315 feat: add --block_private_networks and --block_cidrs CLI flags
Block outbound HTTP requests to specified IP ranges before TCP handshake
using libcurl CURLOPT_OPENSOCKETFUNCTION callback. Fires after DNS
resolution, reads resolved IP directly from sockaddr, does bitwise CIDR
comparison. Fail-closed: unknown address families are blocked.

--block_private_networks blocks RFC1918, localhost, link-local, ULA.
--block_cidrs blocks additional comma-separated CIDRs.
IPv4-mapped IPv6 (::ffff:x.x.x.x) is unwrapped to prevent bypass.
2026-04-08 12:10:42 +02:00
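A Python sketch of the private-network check from f5cfc4d315, applied to an already-resolved address string. It is not the libcurl `CURLOPT_OPENSOCKETFUNCTION` callback itself. IPv4-mapped IPv6 addresses are unwrapped before matching. Treating an unparseable address as blocked is this sketch's stand-in for the commit's fail-closed handling of unknown address families.

```python
import ipaddress

PRIVATE_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
    "127.0.0.0/8", "::1/128",                         # localhost
    "169.254.0.0/16", "fe80::/10",                    # link-local
    "fc00::/7",                                       # unique local (ULA)
)]


def is_private_blocked(ip_text: str) -> bool:
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return True  # fail closed: anything we cannot classify is blocked
    # Unwrap ::ffff:a.b.c.d so it cannot bypass the IPv4 rules.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # `in` is simply False when the IP version differs from the network's.
    return any(ip in net for net in PRIVATE_NETWORKS)
```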