A WebSocket client can offer a subprotocol, which the server can agree to. This
isn't as fancy as it sounds: we just send a specific header
(Sec-WebSocket-Protocol) during the WebSocket handshake and then read the
response header.
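
As a rough illustration of that exchange (not the code from this commit), the sketch below builds a client handshake that offers subprotocols and then reads the server's choice back from the response headers. The function names, host, and subprotocol names are hypothetical.

```python
def build_handshake(host, path, key, protocols):
    """Return the client's opening HTTP request, offering `protocols`."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    if protocols:
        # The offer is a single comma-separated header value.
        lines.append("Sec-WebSocket-Protocol: " + ", ".join(protocols))
    return "\r\n".join(lines) + "\r\n\r\n"


def agreed_protocol(response, offered):
    """Return the subprotocol the server selected, or None if it chose none."""
    head = response.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "sec-websocket-protocol":
            chosen = value.strip()
            if chosen not in offered:
                # A server may only pick something the client offered.
                raise ValueError(f"server picked unoffered protocol: {chosen}")
            return chosen
    return None
```

If the response carries no Sec-WebSocket-Protocol header, the server simply did not agree to any of the offered subprotocols, which is why the sketch returns None rather than failing.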
Replace dynamic build version string with stable Lightpanda/1.0
in the Browser and User-Agent fields of the /json/version response.
The dev version (1.0.0-dev.5492+...) is not useful for CDP clients.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The integration test at "server: get /json/version" was hardcoding
the old response with Content-Length: 48. Updated to verify the
enriched fields structurally since the version string varies at
build time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Browser, Protocol-Version, and User-Agent fields to the
/json/version CDP endpoint response. Previously it only returned
webSocketDebuggerUrl, while Chrome and other CDP browsers return
7+ fields that automation tools use for capability detection.
Also add /json/list and /json endpoints that return an empty JSON
array, matching the standard CDP endpoint layout that tools like
Puppeteer and chromedp expect.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
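
For reference, the endpoint layout added for /json/version, /json/list and /json could be modeled roughly as follows. This is a sketch, not the server code: the function names and the WebSocket URL are hypothetical, and the "1.3" Protocol-Version value is an assumption based on what Chrome reports.

```python
import json


def version_payload(ws_url, browser="Lightpanda/1.0", protocol_version="1.3"):
    """Build the /json/version body with the enriched fields."""
    return {
        "Browser": browser,
        "Protocol-Version": protocol_version,  # assumed value, see lead-in
        "User-Agent": browser,
        "webSocketDebuggerUrl": ws_url,
    }


def route(path, ws_url):
    """Return the JSON body for a CDP discovery path, or None if unknown."""
    if path == "/json/version":
        return json.dumps(version_payload(ws_url))
    if path in ("/json", "/json/list"):
        # No targets are listed; tools only need a valid (empty) array.
        return "[]"
    return None
```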
ArenaPool previously maintained up to 512 16KB buckets. The 16KB retention is
small for things like XHR and scripts, but increasing it to something more
reasonable, like 128KB, would use up to 8x more memory.
This commit adds 4 buckets: 1KB, 4KB, 16KB and 128KB. Callers can request a
tiny, small, medium or large bucket. We end up using less peak memory and
making fewer allocations.
Furthermore, callers can request a specific size. This is particularly useful
for WebSocket or Blob where the size could vary greatly (so we'd likely default
to a large bucket), but that could needlessly use up a large arena.
The bucket sizes were derived from analyzing allocations. A significant number
of allocations were very small. Things like ScheduleCallback and
FinalizerCallback are always less than 1K and can be generated in the thousands.
The 16KB retention was wasteful in these cases. It is better to have a large
number of 1K pools, so that we can have a handful of very large buffers.
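
To illustrate how a requested size could map onto the four buckets, here is a minimal sketch. It assumes the names tiny/small/medium/large correspond to 1KB/4KB/16KB/128KB in that order and that a size request picks the smallest bucket that fits; the actual ArenaPool selection logic may differ.

```python
# (name, retained capacity in bytes), smallest first.
BUCKETS = [
    ("tiny", 1 * 1024),
    ("small", 4 * 1024),
    ("medium", 16 * 1024),
    ("large", 128 * 1024),
]


def bucket_for(size):
    """Pick the smallest bucket whose capacity covers `size` bytes."""
    for name, capacity in BUCKETS:
        if size <= capacity:
            return name
    # Anything bigger than the largest retention still uses the large bucket.
    return "large"
```

With this mapping, a small WebSocket frame of a few hundred bytes lands in a tiny arena instead of tying up a 128KB one.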
Whenever we resolve a URL, say from `anchor.href`, we should consider the
document's charset when encoding the querystring. This probably isn't the
most important feature, but it makes tens of thousands of WPT cases pass, e.g.
/encoding/legacy-mb-tchinese/big5/big5-encode-href-errors-han.html?3001-4000 and
/encoding/legacy-mb-japanese/euc-jp/eucjp-encode-href-errors-han.html?17001-18000
DOM elements previously called `URL.resolveURL(...)`. They now call
`self.asNode().resolveURL(...)`, where `Node#resolveURL` will provide the
document's charset.
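
The effect of the document charset on a querystring can be seen in a small sketch. This is not the browser's resolver: `encode_query` is a hypothetical helper, and it ignores the handling of characters the charset cannot represent (which the "errors" WPT cases exercise).

```python
from urllib.parse import quote


def encode_query(query, charset="utf-8"):
    """Percent-encode a querystring using the document's charset."""
    # Encode to bytes in the document charset first, then percent-encode
    # the bytes; '=' and '&' are kept as query delimiters.
    return quote(query.encode(charset), safe="=&")


print(encode_query("q=日"))            # q=%E6%97%A5
print(encode_query("q=日", "euc_jp"))  # q=%C6%FC
```

The same `href` therefore resolves to different bytes depending on the charset of the document that contains it.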
Uses the newly added encoding_rs to implement TextDecoder for all encodings.
Claude wrote 100% of the Rust binding.
Improves various WPT tests, e.g. /encoding/api-basics.any.html.
The "browser.markdown: block link" test expected the old format
([](url)). Updated to expect [url](url) since block-content
anchors without aria-label/title now use the href as display text.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When an anchor wraps block content (divs, images), the markdown dump
produced `([](url))` with empty display text. This is not valid
markdown and provides no useful information to LLMs consuming the
output.
Now uses the anchor's aria-label or title attribute as display text,
falling back to the href itself. Produces `[label](url)` instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
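
The display-text fallback for block-content anchors can be summarized in a few lines. This is a sketch with a hypothetical function name; treating an empty attribute the same as a missing one is an assumption.

```python
def block_link_markdown(href, aria_label=None, title=None):
    """Render a block-content anchor as a markdown link."""
    # Prefer aria-label, then title, and fall back to the href itself
    # so the link never has empty display text.
    label = aria_label or title or href
    return f"[{label}]({href})"
```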
When the snapshot isn't baked into the binary, it requires more memory to
initialize and load. By changing the initialization order from network, isolate,
snapshot to isolate, snapshot, network, we can reduce the peak startup memory.
This is because the short-lived snapshot intermediary data and the long-lived
network data (certs) no longer overlap.
In https://github.com/lightpanda-io/browser/pull/1885 we added a fallback to the
incumbent context when the current context had been released (by us, but not by
v8).
This now handles the case where there is no incumbent context. It's not clear
exactly why this can happen, but we do see it in some WPT tests (e.g.
/html/browsers/the-window-object/named-access-on-the-window-object/navigated-named-objects.window.html)
The main change is in how CidrV4 and CidrV6 are stored: their mask is
pre-calculated and their address is stored as an integer.
This allows significant simplification of matchesCidrV4 and matchesCidrV6.
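
A sketch of why this helps, for the IPv4 case: once the mask and the masked network address are stored as integers, matching reduces to one AND and one comparison. The Python names below only mirror the ones in this message and are not the actual definitions.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class CidrV4:
    network: int  # network address, already masked
    mask: int     # pre-calculated from the prefix length


def parse_cidr_v4(text):
    """Parse 'a.b.c.d/n' once, doing all the mask work up front."""
    addr, _, prefix = text.partition("/")
    bits = int(prefix) if prefix else 32
    if not 0 <= bits <= 32:
        raise ValueError(f"invalid prefix length: {bits}")
    mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    return CidrV4(int(ipaddress.IPv4Address(addr)) & mask, mask)


def matches_cidr_v4(cidr, ip):
    """`ip` is the address as a 32-bit integer."""
    return (ip & cidr.mask) == cidr.network
```

The IPv6 case would follow the same shape with 128-bit integers.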
history.pushState() and replaceState() updated the navigation entry
but did not update page.url or reinitialize window.location. This
caused location.pathname to return the old value after pushState,
breaking SPA routing detection in automation scripts.
Both methods now set page.url and re-init the Location object after
updating the navigation history.
Fixes #2081
Ref #2043
When --user-agent is set, the provided string replaces the entire
User-Agent header instead of appending to "Lightpanda/1.0".
The existing --user-agent-suffix behavior is unchanged.
Fixes #2029
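
The difference between the two user-agent flags can be sketched as below. The function name is hypothetical, and both the space separator used for the suffix and the override taking precedence when both are given are assumptions, not behavior stated in this message.

```python
BASE_USER_AGENT = "Lightpanda/1.0"


def user_agent(override=None, suffix=None):
    """Compute the User-Agent header from the two CLI options."""
    if override:
        # --user-agent replaces the whole header value.
        return override
    if suffix:
        # --user-agent-suffix appends to the base (separator assumed).
        return f"{BASE_USER_AGENT} {suffix}"
    return BASE_USER_AGENT
```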
Rename --block_private_networks to --block-private-networks and
--block_cidrs to --block-cidrs to match the existing flag naming
convention (e.g. --http-proxy, --proxy-bearer-token).
CIDRs prefixed with '-' are treated as allow rules that exempt matching
IPs from blocking. Allow rules take precedence over both
--block_private_networks and custom block CIDRs.
Example: --block_private_networks --block_cidrs -10.0.0.42/32
blocks all private ranges except 10.0.0.42.
Adds 3 new tests for allow-list behavior.
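
The precedence described above can be modeled with the standard `ipaddress` module. This is an illustrative sketch rather than the implementation: the function names are hypothetical and only the RFC1918 ranges are listed for brevity.

```python
import ipaddress

# RFC1918 only, for brevity; the real private set is larger.
PRIVATE_V4 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def parse_rules(spec):
    """Split a comma-separated CIDR list into (block, allow) networks."""
    block, allow = [], []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("-"):
            allow.append(ipaddress.ip_network(item[1:]))
        else:
            block.append(ipaddress.ip_network(item))
    return block, allow


def is_blocked(ip, block, allow, block_private=False):
    addr = ipaddress.ip_address(ip)
    # Allow rules win over every kind of block rule.
    if any(addr in net for net in allow):
        return False
    if block_private and any(addr in net for net in PRIVATE_V4):
        return True
    return any(addr in net for net in block)
```

Using the example from this message, `parse_rules("-10.0.0.42/32")` with `block_private=True` blocks 10.0.0.43 but lets 10.0.0.42 through.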
Block outbound HTTP requests to specified IP ranges before TCP handshake
using libcurl CURLOPT_OPENSOCKETFUNCTION callback. Fires after DNS
resolution, reads resolved IP directly from sockaddr, does bitwise CIDR
comparison. Fail-closed: unknown address families are blocked.
--block_private_networks blocks RFC1918, localhost, link-local, ULA.
--block_cidrs blocks additional comma-separated CIDRs.
IPv4-mapped IPv6 (::ffff:x.x.x.x) is unwrapped to prevent bypass.
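
The address check itself can be approximated in Python. This models only the decision, not the libcurl callback; `should_block` is a hypothetical name, and the family plus packed address bytes stand in for what would be read from the sockaddr.

```python
import ipaddress
import socket

# RFC1918, localhost, link-local and ULA ranges.
PRIVATE = [
    ipaddress.ip_network(n)
    for n in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
        "127.0.0.0/8", "::1/128",                         # localhost
        "169.254.0.0/16", "fe80::/10",                    # link-local
        "fc00::/7",                                       # ULA
    )
]


def should_block(family, raw):
    """Decide from the resolved address whether to refuse the socket."""
    if family == socket.AF_INET and len(raw) == 4:
        addr = ipaddress.IPv4Address(raw)
    elif family == socket.AF_INET6 and len(raw) == 16:
        addr = ipaddress.IPv6Address(raw)
        # Unwrap ::ffff:x.x.x.x so the IPv4 rules still apply.
        if addr.ipv4_mapped is not None:
            addr = addr.ipv4_mapped
    else:
        # Fail closed: anything we cannot classify is blocked.
        return True
    return any(addr in net for net in PRIVATE)
```

Without the unwrapping step, a request to ::ffff:10.0.0.1 would be compared only against the IPv6 ranges and slip past the 10.0.0.0/8 rule.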