In order for an API to be supported by workers, their dependency on *Page has
to be removed. To keep this PR smaller, we're only converting a minimum number
of APIs from Page to Execution. All other APIs should not be exposed to the
worker (better to get a FormData undefined than to try to segfault trying to
execute FormData without a page).
Turns out you can embed multiple contexts within a snapshot. So our snapshot
now contains the Page context (as before) but also the Worker context. This
gives us the performance benefit of snapshots and makes context creation for
pages and workers much more similar.
We'll have two types of contexts: one for pages and one for workers. They'll
[probably] both be js.Context, but they'll have distinct FunctionTemplates
attached to their global. The Worker template will only contain a small subset
of the main Page's types, along with 1 or 2 of its own specific ones.
The Snapshot now creates the templates for both, so that the Env contains the
function templates for both contexts. Furthermore, having a "merged" view like
this ensures that the env.template[N] indices are consistent between the two.
However, the snapshot only attaches the Page-specific types to the snapshot
context. This allows the Page-context to be created as-is (e.g. efficiently).
The worker context will be created lazily, on demand, but from the templates
loaded into the env (since, again, the env contains templates for both).
A Worker has no page. So any API that is accessible to a worker cannot take
a *Page parameter. Such APIs will now take a js.Execution which the context
will own and create from the Page (or from the WorkerGlobalScope when that's
created).
To test this, in addition to introducing the Execution, this change also updates
URLSearchParams which is accessible to Worker (and the Page obviously). This
change is obviously viral..if URLSearchParams no longer has a *Page but instead
has an *Execution, then any function it calls must also be updated.
So some APIs will take a *Page (those only accessible from a Page) and some will
take an *Execution (those accessible from a Page or Worker). I'm ok with that.
A lot of private/internal functions take a *Page, because it's simple, but all
they want is a call_arena or something. We'll try to update those as much as
possible. The Page/Execution being injected from the bridge is convenient, but
we should be more specific for internal calls and pass only what's needed.
This introduces two slightly related changes.
My understanding is:
- frameId represents the page. Even if the page navigates, it's the same
frameId. We capture this in Page._frame_id. Nothing here changes.
- loaderId is essentially for a specific document of the page. If the page
navigates, it should be a different loaderId. We were using a distinct
loaderId per request. Not sure what problems that caused. But it was wrong.
This was achieved by exposing Page.id to CDP.
- requestId was mostly correct: unique per request. HOWEVER, for the original
document, apparently, requestId == loaderId. This change is particularly
important for various puppeteer and playwrightb behavior. This is a bit
hacked. CDP will look at the resource_type, if it's .document, it'll return
the loaderId, else it returns the requestId it always id.
When we stop streaming, we need to use any previously streamed data as part of
the last "unstreamed" chunk. Or, put differently, when stream is false, that
merely stops any subsequent streams, it doesn't discard any previously streamed
data.
Websockets client can send a Protocol which the server can agree to. This isn't
as fancy as it sounds. We just send a specific header on websocket handshake
and then read the response header.