This patch should ideally be split into multiple smaller ones, but life
is life.
The main purpose of this patch is to fix and to test
`SlidingSyncListRequestGenerator`. This quest has led me to rename
multiple fields in `SlidingSyncList` and `SlidingSyncListBuilder`, like:
* `rooms_count` becomes `maximum_number_of_rooms`: it's not something
the _client_ counts, but a maximum number given by the server,
* `batch_size` becomes `full_sync_batch_size`, so that it now
emphasizes that it's about full-sync only,
* `limit` becomes `full_sync_maximum_number_of_rooms_to_fetch`, so that
it now also emphasizes that it's about full-sync only _and_ what the
limit is about!
This quest has continued with the renaming of the `SlidingSyncMode`
variants. After a discussion with the ElementX team, we've agreed on the
following renamings:
* `Cold` becomes `NotLoaded`,
* `Preload` becomes `Preloaded`,
* `CatchingUp` becomes `PartiallyLoaded`,
* `Live` becomes `FullyLoaded`.
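The renamed variants can be sketched as follows (a hedged sketch only: the real enum in the SDK may carry data or extra attributes):

```rust
/// Sketch of the renamed `SlidingSyncMode` variants.
#[derive(Debug, PartialEq)]
enum SlidingSyncMode {
    /// Previously `Cold`: nothing is loaded yet.
    NotLoaded,
    /// Previously `Preload`: an initial small set of rooms is loaded.
    Preloaded,
    /// Previously `CatchingUp`: some, but not all, rooms are loaded.
    PartiallyLoaded,
    /// Previously `Live`: all rooms are loaded.
    FullyLoaded,
}

fn main() {
    let state = SlidingSyncMode::PartiallyLoaded;
    // The new names describe the loading progress directly.
    assert_ne!(state, SlidingSyncMode::FullyLoaded);
    println!("{state:?}");
}
```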
Finally, _le plat de résistance_.
In `SlidingSyncListRequestGenerator`, the `make_request_for_ranges`
method has been renamed to `build_request` and no longer takes a `&mut
self` but a simpler `&self`! It didn't make sense to me that something
that makes/builds a request was modifying `Self`. Because the type of
`SlidingSyncListRequestGenerator::ranges` has changed, all ranges now
have a consistent type (within this module at least). Consequently, this
method no longer needs to do a type conversion.
Still on the same type, the `update_state` method is much better
documented, and the errors on range bounds (off by one) are now all fixed.
The creation of new ranges happens in a new dedicated pure function,
`create_range`. It returns an `Option` because it's possible to not be
able to compute a range (previously, invalid ranges were considered
valid). It's used in the `Iterator` implementation. This `Iterator`
implementation contains a liiiittle bit more code, but at least now
we understand what it does, and it's clear which `range_start` and
`desired_size` we calculate. By the way, the `prefetch_request` method
has been removed: it wasn't a prefetch, it was a regular request that
happened to calculate the range. Now that `create_range` exists, and
since it's pure, we can unit test it!
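To illustrate the idea, here is a hypothetical `create_range`-style helper (signature and clamping rules are my own illustration, not the SDK's actual code): compute an inclusive range of rooms to request, or `None` when no valid range exists.

```rust
use std::ops::RangeInclusive;

/// Hypothetical sketch: build the inclusive range of rooms to fetch, or
/// `None` if no valid range can be computed (e.g. the start is already
/// past the end of the room list).
fn create_range(
    range_start: u32,
    desired_size: u32,
    maximum_number_of_rooms: Option<u32>,
) -> Option<RangeInclusive<u32>> {
    if desired_size == 0 {
        return None;
    }

    // Inclusive upper bound: start + size - 1. This is exactly where
    // off-by-one mistakes tend to creep in.
    let mut range_end = range_start + desired_size - 1;

    // Clamp to the server-provided maximum, if known.
    if let Some(max) = maximum_number_of_rooms {
        if range_start >= max {
            // Nothing left to fetch: the range would be invalid.
            return None;
        }
        range_end = range_end.min(max - 1);
    }

    Some(range_start..=range_end)
}

fn main() {
    assert_eq!(create_range(0, 20, Some(100)), Some(0..=19));
    assert_eq!(create_range(90, 20, Some(100)), Some(90..=99));
    assert_eq!(create_range(100, 20, Some(100)), None);
}
```

Because the function is pure (no `&mut self`, no I/O), each edge case can be pinned down with a one-line unit test.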
_Pour le dessert_, this patch adds multiple tests. This is now
possible thanks to the previous refactoring. First off, we test
`create_range` in many configurations. It's pretty easy to understand,
and since it's core to `SlidingSyncListRequestGenerator`, I'm pretty
happy with how it turned out. Second, we test the paging, growing, and
selective modes with a new macro, `assert_request_and_response`, which
allows us to “send” requests and to “receive” responses. The design of
`SlidingSync` makes it possible to mimic requests and responses, which
is great. We don't really care about the responses here; we care about
the requests' `ranges`, and the `SlidingSyncList.state` after a response
is received. It also helps to see how ranges behave when the state is
`PartiallyLoaded` or `FullyLoaded`.
Before this patch, `SlidingSync.pos` and `SlidingSync.delta_token` each
had their own lock. That's annoying because they must be updated at the
same time to avoid getting out of sync.
So a new field, `SlidingSync.position`, of type `SlidingSyncPositionMarkers`,
is created. It holds `pos` and `delta_token`. This new field is behind a
single `RwLock`.
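The shape of the change can be sketched like this (field types are placeholders; the SDK's real types differ):

```rust
use std::sync::RwLock;

/// Sketch: group `pos` and `delta_token` under one lock so they can only
/// be read or written together.
#[derive(Debug, Default)]
struct SlidingSyncPositionMarkers {
    pos: Option<String>,
    delta_token: Option<String>,
}

struct SlidingSync {
    position: RwLock<SlidingSyncPositionMarkers>,
}

fn main() {
    let sliding_sync = SlidingSync { position: RwLock::new(Default::default()) };

    // Both markers are updated atomically, under a single write lock.
    {
        let mut position = sliding_sync.position.write().unwrap();
        position.pos = Some("s72594_4483".to_owned());
        position.delta_token = Some("t42".to_owned());
    }

    // Readers always observe a consistent pair.
    let position = sliding_sync.position.read().unwrap();
    assert_eq!(position.pos.as_deref(), Some("s72594_4483"));
    assert_eq!(position.delta_token.as_deref(), Some("t42"));
}
```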
The comments in the code should explain the general approach, but to give an overview:
We assign a unique sequence number to each device-list-invalidation notice, and keep track
of the sequence number at which each user was most recently invalidated. Then, when we
build a `/keys/query` request, we take note of the current sequence number. Finally, when
we get the response from that `/keys/query` request, we compare that sequence number
with the most recent invalidation of each user that is now supposedly
up-to-date. All of this means we can tell whether the user has been
invalidated *since* we built the `/keys/query` request, in which case our
request may have raced against the new device, so we do not mark the user
as up-to-date.
To make that work, we need to keep track of the sequence number for in-flight `/keys/query`
requests; we do that in the `IdentityManager`. Since there should only ever be at most one
batch of requests in flight, that is relatively easy; however, we also record the request ids
just to check that the application isn't doing something weird.
There is no need to persist the sequence numbers to permanent storage: provided the
`IdentityManager` and `Store` objects have a lifetime at least as long as an in-flight
`/keys/query` request, it is sufficient just to retain the `dirty` bit in permanent storage,
and then, when we reload the ephemeral `Store` cache, to treat everything with a `dirty`
bit as a "new" invalidation.
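The sequence-number bookkeeping can be modelled as a toy, synchronous version (names and types here are purely illustrative, not the `IdentityManager`'s real API):

```rust
use std::collections::HashMap;

/// Toy model of the invalidation sequence numbers.
#[derive(Default)]
struct InvalidationTracker {
    next_sequence: u64,
    /// Sequence number at which each user was most recently invalidated.
    last_invalidated: HashMap<String, u64>,
}

impl InvalidationTracker {
    /// A device-list-invalidation notice arrives: bump the counter.
    fn invalidate(&mut self, user: &str) {
        self.last_invalidated.insert(user.to_owned(), self.next_sequence);
        self.next_sequence += 1;
    }

    /// Called when building a `/keys/query` request: snapshot the counter.
    fn sequence_now(&self) -> u64 {
        self.next_sequence
    }

    /// Called on the response: the user is up-to-date only if they were
    /// not invalidated since the request was built.
    fn is_up_to_date(&self, user: &str, request_sequence: u64) -> bool {
        match self.last_invalidated.get(user) {
            Some(&sequence) => sequence < request_sequence,
            None => true,
        }
    }
}

fn main() {
    let mut tracker = InvalidationTracker::default();
    tracker.invalidate("@alice:example.org");

    let request_sequence = tracker.sequence_now();
    // A new invalidation races with the in-flight `/keys/query` request…
    tracker.invalidate("@alice:example.org");

    // …so the response must not mark the user as up-to-date.
    assert!(!tracker.is_up_to_date("@alice:example.org", request_sequence));
}
```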
Fixes: #1386.
With https://github.com/matrix-org/matrix-rust-sdk/pull/1601,
`TaskHandle::with_callback` is no longer necessary.
This patch updates `TaskHandle` as follows:
1. Before, the `handle` and `callback` fields were not mutually exclusive!
Indeed, the `callback` field was used as an abort mechanism, but _also_ as a
finalizer, i.e. code that runs _after_ the cancellation of `handle`. Now
that the callback-based solution is no longer used, we can keep a single
usage for this field and rename it `finalizer`.
2. The `handle` field is no longer optional, as it will always be set.
3. The `with_callback` method is renamed `new`, as it's now the only constructor.
4. The `set_callback` method is renamed `set_finalizer`.
Tadaaa.
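The resulting shape looks roughly like this (a hedged sketch: the real `TaskHandle` wraps an async task; a plain `std::thread` handle stands in here just to show the field changes):

```rust
/// Sketch of the reshaped `TaskHandle`.
struct TaskHandle {
    // No longer optional: always set.
    handle: std::thread::JoinHandle<()>,
    // Previously `callback`; now used solely as a finalizer that runs
    // after cancellation.
    finalizer: Option<Box<dyn FnOnce() + Send>>,
}

impl TaskHandle {
    // Previously `with_callback`; now the only constructor.
    fn new(handle: std::thread::JoinHandle<()>) -> Self {
        Self { handle, finalizer: None }
    }

    // Previously `set_callback`.
    fn set_finalizer(&mut self, finalizer: Box<dyn FnOnce() + Send>) {
        self.finalizer = Some(finalizer);
    }
}

fn main() {
    let mut task = TaskHandle::new(std::thread::spawn(|| {}));
    task.set_finalizer(Box::new(|| println!("finalized")));
    assert!(task.finalizer.is_some());
    task.handle.join().unwrap();
}
```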
Move all `SlidingSync`'s fields into a new `SlidingSyncInner` type, and then
update `SlidingSync` to contain a single `inner: Arc<SlidingSyncInner>` field.
First off, it's much simpler to understand what's behind an `Arc` for the
“clonability” of `SlidingSync`, and what's not. We no longer need to worry
about whether a new field should or shouldn't be behind an `Arc` for some
specific reason. The pattern is clear now.
Second, this is required for the next commits.
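The pattern is the classic cheap-clone handle over shared state (field names below are illustrative placeholders, not the SDK's real ones):

```rust
use std::sync::Arc;

/// Everything shared lives in one inner type…
struct SlidingSyncInner {
    homeserver: String,
}

/// …and the outer type is just a clonable handle over it.
#[derive(Clone)]
struct SlidingSync {
    inner: Arc<SlidingSyncInner>,
}

fn main() {
    let sliding_sync = SlidingSync {
        inner: Arc::new(SlidingSyncInner {
            homeserver: "https://example.org".to_owned(),
        }),
    };

    // Cloning only bumps the reference count; both handles share the
    // same inner state.
    let clone = sliding_sync.clone();
    assert_eq!(Arc::strong_count(&sliding_sync.inner), 2);
    assert_eq!(clone.inner.homeserver, sliding_sync.inner.homeserver);
}
```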
Before this patch, `SlidingSync::sync` returned a callback-based
`TaskHandle`. It waited for the “stream loop” to finish (since it's a long-
poll, that means waiting up to 2×30s, cf. our code) before checking whether
an atomic flag had a particular value, to decide whether it was time to
leave the loop. So when the user cancelled this `TaskHandle`, a response (if
any) was always handled. But in the meantime, it was possible to start a new
`sync`, and it seems this created bugs.
After this patch, `SlidingSync::sync` returns a handle-based `TaskHandle`.
This means that cancelling it cancels the “stream loop” immediately. If
no response was in flight from the server, that's perfect, no problem. If a
response was in flight, the inner `pos` of the `SlidingSync` instance won't
be updated, as the response won't be handled; the server will then re-send
the same response with the next sync request.
I guess it's better this way. Thoughts?