This patch updates `SlidingSync.pos` and `.delta_token`. Each field has their
own lock. It's annoying because they must updated at the same time to not be
out-of-sync.
So a new field `SlidingSync.position` of type `SlidingSyncPositionMarkers` is
created. It holds `pos` and `delta_token. This new field is behind a single
`RwLock`.
The comments in the code should explain the general approach, but to give an overview:
We assign a unique sequence number to each device-list-invalidation notice, and keep track
of the sequence number at which each user was most recently invalidated. Then, when we
build a `/keys/query` request, we take note of the current sequence number. Finally, when
we get the response from that `/keys/query` request, we compare that sequence number
with the most-recent-invalidation of each user that is now supposedly up-to-date. All
of that means that we can see if the user has been invalidated *since* we built the
`/keys/query` request, in which case our request may have raced against the new device,
so we do not mark the user as up-to-date.
To make that work, we need to keep track of the sequence number for in-flight `/keys/query`
requests; we do that in the `IdentityManager`. Since there should only ever be at most one
batch of requests in flight, that is relatively easy; however, we also record the request ids
just to check that the application isn't doing something weird.
There is no need to persist the sequence numbers to permanent storage: provided the
`IdentityManager` and `Store` objects have a lifetime at least as long as an in-flight
`/keys/query` request, it is sufficient just to retain the `dirty` bit in permanent storage,
and then, when we reload the ephemeral `Store` cache, to treat everything with a `dirty`
bit as a "new" invalidation.
Fixes: #1386.
With https://github.com/matrix-org/matrix-rust-sdk/pull/1601,
`TaskHandle::with_callback` is no longer necessary.
This patch updates `TaskHandle` as follows:
1. Before, the `handle` and `callback` fields were not mutually exclusive!
Indeed, the `callback` field was used as an abort mechanism, but _also_ as a
finalizer, i.e. a code that runs _after_ the cancellation of `handle`. Now
that the callback-based solution is no longer used, we can keep one usage for
this field and rename it `finalizer`.
2. The `handle` field is no longer optional, as it will always be set.
3. The `with_callback` method is renamed `new` as it's now the only constructor.
4. The `set_callback` method is renamed `set_finalizer`.
Tadaaa.
Move all `SlidingSync`'s fields into a new `SlidingSyncInner` type, and then
update `SlidingSync` to contain a single `inner: Arc<SlidingSyncInner>` field.
First off, it's much simpler to understand what's behind an `Arc` for
“clonability” of `SlidingSync`, and what's not. We don't need to worry about
adding a new field that is not behind an `Arc` for a specific reason or
anything. The pattern is clear now.
Second, this is required for next commits.
Before this patch, `SlidingSync::sync` was returning a callback-based
`TaskHandle`. It was waiting on the “stream loop” to finish (since it's a long-
poll, it means waiting 2*30s, cf. our code), before checking an atomic flag has
some value to decide whether it was time to leave the loop or not. So when the
user is cancelling this `TaskHandle`, a response (if any) was always handled.
But in the meantime, it was possible to start a new `sync`, and it seems like
it creates bugs.
After this patch, `SlidingSync::sync` now returns a handle-based `TaskHandle`.
It means that cancelling it will cancel the “stream loop” immediately. If
no response was in-flight from the server, that's perfect, no problem. If a
response was in-flight, the inner `pos` of the `SlidingSync` instance won't be
updated as the response won't be handled. So the server will re-send the same
response with the next sync request.
I guess it's better this way. Thoughts?
This patch adds a new feature.
Each call to `SlidingSync::stream` generates a unique `stream_id`. This
`stream_id` is passed to requests as a `txn_id` field. Returned responses must
hold the same `txn_id`, otherwise it means `stream` has received unexpected
responses from another `stream` or something else.
So far, when the `txn_id` field of the response and the `stream_id` don't
match, a log with the `ERROR` level is raised. Should we do more, like ignoring
the response entirely? Not sure yet.