This was quite handy during development of the client-side computation of read-receipts, to analyze some bugs.
It might not be useful to have it checked in for long, but I would love
to make sure it keeps on compiling until we have more stable handling of
read receipts in general.
This implements a value-based lock in the crypto stores. The intent is for multiple processes to be able to write into the store concurrently, while still cooperating on who does so. In particular, we need this for #1928, since we may have up to two different processes trying to write into the crypto store at the same time.
## New methods in the `CryptoStore` trait
The idea is to introduce two new methods touching **custom values** in the crypto store:
- one to atomically insert a value, only if it was missing (so, not following the `upsert` semantics used by `set_custom_value`)
- one to atomically remove a custom value
Those two operations match the semantics we want:
- take the lock only if it isn't taken already = insert an entry only if it was missing
- release the lock = remove the entry
By looking at the number of rows affected by the query, we can infer whether the insert/remove happened or not, that is, whether we managed to take the lock or not.
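As a rough sketch, the two methods could look like the following (the trait name, method names, signatures and error type here are illustrative assumptions, not the exact trait items):

```rust
use async_trait::async_trait;

/// Illustrative error type; the real trait uses the crate's own error.
#[derive(Debug)]
pub struct StoreError;

/// Hypothetical shape of the two new custom-value methods.
#[async_trait]
pub trait CustomValueLocking {
    /// Atomically insert `value` under `key`, only if `key` was missing.
    /// Returns `true` if a row was inserted, i.e. we took the lock.
    async fn insert_custom_value_if_missing(
        &self,
        key: &str,
        value: Vec<u8>,
    ) -> Result<bool, StoreError>;

    /// Atomically remove the custom value stored under `key`.
    /// Returns `true` if a row was removed, i.e. we released a lock we held.
    async fn remove_custom_value(&self, key: &str) -> Result<bool, StoreError>;
}
```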
## High-level APIs
I've also added a high-level API, `CryptoStoreLock`, that helps manage such a lock, and adds some niceties on top of that:
- exponential backoff to retry attempts at acquiring the lock, when it was already taken
- attempt to gracefully recover when the lock has been taken by an app that's been killed by the environment
- full configuration of the key / value / backoff parameters
While it'd be nice to have something like a `CryptoStoreLockGuard`, it's hard to implement without being racy, because of the `async` calls that would have to happen in the `Drop` implementation (and async drop isn't stable yet).
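For illustration only, here is a minimal sketch of what the acquisition loop with exponential backoff could look like, reusing the hypothetical trait from the sketch above (the real `CryptoStoreLock` has more configuration and a stale-lock recovery path):

```rust
use std::time::Duration;

/// Minimal sketch of acquiring the lock with exponential backoff; constants
/// and names are illustrative, not the actual `CryptoStoreLock` internals.
async fn acquire_lock(
    store: &impl CustomValueLocking,
    key: &str,
    holder: &str,
    max_backoff: Duration,
) -> Result<(), StoreError> {
    let mut backoff = Duration::from_millis(50);

    loop {
        // Taking the lock = inserting the entry only if it was missing.
        let taken = store
            .insert_custom_value_if_missing(key, holder.as_bytes().to_vec())
            .await?;
        if taken {
            return Ok(());
        }

        // Someone else holds the lock: wait, then retry with a doubled delay.
        tokio::time::sleep(backoff).await;
        backoff = (backoff * 2).min(max_backoff);
    }
}
```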
## Test program
There's also a test program in which I shamelessly show my rudimentary Unix skills; I've put it in the `labs/` directory, but it could just as well be a large integration test. A parent program initially fills a custom crypto store, then creates a `pipe()` for one-way communication with a child created with `fork()`; the parent then sends commands to the child. These commands consist of reading from and writing into the crypto store, using a lock. While the child attempts to perform these operations, the parent tries hard to grab the lock at the same time. This helped figure out a few issues and made sure that cross-process locking would work as intended.
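The skeleton of that parent/child setup looks roughly like the following (a heavily simplified sketch using raw `libc` calls; the actual program in `labs/` layers the crypto store and the lock on top of this):

```rust
// Heavily simplified sketch: the parent sends a command over a pipe, the
// child reads it. The real test program sends several commands and both
// sides contend for the cross-process lock.
fn main() {
    let mut fds = [0i32; 2];

    unsafe {
        libc::pipe(fds.as_mut_ptr());

        if libc::fork() == 0 {
            // Child: read a command from the pipe, then act on the crypto
            // store while holding the lock (omitted here).
            let mut buf = [0u8; 64];
            let n = libc::read(fds[0], buf.as_mut_ptr().cast(), buf.len());
            if n > 0 {
                println!("child received: {}", String::from_utf8_lossy(&buf[..n as usize]));
            }
        } else {
            // Parent: send a command, then try hard to grab the lock at the
            // same time as the child (omitted here).
            let cmd = b"write-custom-value\n";
            libc::write(fds[1], cmd.as_ptr().cast(), cmd.len());
        }
    }
}
```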
This patch should ideally be split into multiple smaller ones, but life
is life.
The main purpose of this patch is to fix and to test
`SlidingSyncListRequestGenerator`. This quest has led me to rename
multiple fields in `SlidingSyncList` and `SlidingSyncListBuilder`, like:
* `rooms_count` becomes `maximum_number_of_rooms`: it's not something
the _client_ counts, but a maximum number given by the server,
* `batch_size` becomes `full_sync_batch_size`, so that it now
emphasizes that it's about full-sync only,
* `limit` becomes `full_sync_maximum_number_of_rooms_to_fetch`, so that
it now also emphasizes that it's about full-sync only _and_ what the
limit is about!
This quest has continued with the renaming of the `SlidingSyncMode`
variants. After a discussion with the ElementX team, we've agreed on the
following renamings:
* `Cold` becomes `NotLoaded`,
* `Preload` becomes `Preloaded`,
* `CatchingUp` becomes `PartiallyLoaded`,
* `Live` becomes `FullyLoaded`.
Finally, _le plat de résistance_.
In `SlidingSyncListRequestGenerator`, the `make_request_for_ranges`
method has been renamed to `build_request` and no longer takes a
`&mut self` but a simpler `&self`! It didn't make sense to me that
something that makes/builds a request was modifying `Self`. Because
the type of `SlidingSyncListRequestGenerator::ranges` has changed, all
ranges now have a consistent type (within this module at least).
Consequently, this method no longer needs to do a type conversion.
Still on the same type, the `update_state` method is now much better
documented, and the off-by-one errors on the range bounds are all fixed.
The creation of new ranges happens in a new dedicated pure function,
`create_range`. It returns an `Option` because it's possible that no
valid range can be computed (previously, invalid ranges were considered
valid). It's used in the `Iterator` implementation. This `Iterator`
implementation contains a liiiittle bit more code, but at least now
we understand what it does, and it's clear how `range_start` and
`desired_size` are calculated. By the way, the `prefetch_request` method
has been removed: it's not a prefetch, it's a regular request; it was
calculating the range. But now there is `create_range`, and since it's
pure, we can unit test it!
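To give an idea of the shape of such a pure function, here's an illustrative-only sketch with a unit test; the actual `create_range` signature, clamping rules and return type may differ (a tuple of inclusive bounds is used here just for the example):

```rust
/// Illustrative-only sketch of a pure range-building helper.
/// Returns the inclusive bounds, or `None` when no valid range exists.
fn create_range(
    start: u32,
    desired_size: u32,
    maximum_number_of_rooms_to_fetch: Option<u32>,
    maximum_number_of_rooms: Option<u32>,
) -> Option<(u32, u32)> {
    let mut end = start.checked_add(desired_size)?.checked_sub(1)?;

    // Don't go past the developer-configured fetch limit...
    if let Some(max) = maximum_number_of_rooms_to_fetch {
        end = end.min(max.checked_sub(1)?);
    }

    // ...nor past the number of rooms the server reported.
    if let Some(max) = maximum_number_of_rooms {
        end = end.min(max.checked_sub(1)?);
    }

    // A start beyond the end means there is nothing left to request.
    (start <= end).then_some((start, end))
}

#[test]
fn create_range_is_clamped() {
    // Clamped by the server-reported maximum number of rooms.
    assert_eq!(create_range(0, 10, None, Some(5)), Some((0, 4)));
    // Everything up to the fetch limit has already been requested.
    assert_eq!(create_range(10, 10, Some(5), None), None);
}
```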
_Pour le dessert_, this patch adds multiple tests. It is now
possible because of the previous refactoring. First off, we test the
`create_range` function in many configurations. It's pretty easy to
understand, and since it's core to `SlidingSyncListRequestGenerator`,
I'm pretty happy with how it turned out. Second, we test the paging-,
growing- and selective- modes with a new macro, `assert_request_and_response`,
which allows us to “send” requests and to “receive” responses. The
design of `SlidingSync` allows us to mimic requests and responses,
which is great. We don't really
care about the responses here, but we care about the requests' `ranges`,
and the `SlidingSyncList.state` after a response is received. It also
helps to see how ranges behave when the state is `PartiallyLoaded`
or `FullyLoaded`.
Add a second full-sync-up mode to sliding sync. The previous behaviour - still the default for backwards compatibility, but now named `PagingFullSync` - was to page through the list by the page size, invalidating the previous page of items. The newly added `GrowingFullSync` instead grows the window by `batch_size` per request, starting from `0` and keeping the window anchored there. With the latter we might be pushing more data over the connection and be slightly slower, but the top of the list always stays active and thus reactive to changes.
Furthermore, the developer can now configure an optional maximum number of rooms to grow/paginate the full-sync up to (stopping before actually reaching `count`, whichever is smaller). This is already exposed via FFI, too.
- [x] add new full-sync-mode
- [x] add limited sync-up mode (similar to full sync but only to a limit `n`)
- [x] implement sync-up in jack-in
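As a rough illustration of how the requested range evolves per request in the two modes (illustrative-only code, not the actual request generator; `Mode` and `full_sync_range` are made up for the example):

```rust
enum Mode {
    PagingFullSync,
    GrowingFullSync,
}

/// Range (inclusive bounds) requested for the n-th full-sync request.
fn full_sync_range(mode: Mode, batch_size: u32, request_index: u32) -> (u32, u32) {
    match mode {
        // Paging: slide a `batch_size`-sized window down the list, request by
        // request, invalidating the previously fetched page.
        Mode::PagingFullSync => {
            let start = request_index * batch_size;
            (start, start + batch_size - 1)
        }
        // Growing: keep the start anchored at 0 and grow the end by
        // `batch_size` each request, so the top of the list stays active.
        Mode::GrowingFullSync => (0, (request_index + 1) * batch_size - 1),
    }
}
```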
With these changes, the user of sliding sync can configure the `SlidingSyncBuilder` to store and recover its state from storage. For that, they should call `cold_cache("my-sliding-sync-key")` with their preferred storage key at setup time on the `SlidingSyncBuilder`. If a cached version is found under that key, the internal state of the rooms list, as well as each view found (identified by its name), will be set from storage immediately - allowing the user to show the last cached state. The builder then also uses the same key to store its latest state after every update received from remote.
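A hedged example of opting in at setup time; the surrounding builder calls are assumptions for illustration, only `cold_cache("my-sliding-sync-key")` comes from the description above:

```rust
// `builder` is assumed to be a `SlidingSyncBuilder` obtained from the client;
// adding views/lists is elided, only the `cold_cache` call is from the text.
let sliding_sync = builder
    // Key under which the state is cached, and from which it is restored on
    // the next startup if a cached version is found.
    .cold_cache("my-sliding-sync-key")
    .build()
    .await?;
```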
👉 Note on room list:
As we store a disconnected state, we are saving all `RoomList` entries as either `Empty` or `Invalidated`. This allows for easier updating when the server comes back with results: since we don't track the ranges in the cache, from the server's point of view we haven't seen anything yet.
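A minimal sketch of that downgrade step, assuming a `RoomListEntry`-like enum (the type, variants' payloads and the helper name are illustrative, not the exact SDK types):

```rust
/// Illustrative stand-in for the SDK's room list entry type.
#[derive(Clone)]
enum RoomListEntry {
    Empty,
    Invalidated(String), // room id
    Filled(String),      // room id
}

/// What gets written to the cache: every known room is stored as Invalidated,
/// so that when the server comes back with results we can update easily,
/// without having to track the previously requested ranges.
fn freeze(entry: &RoomListEntry) -> RoomListEntry {
    match entry {
        RoomListEntry::Empty => RoomListEntry::Empty,
        RoomListEntry::Invalidated(id) | RoomListEntry::Filled(id) => {
            RoomListEntry::Invalidated(id.clone())
        }
    }
}
```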
👉 Note on timeline events:
This does store all existing timeline events - initial and those seen during the session (as we keep track of them right now) - and will recover that state. However, as we can't be sure whether we have gaps in the timeline, the timeline items are reset upon receiving updates for them from the server.
👉 Note on (full-sync-)view:
While we are caching the server-returned results (view list, room count and room info) as is, we are not caching the internal state (whether we are catching up) nor the ranges (as they are likely to be out of date) - those will behave the way you configured them, just with preloaded results. So a full-sync view will still do requests in batches and replace the corresponding set of rooms. This could mean you see the same room appear twice, though the cached one would be marked as invalidated. This might be problematic if the list the UI shows is longer than the batches fetched, but would be resolved quickly when catching up.
👉 On recovery:
When sliding sync receives an `M_UNKNOWN_POS`, indicating the server has expired our session, it now transparently retries up to three times to restart the sliding sync with the full set of extensions and the latest views at their existing windows; the current room state is kept. For full sync, this starts a new sync-up with the existing room list staying intact. This also works when starting offline.