This patch changes the query used by
`SqliteEventCacheStore::load_all_chunks_metadata`, which was the cause
of severe slowness. The new query improves throughput by +1140% and
reduces time by -91.916%. A benchmark will follow in the next patch.
Metrics for 10'000 events (with 1 gap every 80 events):
- Before:
- throughput: 20.686 Kelem/s,
- time: 483.43 ms.
- After:
- throughput: 253.52 Kelem/s,
- time: 39.478 ms.
This query visits all chunks of a linked chunk with ID
`hashed_linked_chunk_id`. For each chunk, it collects its ID
(`ChunkIdentifier`), its previous chunk, its next chunk, and its number
of events (`num_events`). If the chunk is a gap, `num_events` is 0;
otherwise it counts the number of events in `event_chunks` where
`event_chunks.chunk_id = linked_chunks.id`.
Why not use a `(LEFT) JOIN` + `COUNT`? Because for gaps, the entire
`event_chunks` table would be traversed every time, which is extremely
inefficient. We could speed that up with an `INDEX`, but it would
consume more storage space. And traversing an `INDEX` boils down to
traversing a B-tree, which is O(log n), whilst this `CASE` approach is
O(1). This solution is a nice trade-off and offers great performance.
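A sketch of the shape of such a query (the exact column names are
assumptions, not the actual schema); the `CASE` short-circuits to 0 for
gaps, so `event_chunks` is only consulted for chunks that hold events:
```rust
const LOAD_ALL_CHUNKS_METADATA: &str = r#"
    SELECT
        lc.id,
        lc.previous,
        lc.next,
        CASE lc.type
            WHEN 'gap' THEN 0
            ELSE (
                SELECT COUNT(ec.event_id)
                FROM event_chunks AS ec
                WHERE ec.chunk_id = lc.id
            )
        END AS num_events
    FROM linked_chunks AS lc
    WHERE lc.linked_chunk_id = ?
"#;
```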
In a "soon" future, threads have their own linked chunk. All our code
has been written with the fact that a linked chunk belong to *a room* in
mind, so it needs some biggish update. Fortunately, most of the changes
are mechanical, so they should be rather easy to review.
Part of #4869, namely #5122.
Getting the position when reading an event is no longer required:
- the only use case for reading the position out of the event cache was
when we wanted to replace a redacted item in the linked chunk; now,
with `save_event()`, we can replace it without having to know its
position.
As an extra measure of caution, I've also included the `room_id` in the
`events` table, next to the `event_id`, so that looking up an event is
still restricted to a single room.
This patch is twofold. First, it provides a new schema that improves
the performance of `SqliteEventCacheStore` for 100_000 events from 6.7k
events/sec to 284k events/sec on my machine.
Second, it now assumes that `EventCacheStore` does NOT store invalid
events. That was already the case, but the SQLite schema did not reject
invalid events in case some were handed to it. It's now explicitly
forbidden.
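A hypothetical sketch of what rejecting invalid events at the schema
level can look like (column names and constraints are assumptions, not
the actual migration): every stored event must have a non-empty event
id and belong to a room.
```rust
const CREATE_EVENTS_TABLE: &str = "
    CREATE TABLE events (
        room_id BLOB NOT NULL,
        event_id BLOB NOT NULL CHECK (length(event_id) > 0),
        content BLOB NOT NULL,
        PRIMARY KEY (room_id, event_id)
    )
";
```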
This patch adds an index on `events.event_id` and one on
`events.room_id` so that queries on these columns are faster. Such
queries mostly happen in the `Deduplicator`, which runs for every
backwards pagination or sync.
This patch also updates the query in `filter_duplicated_events` to
sort events by their `chunk_id` and `position`, so that the results are
deterministic; this helps when testing.
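A sketch of both changes (the index names and the exact query are
assumptions): two indexes to speed up the `Deduplicator`'s lookups, and
a stable `ORDER BY` so `filter_duplicated_events` returns deterministic
results.
```rust
const ADD_EVENT_INDEXES: &str = "
    CREATE INDEX IF NOT EXISTS events_event_id ON events (event_id);
    CREATE INDEX IF NOT EXISTS events_room_id ON events (room_id);
";

const FILTER_DUPLICATED_EVENTS: &str = "
    SELECT event_id, chunk_id, position
    FROM event_chunks
    WHERE event_id IN (?, ?, ?)
    ORDER BY chunk_id, position
";
```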
Add a new `created_at` field to the `send_queue_events` and
`dependent_send_queue_events` stored records. This will allow clients
to understand how stale a pending message might be in the event that
the queue encounters an error and becomes wedged.
This change is exposed through the FFI on the `EventTimelineItem`
struct as a new optional field named `local_created_at`. It will be
`None` for any remote event, and `Some` for local events (except for
those that were enqueued before the migrations were run).
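A hypothetical migration sketch (the exact SQL is an assumption). The
new column is nullable on purpose: rows enqueued before the migration
carry no `created_at`, which is why they surface a `local_created_at`
of `None` through the FFI.
```rust
const ADD_CREATED_AT: &str = "
    ALTER TABLE send_queue_events ADD COLUMN created_at INTEGER;
    ALTER TABLE dependent_send_queue_events ADD COLUMN created_at INTEGER;
";
```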
Signed-off-by: Daniel Salinas <zzorba@users.noreply.github.com>
Co-authored-by: Daniel Salinas <danielsalinas@daniels-mbp-2.myfiosgateway.com>
Co-authored-by: Benjamin Bouvier <benjamin@bouvier.cc>
Co-authored-by: Daniel Salinas <danielsalinas@Daniels-MBP-2.attlocal.net>
Prior to this patch, the send queue would not maintain the ordering of
sending a media *then* a text, because it would push back a dependent
request graduating into a queued request.
The solution implemented here consists in adding a new priority column
to the send queue, defaulting to 0 for existing events, and using
higher priorities for media uploads, so they're considered before other
requests.
A high priority is also used for aggregation events that are sent late,
so they're sent as soon as possible, before other subsequent events.
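A hypothetical sketch of the priority scheme (the SQL and the `rowid`
fallback are assumptions): existing rows default to priority 0, media
uploads are enqueued with a higher priority, and the queue always pops
the highest priority first, falling back to insertion order for equal
priorities.
```rust
const ADD_PRIORITY_COLUMN: &str =
    "ALTER TABLE send_queue_events ADD COLUMN priority INTEGER NOT NULL DEFAULT 0";

const POP_NEXT_REQUEST: &str = "
    SELECT transaction_id, content
    FROM send_queue_events
    ORDER BY priority DESC, rowid ASC
    LIMIT 1
";
```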
Changelog: The send queue will now store a serialized
`QueuedRequestKind` instead of a raw event, which breaks the format.
As a result, all send queues have been emptied.
This makes it possible to have different kinds of *parent key* to
update a dependent request. A dependent request waits for the parent
key to be set before it can be acted upon; previously, it could only be
an event id, because a dependent request would only wait for an event
to be sent. In the near future, we're going to support uploading media
as requests, and some subsequent requests will depend on them, but
won't be able to rely on an event id (since an upload doesn't return an
event/event id).
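A minimal sketch (not the SDK's actual types) of what a generalized
parent key could look like: a dependent request waits until its parent
produces one of these before it can be acted upon.
```rust
enum ParentKey {
    /// The parent request was an event, which has now been sent.
    EventId(String),
    /// The parent request was a media upload; an upload yields a
    /// content URI (`mxc://...`), not an event id.
    MediaUri(String),
}
```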
Since this changes the format of `DependentQueuedRequest`, which is
directly serialized into the state stores, I've also cleared the table,
so as not to have to migrate the data in there. Dependent requests are
supposed to be transient anyway, so it would be a bug if there were
many of them in the queue.
Since a migration was needed anyway, I've also removed the `rename`
annotations (that supported a previous format) for the
`DependentQueuedRequestKind` enum.
Changelog: Renamed all the send-queue-related "events" to "requests",
so as to generalize usage of the send queue to non-events (e.g. media,
redactions, etc.).
Modify the `SendQueue` in order to persist the error that caused the
event to fail to send, as a `QueueWedgeError`. The `QueueWedgeError` is
not a 1:1 mapping for all kinds of errors, but holds variants and
information that the client can react to, in order to propose quick
fixes or solutions before retrying to send.
Fixes https://github.com/matrix-org/matrix-rust-sdk/issues/3973
Also fixes https://github.com/element-hq/element-x-ios/issues/3287,
because the failure-to-send reason was also lost when a timeline reset
occurred.
This PR starts with a refactoring commit
e7696003e8
that introduces the new `QueueWedgeError` and moves out of the FFI
layer the logic that converts API errors to `SendState` error variants.
This `QueueWedgeError` can be used directly in the `SendingFailed`
variant and exposed to the FFI.
The second commit
109c133746
adds the persistence: `QueuedEvent` now has an optional error field
instead of an `is_wedged` boolean. The same goes for
`LocalEchoContent::Event`. It also adds migrations for SQLite and
IndexedDB.
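A minimal sketch of the new shape (names are assumptions, not the exact
SDK types): the `is_wedged` boolean becomes a structured, persisted
error.
```rust
enum QueueWedgeError {
    /// The homeserver rejected the content, e.g. a media that is too
    /// large.
    ContentRejected { message: String },
    /// Sending requires the user's devices to be verified first.
    VerificationRequired,
}

struct QueuedEvent {
    transaction_id: String,
    /// `None` while the event may still be sent; `Some(error)` once it
    /// is wedged and needs user intervention before a retry.
    error: Option<QueueWedgeError>,
}
```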
Co-authored-by: Benjamin Bouvier <benjamin@bouvier.cc>
Changelog: We now persist the error that caused an event to fail to send. The `QueueWedgeError` error contains information that clients can use to try to resolve the problem when the error is not automatically retryable. Some breaking changes occurred in the FFI layer for `timeline::EventSendState`: `SendingFailed` now directly contains the wedge reason enum; use it in place of the removed variants of `EventSendState`.
Allows saving media in a different path than the state store.
This adds a "last_access" field to the SQLite implementation, to
prepare for future work on a media retention policy.
This removes the IndexedDB media cache implementation because, as far
as I know, it is currently unused, and I have no idea how to
efficiently implement the planned media retention policy with a
key-value store.
Closes #1810.
Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>
It turns out that, to be able to read the room ids, they need to be
*values*, not only *keys* (since keys are one-way hashed). This means
we need to duplicate the `room_id` field in IndexedDB/SQLite, so each
entry contains the room_id both as a key (for queries) and as a value
(to return it).
Since there's no meaningful migration we can apply, the way to go is to
drop the pending events table and recreate it from the ground up. It is
assumed that no one has used the store on IndexedDB yet; otherwise the
workaround would be to drop and recreate it.
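A hypothetical sketch of the duplication (column names are
assumptions): the key column is a one-way hash, usable in `WHERE`
clauses but not readable back, while the value column is encrypted and
can be decrypted to return the actual room id.
```rust
const CREATE_SEND_QUEUE_TABLE: &str = "
    CREATE TABLE send_queue_events (
        room_id BLOB NOT NULL,        -- hashed: used as a key in queries
        room_id_value BLOB NOT NULL,  -- encrypted: decrypted and returned
        transaction_id BLOB NOT NULL,
        content BLOB NOT NULL,
        PRIMARY KEY (room_id, transaction_id)
    )
";
```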
Up until now, users had to listen for to-device events to check for
secrets that were received as an `m.secret.send` event.
This has a bunch of shortcomings:
1. Once the secret has been given to the consumer, it's gone and can't
be retrieved anymore. Secrets may get lost if an app restart happens
before the consumer decides what to do with them.
2. The consumer can't be sure whether the event was received in a
secure manner.
This commit adds an inbox for our received secrets, where we will store
all secrets we receive long-term, until the user decides to delete
them. It's deemed fine to store all secrets, since we only accept
secrets we have requested, and only if they have been received from a
verified device of ours.
This implements a new time-lease-based lock for the `CryptoStore` that doesn't require explicit unlocking, so it's more robust in the context of #1928, where any process may die because the device is running out of battery, or where unexpected flows cause a lock to not be released properly in one process or the other.
```
//! This is a per-process lock that may be used only for very specific use
//! cases, where multiple processes might concurrently write to the same
//! database at the same time; this would invalidate crypto store caches, so
//! that should be done mindfully. Such a lock can be acquired multiple times by
//! the same process, and it remains active as long as there's at least one user
//! in a given process.
//!
//! The lock is implemented using time-based leases to values inserted in a
//! crypto store. The store maintains the lock identifier (key), who the
//! current holder is (value), and an expiration timestamp on the side; see also
//! `CryptoStore::try_take_leased_lock` for more details.
//!
//! The lock is initially acquired for a certain period of time (namely, the
//! duration of a lease, aka `LEASE_DURATION_MS`), and then a "heartbeat" task
//! renews the lease to extend its duration, every so often (namely, every
//! `EXTEND_LEASE_EVERY_MS`). Since the tokio scheduler might be busy, the
//! extension request should happen way more frequently than the duration of a
//! lease, in case a deadline is missed. The current values have been chosen to
//! reflect that, with a ratio of 1:10 as of 2023-06-23.
//!
//! Releasing the lock happens naturally, by not renewing a lease. It happens
//! automatically after the duration of the last lease, at most.
```
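A sketch of the lease-taking statement hinted at above (the table name
and exact SQL are assumptions): one atomic upsert that takes the lock
if it's free, refreshes it if we already hold it, or steals it if the
previous lease has expired. The caller checks the number of affected
rows to know whether it got the lock; `?3` would be now plus
`LEASE_DURATION_MS`, `?4` the current time.
```rust
const TRY_TAKE_LEASED_LOCK: &str = "
    INSERT INTO lease_locks (key, holder, expiration)
    VALUES (?1, ?2, ?3)
    ON CONFLICT (key) DO UPDATE
        SET holder = ?2, expiration = ?3
        WHERE holder = ?2 OR expiration < ?4
";
```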
---
* feat: implement a time lease based lock for the crypto store
* feat: switch the crypto-store lock to a time-lease-based one
* chore: fix CI, don't use unixepoch in sqlite and do time math in rust
* chore: dummy implementation in indexeddb, don't run lease locks tests there
* feat: in NSE, wait the duration of a lease if first attempt to unlock failed
* feat: immediately release the lock when there are no more holders
* chore: clippy
* chore: add comment about atomic sanity
* chore: increase sleeps in timeline queue tests?
* feat: lower lease and renew durations
* feat: keep track of the extend-lease task
* fix: increment num_holders when acquiring the lock for the first time
* chore: reduce indent + abort prev renew task on non-wasm + add logs
Since stripped and non-stripped room infos use the same types,
the separation is not necessary anymore.
Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>
The format of the outbound group session struct has changed. Nowadays we correctly rotate the group session if we can't restore it, but it's still good to avoid logging the error in this case.
The SQLite crypto store uses rmp_serde to serialize all the data we're
going to store. This works nicely for most things; one exception is the
`OutboundGroupSession` type.
The `OutboundGroupSession` type stores to-device requests, to ensure
that the session doesn't get used before it is shared with the whole
group, and to ensure that the to-device requests get restored if the
session gets restored after an application restart.
The to-device request type critically contains `Raw<AnyToDeviceEvent>`,
the `Raw` type being backed by serde_json's raw JSON value type.
rmp_serde seems to serialize this just fine, but later deserialization
fails.
We're avoiding the issue by using serde_json to serialize the
`OutboundGroupSession`.
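A sketch of the workaround (function names are assumptions): everything
keeps going through rmp_serde, except `OutboundGroupSession` pickles,
which go through serde_json so that `Raw<AnyToDeviceEvent>` round-trips
correctly.
```rust
use serde::Serialize;

/// Most values stored by the SQLite crypto store go through rmp_serde.
fn serialize_value<T: Serialize>(value: &T) -> Result<Vec<u8>, rmp_serde::encode::Error> {
    rmp_serde::to_vec_named(value)
}

/// `OutboundGroupSession` pickles go through serde_json instead,
/// because their `Raw<AnyToDeviceEvent>` fields fail to deserialize
/// after an rmp_serde round-trip.
fn serialize_session<T: Serialize>(pickle: &T) -> Result<Vec<u8>, serde_json::Error> {
    serde_json::to_vec(pickle)
}
```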
Note about "Write-Ahead Log" (WAL) mode: The SQLite WAL mode has a
bunch of advantages that are quite nice to have:
1. WAL is significantly faster in most scenarios.
2. WAL provides more concurrency as readers do not block writers and a
writer does not block readers. Reading and writing can proceed
concurrently.
3. Disk I/O operations tend to be more sequential using WAL.
4. WAL uses many fewer fsync() operations and is thus less vulnerable
to problems on systems where the fsync() system call is broken.
The downsides of WAL mode don't really affect us. So let's turn it on.
More info: https://www.sqlite.org/wal.html
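A sketch of turning WAL mode on, using plain rusqlite (the actual
connection setup in the store may differ):
```rust
use rusqlite::Connection;

fn open_store(path: &str) -> rusqlite::Result<Connection> {
    let conn = Connection::open(path)?;
    // Enable Write-Ahead Log mode: readers and writers no longer block
    // each other, and most workloads get faster.
    conn.pragma_update(None, "journal_mode", "wal")?;
    Ok(conn)
}
```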
Co-authored-by: Jonas Platte <jplatte@matrix.org>
Co-authored-by: Damir Jelić <poljar@termina.org.uk>