This patch changes the query used by
`SqliteEventCacheStore::load_all_chunks_metadata`. The previous query was
the cause of severe slowness. The new query increases throughput by 1140%
and reduces time by 91.916%. The benchmark will follow in the next patch.
Metrics for 10'000 events (with 1 gap every 80 events):

- Before:
  - throughput: 20.686 Kelem/s,
  - time: 483.43 ms.
- After:
  - throughput: 253.52 Kelem/s,
  - time: 39.478 ms.

This query will visit all chunks of a linked chunk with ID
`hashed_linked_chunk_id`. For each chunk, it collects its ID
(`ChunkIdentifier`), previous chunk, next chunk, and number of
events (`num_events`). If it's a gap, `num_events` is equal to 0,
otherwise it counts the number of events in `event_chunks` where
`event_chunks.chunk_id = linked_chunks.id`.
Why not use a `(LEFT) JOIN` + `COUNT`? Because for gaps, the entire
`event_chunks` table would be traversed every time, which is extremely
inefficient. To speed that up, we could use an `INDEX`, but it would
consume more storage space. Finally, traversing an `INDEX` boils down to
traversing a B-tree, which is O(log n), whilst this `CASE` approach is
O(1) for gaps. This solution is a nice trade-off and offers great
performance.
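
For illustration, here is a minimal sketch of the `CASE` approach described above, written in Python with the standard `sqlite3` module rather than in the SDK's Rust code. The table layout is a simplified assumption (the real tables have more columns), and the `'E'`/`'G'` encoding of the chunk type, the `:lc_id` parameter name and the `ORDER BY` are hypothetical choices made for the example only.

```python
import sqlite3

# Hypothetical, simplified schema; not the real one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE linked_chunks (
        linked_chunk_id TEXT NOT NULL,
        id INTEGER NOT NULL,
        previous INTEGER,
        next INTEGER,
        type TEXT NOT NULL  -- assumed encoding: 'E' = events chunk, 'G' = gap
    );
    CREATE TABLE event_chunks (
        linked_chunk_id TEXT NOT NULL,
        chunk_id INTEGER NOT NULL,
        event_id TEXT NOT NULL,
        position INTEGER NOT NULL
    );
""")

# Three chunks: events, gap, events.
conn.executemany("INSERT INTO linked_chunks VALUES (?, ?, ?, ?, ?)", [
    ("room_a", 0, None, 1, "E"),
    ("room_a", 1, 0, 2, "G"),
    ("room_a", 2, 1, None, "E"),
])
conn.executemany("INSERT INTO event_chunks VALUES (?, ?, ?, ?)", [
    ("room_a", 0, "$ev0", 0),
    ("room_a", 0, "$ev1", 1),
    ("room_a", 2, "$ev2", 0),
])

# For a gap, the CASE falls through to `ELSE 0` and the subquery on
# `event_chunks` is never evaluated.
QUERY = """
    SELECT lc.id, lc.previous, lc.next,
        CASE lc.type
            WHEN 'E' THEN (
                SELECT COUNT(ec.event_id) FROM event_chunks AS ec
                WHERE ec.linked_chunk_id = :lc_id AND ec.chunk_id = lc.id
            )
            ELSE 0
        END AS num_events
    FROM linked_chunks AS lc
    WHERE lc.linked_chunk_id = :lc_id
    ORDER BY lc.id  -- only to make the example output stable
"""

def load_all_chunks_metadata(conn, hashed_linked_chunk_id):
    """Return (id, previous, next, num_events) for every chunk."""
    return conn.execute(QUERY, {"lc_id": hashed_linked_chunk_id}).fetchall()

metadata = load_all_chunks_metadata(conn, "room_a")
```

With this data, the gap (chunk 1) reports `num_events = 0` without touching `event_chunks`, while chunks 0 and 2 report 2 and 1 events respectively.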
In the near future, threads will have their own linked chunk. All our
code has been written with the assumption that a linked chunk belongs to
*a room*, so it needs a fairly large update. Fortunately, most of the
changes are mechanical, so they should be rather easy to review.
Part of #4869, namely #5122.
Getting the position when reading an event is no longer required:
- the only use case for reading the position out of the event cache was
  when we wanted to replace a redacted item in the linked chunk; now,
  with `save_event()`, we can replace it without having to know its
  position.
As an extra measure of caution, I've also included the room_id in the
`events` table, next to the event_id, so that looking for an event is
still restricted to a single room.
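
As a rough illustration of the two points above, the following Python `sqlite3` sketch shows an event being replaced by its `(room_id, event_id)` key alone, with no position involved, and a lookup that stays scoped to one room. The `events` columns, the `UNIQUE` constraint, the upsert statement and the `find_event` helper are all assumptions made for the example; the actual `save_event()` implementation may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical `events` table: `room_id` sits next to `event_id`.
conn.execute("""
    CREATE TABLE events (
        room_id TEXT NOT NULL,
        event_id TEXT NOT NULL,
        content TEXT NOT NULL,
        chunk_id INTEGER,
        position INTEGER,
        UNIQUE (room_id, event_id)
    )
""")

def save_event(conn, room_id, event_id, content):
    """Insert or replace an event's content; the position is never needed."""
    conn.execute(
        """
        INSERT INTO events (room_id, event_id, content)
        VALUES (?, ?, ?)
        ON CONFLICT (room_id, event_id) DO UPDATE SET content = excluded.content
        """,
        (room_id, event_id, content),
    )

def find_event(conn, room_id, event_id):
    """Look up an event, restricted to a single room."""
    return conn.execute(
        "SELECT content, chunk_id, position FROM events "
        "WHERE room_id = ? AND event_id = ?",
        (room_id, event_id),
    ).fetchone()

# The same event ID stored in two rooms.
conn.execute("INSERT INTO events VALUES ('!room_a', '$ev', 'original', 3, 7)")
conn.execute("INSERT INTO events VALUES ('!room_b', '$ev', 'other room', 0, 0)")

# Replace the event in one room with its redacted form.
save_event(conn, "!room_a", "$ev", "redacted")
```

After the call, the row in `!room_a` carries the new content while keeping its chunk and position, and the row in `!room_b` is untouched.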
This patch is twofold. First off, it provides a new schema that improves
the performance of `SqliteEventCacheStore` for 100_000 events from 6.7k
events/sec to 284k events/sec on my machine.
Second, it now assumes that `EventCacheStore` does NOT store invalid
events. It was already the case, but the SQLite schema was not rejecting
invalid events if some were passed in. It's now explicitly forbidden.
This patch adds an index on `events.event_id` and on `events.room_id`
so that queries on these columns are faster. Such queries mostly come
from the `Deduplicator`, which runs for every backwards pagination or
sync.
This patch also updates the query in `filter_duplicated_events` to
sort events by their `chunk_id` and `position` so that the results are
deterministic, which helps when testing.
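
A small sketch of both changes, again in Python with the standard `sqlite3` module. The `events` columns, the index names and the shape of the query are assumptions for illustration; only the indexed columns and the sort keys come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified `events` table.
    CREATE TABLE events (
        room_id TEXT NOT NULL,
        event_id TEXT NOT NULL,
        chunk_id INTEGER NOT NULL,
        position INTEGER NOT NULL
    );
    -- Index names are made up for this example.
    CREATE INDEX events_event_id_idx ON events (event_id);
    CREATE INDEX events_room_id_idx ON events (room_id);
""")

# Rows inserted out of order on purpose.
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("!r", "$c", 1, 0),
    ("!r", "$a", 0, 1),
    ("!r", "$b", 0, 0),
    ("!other", "$a", 5, 0),
])

def filter_duplicated_events(conn, room_id, event_ids):
    """Return the already-known events among `event_ids`, in a stable order."""
    placeholders = ", ".join("?" for _ in event_ids)
    query = (
        "SELECT event_id, chunk_id, position FROM events "
        f"WHERE room_id = ? AND event_id IN ({placeholders}) "
        "ORDER BY chunk_id, position"
    )
    return conn.execute(query, [room_id, *event_ids]).fetchall()

duplicates = filter_duplicated_events(conn, "!r", ["$a", "$b", "$c", "$unknown"])
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    )
]
```

Thanks to the `ORDER BY`, the result no longer depends on insertion order or on which index the query planner picks, which is what makes test assertions reliable.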
Allows saving media in a different path than the state store.
This adds a "last_access" field to the SQLite implementation, to prepare
for future work on a media retention policy.
This removes the IndexedDB media cache implementation, because as far as
I know it is currently unused, and I have no idea how to efficiently
implement the planned media retention policy with a key-value store.
Closes #1810.
- [x] Public API changes documented in changelogs (optional)
---------
Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>