Internal Sentry reports tell us that enabling the busy_timeout seems to
have *increased* the number of "database is busy" errors, instead of
lowering those. As a result, we're going to disable the pragmas in all
the places where we enabled it before, and observe how the number of
"database is busy" errors evolves.
In a "soon" future, threads have their own linked chunk. All our code
has been written with the fact that a linked chunk belong to *a room* in
mind, so it needs some biggish update. Fortunately, most of the changes
are mechanical, so they should be rather easy to review.
Part of #4869, namely #5122.
Under some very particular circumstances, the "database is locked" error
can still happen, even in WAL mode, even if the database connection is
not being upgraded from a read transaction to a write transaction.
We *think* this might be the reason behind errors like
github.com/element-hq/element-x-ios/issues/3582, so we're enabling the
sqlite busy_timeout, which will retry the operation after a short sleep,
until the busy timeout is being hit.
This patch adds the `--generate-link-to-definition`
argument to `rustdoc` for `docs.rs`. This is using
https://github.com/rust-lang/rust/pull/84176 to add links in the source
code page.
- Doc comment for the SQLite-based state store incorrectly referred to
it as a "cryptostore".
- Consistent capitalisation of SQLite.
- Consistent use of indefinite article "an" before SQLite.
- Fix line length.
Imagine we have the following events:
| event_id | room_id | chunk_id | position |
|----------|---------|----------|----------|
| $ev0 | !r0 | 42 | 0 |
| $ev1 | !r0 | 42 | 1 |
| $ev2 | !r0 | 42 | 2 |
| $ev3 | !r0 | 42 | 3 |
| $ev4 | !r0 | 42 | 4 |
`$ev2` has been removed, then we end up in this state:
| event_id | room_id | chunk_id | position |
|----------|---------|----------|----------|
| $ev0 | !r0 | 42 | 0 |
| $ev1 | !r0 | 42 | 1 |
| | | | | <- no more `$ev2`
| $ev3 | !r0 | 42 | 3 |
| $ev4 | !r0 | 42 | 4 |
We need to shift the `position` of `$ev3` and `$ev4` to `position - 1`,
like so:
| event_id | room_id | chunk_id | position |
|----------|---------|----------|----------|
| $ev0 | !r0 | 42 | 0 |
| $ev1 | !r0 | 42 | 1 |
| $ev3 | !r0 | 42 | 2 |
| $ev4 | !r0 | 42 | 3 |
Usually, it boils down to run the following query:
```sql
UPDATE event_chunks
SET position = position - 1
WHERE position > 2 AND …
```
Okay. But `UPDATE` runs on rows in no particular order. It means that
it can update `$ev4` before `$ev3` for example. What happens in this
particular case? The `position` of `$ev4` becomes `3`, however `$ev3`
already has `position = 3`. Because there is a `UNIQUE` constraint
on `(room_id, chunk_id, position)`, it will result in a constraint
violation.
There is **no way** to control the execution order of `UPDATE` in
SQLite. To persuade yourself, try:
```sql
UPDATE event_chunks
SET position = position - 1
FROM (
SELECT event_id
FROM event_chunks
WHERE position > 2 AND …
ORDER BY position ASC
) as ordered
WHERE event_chunks.event_id = ordered.event_id
```
It will fail the same way.
Thus, we have 2 solutions:
1. Remove the `UNIQUE` constraint,
2. Be creative.
The `UNIQUE` constraint is a safe belt. Normally, we have
`event_cache::Deduplicator` that is responsible to ensure there is no
duplicated event. However, relying on this is “fragile” in the sense it
can contain bugs. Relying on the `UNIQUE` constraint from SQLite is more
robust. It's “braces and belt” as we say here.
So. We need to be creative.
Many solutions exist. Amongst the most popular, we see _dropping and
re-creating the index_, which is no-go for us, it's too expensive. I
(@hywan) have adopted the following one:
- Do `position = position - 1` but in the negative space, so
`position = -(position - 1)`. A position cannot be negative; we are
sure it is unique!
- Once all candidate rows are updated, do `position = -position` to move
back to the positive space.
'told you it's gonna be creative.
This solution is a hack, **but** it is a small number of operations, and
we can keep the `UNIQUE` constraint in place.
This patch updates the `test_linked_chunk_remove_item` to handle
6 events. On _my_ system, with _my_ SQLite version, it triggers the
`UNIQUE` constraint violation without the bug fix.
This patch adds a new constructor for `SqliteStoreConfig`, which sets
some defaults tailored for low memory usage.
This patch adds tests asserting the defaults for `new` and
`with_low_memory_config`.
Getting the position when reading an event is no longer required:
- the only use case for reading the position out of the event cache was
when we wanted to replace a redacted item into the linked chunk; now
with save_event(), we can replace it without having to know its
position.
As an extra measure of caution, I've also included the room_id in the
`events` table, next to the event_id, so that looking for an event is
still restricted to a single room.
This patch updates `BaseStateStore` and the `StateStore` trait along
with its implementors, to return all rooms or a single room from
`StateStore::get_room_infos`.
See the previous patch for more context.
This patch updates `StoreOpenConfig` to hold a new type: `RuntimeConfig`.
This `RuntimeConfig` type is passed to a new `SqliteAsyncConnExt`
method, named `apply_runtime_config`. Depending on the values passed
here, the `optimize`, `cache_size` (new!) and `journal_size_limit`
methods will be called automatically.
The goal of this type is to automate a flow we keep repeating in
all the stores. This is error-prone. This type brings uniformity and
consistency.
This patch also makes all `open_with_pool` methods on the stores private
(they were public before):
1. they were never used as far as I know because getting a `SqlitePool`
isn't possible since the `pool` attribute is private…
2. it's better to keep control of this flow.
This patch adds a new `StoreOpenConfing` type to configure the store
when opening it and when creating the pool of connections to SQLite via
`deadpool_sqlite`.
This patch also adds a new `open_with_config` constructor on all
stores, namely `SqliteCryptoStore`, `SqliteEventCacheStore` and
`SqliteStateStore`.
As opposed to WAL mode, foreign keys must be enabled for each database
connection, according to
https://www.sqlite.org/foreignkeys.html#fk_enable
Unfortunately, we can't track which connection objects have already
executed the pragma, so the safer we can do is enable it everytime we
try to acquire a connection from the pool.
Fixes#4785.
This patch is twofold. First off, it provides a new schema allowing to
improve the performance of `SqliteEventCacheStore` for 100_000 events
from 6.7k events/sec to 284k events/sec on my machine.
Second, it now assumes that `EventCacheStore` does NOT store invalid
events. It was already the case, but the SQLite schema was not rejecting
invalid event in case some were handled. It's now explicitely forbidden.