In the previous situation, we had two locks with similar responsibilities, the `response_handling_lock`
and the `position` lock. The latter *almost* covered the former's critical zone, albeit for a single
function call, which left room for a deadlock situation (latter taken, then former, then latter).
This removes the former, and extends the critical zone of the latter up to the end of the response handling,
removing the possibility of the deadlock entirely.
This reverts commit 9b811009e1.
We have realized that the server might not handle timeouts as expected.
Thus, it's hard to know the difference between network timeout and poll
timeout (resp. the server is unreachable vs. the server has nothing to
respond with). We will come back on this later.
As a sequel of the previous commit (d4cbcd397d), this patch updates
`SlidingSync` to “ignore” timeouts, i.e. timeouts aren't reported to the
caller, and aren't stopping the sync-loop.
All errors inside `SlidingSync` are stopping the sync-loop, and errors
are returned to the caller. However, in some situation, some errors
should be ignored, i.e. they should not stop the sync-loop and they
should not be returned to the caller: the sync-loop just continues to
run. This patch does that for `Error::ResponseAlreadyReceived`. More
errors will come.
Why is it annoying? When `matrix_sdk_ui::SyncService` sees an error,
it stops all the sync-loops (`RoomListService`, `EncryptionSync`…) and
restarts them properly. In the case of `Error::ResponseAlreadyReceived`,
this is a waste of time and resources. This error is an error from the
`SlidingSync` point of view, but _not_ from the caller point of view.
This patch updates the `BaseClient::process_sliding_sync` method to put
the `LeftRoom` in the `SyncResponse::leave` list.
To achieve so, this patch updates
`BaseClient::process_sliding_sync_room` to returns `Option<JoinedRoom>`,
`Option<LeftRoom>` and `Option<InvitedRoom>` (previously, it was only
`JoinedRoom` and `Option<InvitedRoom>`).
SlidingSync emits state events inside `required_state`, but _also_
sometimes inside `timeline`! This patch looks for state events inside
both entries.
The rest of the patch is composed of tests.
Imagine the following scenario:
A request $R_1$ is sent. A response $S_1$ is received and is being
handled. In the meantime, the sync-loop is instructed to skip over any
remaining work in its iteration and to jump to the next iteration. As a
consequence, $S_1$ is detached, but continues to run. In the meantime, a
new request $R_2$ starts. Since $S_1$ has _not_ finished to be handled,
the `pos` isn't updated yet, and $R_2$ starts with the _same_ `pos`
as $R_1$.
The impacts are the following:
1. Since the `pos` is the same, even if some parameters are different,
the server will reply with the same response. It's a waste of time
and resources (incl. network).
2. Receiving the same response could have corrupt the state. It has been
fixed in https://github.com/matrix-org/matrix-rust-sdk/pull/2395
though.
Point 2 has been addressed, but point 1 remains to be addresed. This
patch fixes point 1.
How? It changes the `RwLock` around `SlidingSyncInner::position` to
a `Mutex`. An `OwnedMutexGuard` is fetched by locking the mutex when
the request is generated (i.e. when `pos` is read to be put in the new
request). This `OwnedMutexGuard` is kept during the entire lifetime
of the request extend to the response handling. It is dropped/released
when the response has been fully handled, or if any error happens along
the process.
It means that it's impossible for a new request to be generated and to
be sent if a request and response is running. It solves point 1 in case
of successful response, otherwise the `pos` isn't updated because of
an error.
This patch first off renames `Client::deserialize_events` as
`Client::deserialize_state_events`. Then, this patch initially updated
the return type of this method from `Vec<Option<_>>` to `Vec<_>` to
filter out events that failed to deserialize. The patch ultimately
updates the return type of this method to `Vec<(Raw<_>, _)>`, so that
the raw state events that map to the state event are collected too (this
is required for making `Client::handle_state` to work correctly if an
event failed to deserialized). Finally, the rest of the patch updates
the code accordingly.
`matrix_sdk_ui::room_list_service::Error` has a `SlidingSync` variant to
report errors from `matrix_sdk::sliding_sync::Error` (to be more exact,
from `matrix_sdk::Error::SlidingSync`).
This patch updates the error message from `SlidingSync failed` to
`SlidingSync failed: <explanation>`, where `<explanation>` is the
`Display` representation of the inner error.
It will help to provide more details to the end-user when looking at
logs.
The Room::request_encryption_state method employs a DashMap to uphold a
lock, facilitating the de-duplication of requests directed to the server.
The de-duplication logic involves creating a fresh Mutex and embedding
it into the DashMap through these stages:
1. Generate a new de-duplication mutex.
2. Insert a mutex copy into the DashMap.
2. Acquire the mutex.
Due to DashMap's limitation of enabling map locking solely during
insertion (step 1), a race condition emerges. Consequently, multiple
invocations of the Room::request_encryption_state method might
concurrently endeavor to insert the de-duplication mutex.
This commit removes the chance of such a race. It substitutes the
DashMap with a combination of a mutex and a BTreeMap. This adaptation
permits us to lock the map throughout all three specified operations.
This commit addresses a bug in the Room::is_encrypted functionality.
The issue stems from the Room::request_encryption_state method, which
Room::is_encrypted relies on. This method incorporates a de-duplication
mechanism to ensure that only a single request for the m.room.encryption
state event is made to the server.
Due to this de-duplication mechanism, Room::request_encryption_state
follows two code paths. One path receives the state event from the
server, while the other path merely waits for the first path to complete.
The primary goal of the Room::request_encryption_state method is to
furnish the requested state event. However, because the second code path
doesn't receive any content, it returns an Option.
The problem arises when Room::is_encrypted evaluates this Option. It
erroneously determines that if the Option is None, encryption remains
inactive for the room.
To rectify this, the commit proposes that Room::is_encrypted analyze the
information stored in the in-memory RoomInfo, which is maintained by the
Room::request_encryption_state method. This approach mirrors the existing
behavior of Room::is_encrypted when it processes the code path where the
m.room.encryption state event has already been retrieved.