IMHO the logic here was inverted. The only use for the report data is to
show a preview when we ask the user whether they want to participate in
usage reporting. However, the GUI would first load the report data and
then consider whether we wanted to show that dialog or not. Instead,
only load it if we're going to show the dialog.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
This adds a new field to the file information we keep, the "previous
blocks hash". This is the hash of the file contents as it was in its
previous incarnation. That is, every scan that updates the blocks hash
will move the current hash to the "previous" field.
This enables an addition to the conflict detection algorithm: if the
file to be synced is in conflict with the current file on disk
(version-counter wise), but it indicates that it was based on the
precise contents we have (new.prevBlocksHash == current.blocksHash),
then it's not really a conflict.
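As a sketch of the resulting check, using stand-in types and field names rather than the real ones:

```go
package conflictsketch

import "bytes"

// fileInfo is a stand-in for illustration; the real FileInfo and version
// vector types are richer than this.
type fileInfo struct {
	blocksHash     []byte // hash of the current contents
	prevBlocksHash []byte // hash of the contents before the last change
}

// isRealConflict assumes we have already established that the version
// counters are concurrent, i.e. neither side strictly supersedes the other.
func isRealConflict(current, incoming fileInfo) bool {
	// If the incoming file was based on exactly the contents we already
	// have on disk, the data has not meaningfully diverged.
	return !bytes.Equal(incoming.prevBlocksHash, current.blocksHash)
}
```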
Signed-off-by: Jakob Borg <jakob@kastelo.net>
This adds a rule where a maintainer can merge a PR on their own even in
non-trivial cases, which we (I) already do as-is today, but by breaking
the rules instead of following them. This just codifies that behavior.
If we get to a point where it's no longer necessary, that'd be cool.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
This changes the files table to use normalisation for the names and
versions. The idea is that these are often common between all remote
devices, and repeating an integer is more efficient than repeating a
long string. A new benchmark bears this out; for a database with 100k
files shared between 31 devices, with some worst case assumption on
version vector size, the database is reduced in size by 50% and the test
finishes quicker:
Current:
db_bench_test.go:322: Total size: 6263.70 MiB
--- PASS: TestBenchmarkSizeManyFilesRemotes (1084.89s)
New:
db_bench_test.go:326: Total size: 3049.95 MiB
--- PASS: TestBenchmarkSizeManyFilesRemotes (776.97s)
The other benchmarks end up about the same within the margin of
variability, with one possible exception being that RemoteNeed seems to
be a little slower on average:
old files/s new files/s
Update/n=RemoteNeed/size=1000-8 5.051k 4.654k
Update/n=RemoteNeed/size=2000-8 5.201k 4.384k
Update/n=RemoteNeed/size=4000-8 4.943k 4.242k
Update/n=RemoteNeed/size=8000-8 5.099k 3.527k
Update/n=RemoteNeed/size=16000-8 3.686k 3.847k
Update/n=RemoteNeed/size=30000-8 4.456k 3.482k
I'm not sure why, possibly that query can be optimised anyhow.
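For illustration of the normalisation itself, a rough sketch of the layout (table and column names are made up here, not the actual schema):

```go
package schemasketch

// Illustration only; the real schema differs. The point is that long,
// frequently repeated strings (file names, version vectors) move into
// lookup tables, and the per-device file rows store small integer
// references instead of repeating the strings.
const normalisedSchemaSketch = `
CREATE TABLE file_names (
    idx  INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE file_versions (
    idx     INTEGER PRIMARY KEY,
    version BLOB NOT NULL UNIQUE
);
CREATE TABLE files (
    device_idx  INTEGER NOT NULL,
    name_idx    INTEGER NOT NULL REFERENCES file_names(idx),
    version_idx INTEGER NOT NULL REFERENCES file_versions(idx),
    -- remaining per-file columns elided
    PRIMARY KEY (device_idx, name_idx)
);
`
```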
Signed-off-by: Jakob Borg <jakob@kastelo.net>
We have a slightly naive io.ReadAll on the authentication handler, which
can result in unlimited memory consumption from an unauthenticated API
endpoint. Add a reasonable limit there.
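The shape of the fix, as a sketch rather than the actual handler (the 1 MiB limit is an arbitrary example value):

```go
package authsketch

import (
	"io"
	"net/http"
)

// readLimitedBody caps how much of the request body we are willing to read
// before the caller is authenticated.
func readLimitedBody(w http.ResponseWriter, r *http.Request) ([]byte, error) {
	const maxBodySize = 1 << 20 // 1 MiB
	// MaxBytesReader returns an error from Read once the limit is exceeded,
	// so ReadAll can no longer consume unbounded memory.
	return io.ReadAll(http.MaxBytesReader(w, r.Body, maxBodySize))
}
```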
Signed-off-by: Jakob Borg <jakob@kastelo.net>
Remove the migrated v0.14.0 database format after two weeks. Remove a
few old patterns that are no longer relevant. Ensure the cleanup runs in
both the config and database directories.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
Store the sequence number of the last GC sweep in a KV. Next time, if it
matches we can just skip GC because nothing has been added or removed.
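Roughly the following, with made-up store and helper names:

```go
package gcsketch

// kv is a stand-in for the key-value store; the method names are invented.
type kv interface {
	GetInt64(key string) (int64, bool)
	PutInt64(key string, val int64) error
}

// maybeGC skips the sweep entirely when nothing has changed since the last
// one. currentSeq is the database's current sequence number and sweep runs
// the actual garbage collection.
func maybeGC(store kv, currentSeq int64, sweep func() error) error {
	if last, ok := store.GetInt64("last-gc-sequence"); ok && last == currentSeq {
		// No additions or removals since the previous sweep, so there can
		// be nothing new to collect.
		return nil
	}
	if err := sweep(); err != nil {
		return err
	}
	return store.PutInt64("last-gc-sequence", currentSeq)
}
```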
Signed-off-by: Jakob Borg <jakob@kastelo.net>
There are some return statements in between, but putting the values back
isn't my motivation (that hardly ever happens); I just find this more
readable. The same goes for moving `hashLength`: placed next to the pool,
its connection with `sha256.New()` is more apparent.
Followup to:
chore(scanner): reduce memory pressure by using pools inside hasher (#10222, 6e26fab3a0)
Signed-off-by: Simon Frei <freisim93@gmail.com>
Periodic garbage collection can take a long time on large folders. The worst
step is the one for blocks, which are typically orders of magnitude more
numerous than files or block lists.
This improves the situation by running blocks GC in a number of smaller
range chunks, in random order, and stopping after a time limit. At most ten
minutes per run will be spent garbage collecting blocklists and blocks.
With this, we're not guaranteed to complete a full GC on every run, but
we'll make some progress and get there eventually.
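A sketch of the chunking, with a hypothetical per-prefix helper standing in for the real block-range GC:

```go
package gcsketch

import (
	"math/rand"
	"time"
)

// chunkedGC splits the blocks keyspace into 256 hash-prefix chunks, visits
// them in random order, and stops once the time budget is spent. gcChunk is
// a hypothetical helper that collects garbage within one prefix range.
func chunkedGC(gcChunk func(prefix byte) error) error {
	const budget = 10 * time.Minute
	deadline := time.Now().Add(budget)

	for _, p := range rand.Perm(256) { // random order: every chunk gets visited eventually
		if time.Now().After(deadline) {
			return nil // resume with other chunks on the next periodic run
		}
		if err := gcChunk(byte(p)); err != nil {
			return err
		}
	}
	return nil
}
```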
Signed-off-by: Jakob Borg <jakob@kastelo.net>
No migration on this as it has no practical impact, just a slight
cleanup for new installations.
Also a refactor of how we declare single-column primary keys, for
consistency.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
This adds several options for configuring the log format of timestamps
and severity levels, making it more suitable for integration with log
systems like systemd.
--log-format-timestamp="2006-01-02 15:04:05"
    Format for timestamp, set to empty to disable timestamps ($STLOGFORMATTIMESTAMP)
--[no-]log-format-level-string
    Whether to include level string in log line ($STLOGFORMATLEVELSTRING)
--[no-]log-format-level-syslog
    Whether to include level as syslog prefix in log line ($STLOGFORMATLEVELSYSLOG)
So, to get output suitable for systemd (syslog prefix, no level
string, no timestamp) we can pass `--log-format-timestamp=""
--no-log-format-level-string --log-format-level-syslog` or,
equivalently, set `STLOGFORMATTIMESTAMP="" STLOGFORMATLEVELSTRING=false
STLOGFORMATLEVELSYSLOG=true`.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
No practical effect, just a tiny bit of fun to stamp the database files
with an application ID that identifies them.
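For reference, SQLite's `application_id` pragma is the standard mechanism for this kind of stamp; a minimal sketch with an arbitrary example value (not the one actually used):

```go
package appidsketch

import "database/sql"

// stampApplicationID writes SQLite's 32-bit application ID into the
// database file header.
func stampApplicationID(db *sql.DB) error {
	_, err := db.Exec("PRAGMA application_id = 1398361667") // arbitrary example value
	return err
}
```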
Signed-off-by: Jakob Borg <jakob@kastelo.net>
While we're figuring out optimal defaults, reduce the page cache size to
the compiled-in default. On my computer this makes no difference in
benchmarks. In forum threads, it solved the problem of massive memory
usage during initial scan.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
* fix(api): redact device encryption passwords in support bundle config
Signed-off-by: Tommy van der Vorst <tommy@pixelspark.nl>
* Update lib/api/support_bundle.go
Signed-off-by: Jakob Borg <jakob@kastelo.net>
---------
Signed-off-by: Tommy van der Vorst <tommy@pixelspark.nl>
Signed-off-by: Jakob Borg <jakob@kastelo.net>
Co-authored-by: Jakob Borg <jakob@kastelo.net>
Since #10332 we'd create the temp file when closing out the puller state
for a file, but this is inappropriate if the reason we're bailing out is
that there isn't space for it to begin with. Instead, do the
free space check before we even start copying/pulling.
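A sketch of the new ordering, with hypothetical helpers standing in for the real puller machinery:

```go
package pullsketch

import "fmt"

// fileIntent is a minimal stand-in for the file to be pulled.
type fileIntent struct {
	Name string
	Size int64
}

// pullFile verifies there is room for the file before creating the temp
// file and copying/pulling any blocks. freeSpace and pull are hypothetical
// helpers.
func pullFile(f fileIntent, freeSpace func() (int64, error), pull func(fileIntent) error) error {
	free, err := freeSpace()
	if err != nil {
		return err
	}
	if free < f.Size {
		return fmt.Errorf("pulling %s: need %d bytes, %d available", f.Name, f.Size, free)
	}
	// Only now is the temp file created and the copy/pull started.
	return pull(f)
}
```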
Signed-off-by: Jakob Borg <jakob@kastelo.net>
Removes the chitter-chatter of folder state changes from the info level,
while adding the error state at warning level and a corresponding
clearing of the error state at info level.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
This reduces database migration memory usage in my test scenario from
3.8 GB to 440 MB. In principle I don't think we're causing many temp
tables to be generated anyway in normal usage, but if we do and someone
can benchmark a performance difference, we can add a tunable. I ran the
database benchmark before and after and didn't see a difference above
the noise level.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
This copies the relevant parts of the contribution guidelines in the
docs, for the purpose of keeping them in a single place. The in-docs
contribution guidelines can become a link to this document.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
When handling files that consist only of power-of-two-sized blocks of
zeroes, we'd know we have nothing to write, and when using sparse files
we'd never even create the temp file. Hence the sync would fail.
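A sketch of the edge case handling, not the actual puller code:

```go
package sparsesketch

import "os"

// finalizeAllZeroes covers the case where every block is all zeroes and
// sparse files mean there is no data to write: the destination must still
// be created and sized, otherwise nothing ever appears on disk and the
// sync cannot finish.
func finalizeAllZeroes(path string, size int64) error {
	fd, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer fd.Close()
	// Truncate extends the file with a hole; no data blocks are written.
	return fd.Truncate(size)
}
```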
Signed-off-by: Jakob Borg <jakob@kastelo.net>
This is just to be entirely sure that whenever a migration succeeds, the
schema version is also updated. Currently, if a migration succeeds but a
later migration doesn't, the changes of the first migration apply but the
recorded version stays put; if that migration is breaking or
non-idempotent, it will fail when it is rerun next time (otherwise the
rerun is merely pointless).
Unfortunately this wasn't that easy to do with the current
`db.runScripts`, so I had to do quite a bit of refactoring. I am also
ensuring the right order of transactions now, though I assume that was
already the case lexicographically; can't hurt to be safe.
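The invariant, sketched rather than quoting the refactored code (the schema_version table name is made up for the example):

```go
package migratesketch

import "database/sql"

// applyMigration commits the migration statements and the schema-version
// bump together, so a later failure can never leave an
// applied-but-unversioned migration behind.
func applyMigration(db *sql.DB, version int, script string) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	if _, err := tx.Exec(script); err != nil {
		return err
	}
	if _, err := tx.Exec("UPDATE schema_version SET version = ?", version); err != nil {
		return err
	}
	return tx.Commit()
}
```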
Currently, the input field has no step defined, meaning that the arrow
keys change it by the default step of "1". Considering that the field's
default value is "3600" (seconds, i.e. one hour), it is unlikely that the
user wants to adjust it in such small increments.
For this reason, set the step to "3600" (one hour). If the user needs
more granular control, they can still input the value in seconds
manually.
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Currently, the bandwidth limit input fields have no step defined, and as
such they use the default step of "1". Given that these fields are
measured in KiB, it makes more sense to use a larger step such as "1024"
(1 MiB), as in most cases it is very unlikely that the user needs
single-KiB control over the limits.
Note that the step only applies to changing the values with the arrow
keys; the user can still manually input any value they want.
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
This just removes an unnecessary foreign key constraint, where we
already do the garbage collection manually in the database service.
However, as part of getting here I tried a couple of other variants
along the way:
- Changing the order of the primary key from `(hash, blocklist_hash,
idx)` to `(blocklist_hash, idx, hash)` so that inserts would be
naturally ordered. However this requires a new index `on blocks (hash)`
so that we can still look up blocks by hash, and turns out to be
strictly worse than what we already have.
- Removing the primary key entirely and the `WITHOUT ROWID` to make it a
rowid table without any required order, and an index as above. This is
faster when the table is small, but becomes slower when it's large (due
to dual indexes I guess).
These are the benchmark results from current `main`, the second
alternative below ("Index(hash)") and this proposal that retains the
combined primary key ("combined"). Overall it ends up being about 65%
faster.
(Benchmark comparison screenshot: https://github.com/user-attachments/assets/bff3f9d1-916a-485f-91b7-b54b477f5aac)
Ref #10264
Currently, the number of hashers is always set to 1 on interactive
operating systems, which are defined as Windows, macOS, iOS, and
Android. However, with modern multicore CPUs, it does not make much
sense to limit performance so much.
For example, without this fix, a CPU with 16 cores / 32 threads is
still limited to using just a single thread to hash files per folder,
which may severely affect its performance.
For this reason, instead of using a fixed value, calculate the number
dynamically, so that it equals one-fourth of the total number of CPU
cores. This way, the number of hashers will still end up being just 1 on
a slower 4-thread CPU, but it will be allowed to take larger values when
the number of threads is higher, increasing hashing performance in the
process.
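A sketch of the new default:

```go
package hashersketch

import "runtime"

// defaultInteractiveHashers returns the new default on "interactive"
// operating systems: a quarter of the available CPU threads, but never
// fewer than one.
func defaultInteractiveHashers() int {
	if h := runtime.NumCPU() / 4; h > 1 {
		return h
	}
	return 1
}
```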
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Co-authored-by: Jakob Borg <jakob@kastelo.net>
Currently, the number of hashers, with the exception of some specific
operating systems or when defined manually, equals the number of CPU
cores divided by the overall number of folders, and it does not take
into account the value of MaxFolderConcurrency at all. This leads to
artificial performance limits even when MaxFolderConcurrency is set to
values lower than the number of cores.
For example, let's say that the number of folders is 50 and
MaxFolderConcurrency is set to a value of 4 on a 16-core CPU. With the old
calculation, the number of hashers would still end up being just 1 due
to the large number of folders. However, with the new calculation, the
number of hashers in this case will be 4, leading to better hashing
performance per folder.
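A sketch of the changed calculation:

```go
package hashersketch

// hashersPerFolder divides the CPU count by the number of folders that can
// actually scan at once, rather than by the total number of folders.
func hashersPerFolder(numCPU, numFolders, maxFolderConcurrency int) int {
	concurrent := numFolders
	if maxFolderConcurrency > 0 && maxFolderConcurrency < concurrent {
		concurrent = maxFolderConcurrency
	}
	if concurrent < 1 {
		concurrent = 1
	}
	if h := numCPU / concurrent; h > 1 {
		return h
	}
	return 1
}
```

With the numbers from the example above, hashersPerFolder(16, 50, 4) yields 4 rather than 1.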
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Co-authored-by: Jakob Borg <jakob@kastelo.net>