This changes the files table to use normalisation for the names and
versions. The idea is that these are often common between all remote
devices, and repeating an integer is more efficient than repeating a
long string. A new benchmark bears this out; for a database with 100k
files shared between 31 devices, with some worst case assumption on
version vector size, the database is reduced in size by 50% and the test
finishes quicker:
Current:
db_bench_test.go:322: Total size: 6263.70 MiB
--- PASS: TestBenchmarkSizeManyFilesRemotes (1084.89s)
New:
db_bench_test.go:326: Total size: 3049.95 MiB
--- PASS: TestBenchmarkSizeManyFilesRemotes (776.97s)
The other benchmarks end up about the same within the margin of
variability, with one possible exception being that RemoteNeed seems to
be a little slower on average:
old files/s new files/s
Update/n=RemoteNeed/size=1000-8 5.051k 4.654k
Update/n=RemoteNeed/size=2000-8 5.201k 4.384k
Update/n=RemoteNeed/size=4000-8 4.943k 4.242k
Update/n=RemoteNeed/size=8000-8 5.099k 3.527k
Update/n=RemoteNeed/size=16000-8 3.686k 3.847k
Update/n=RemoteNeed/size=30000-8 4.456k 3.482k
I'm not sure why, possibly that query can be optimised anyhow.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
No practical effect, just a tiny bit of fun to stamp the database files
with an application ID that identifies them.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
Just to be entirely sure that if the migration succeeds the schema
version is always also updated. Currently if a migration succeeds but a
later migration doesn't, the changes of the migration apply but the
version stays - if the migration is breaking/non-idempotent, it will
fail when it tries to rerun it next time (otherwise it's just a
pointless re-execution).
Unfortunately with the current `db.runScripts` it wasn't that easy to
do, so I had to do quite a bit of refactoring. I am also ensuring the
right order of transactions now, though I assume that was already the
case lexicographically - can't hurt to be safe.
This changes the database structure to use one database per folder, with
a small main database to coordinate. Reverts the prior change to buffer
all files in memory when pulling, meaning there is now a phase where the
WAL file will grow significantly, at least for initial sync of folders
with many directories.
---------
Co-authored-by: bt90 <btom1990@googlemail.com>