In the common case (sparse files enabled, not reusing old data) we'd
optimise away pulling and writing zero blocks. However, in the corner
cases we'd go through the whole process of pulling the block over the
network, which is of course entirely unnecessary.
Now we instead always take an optimised path for all-zeroes blocks: in
the clean case we do nothing; otherwise we materialise a block of
zeroes and write it directly.
---------
Signed-off-by: Jakob Borg <jakob@kastelo.net>
This adds a new folder-level configuration `FullBlockIndex`. It controls
whether we maintain the block index for a given folder -- currently
that's always the case; now it becomes possible to turn it off. The block index
is used for lookup of blocks across files and folders. Effectively, when
syncing a change, for each block, we check:
1. Is the block already present in the old version of the file? If so,
we can reuse (copy) it without network transfer. **This check is always
possible.**
2. Is the block already present in any other file in this folder or
other folders? If so, we can copy it. **This check is only possible with
the full block index.**
3. We must transfer the block over the network.
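The lookup order above could be sketched roughly as follows. All type, field and function names here are hypothetical, not Syncthing's actual API; note how step 2 is simply skipped when the index is disabled:

```go
package main

import "fmt"

// BlockRef identifies a local source a block can be copied from.
type BlockRef struct {
	File   string
	Offset int64
}

type puller struct {
	oldFileBlocks map[string]BlockRef // blocks in the old version of the file
	blockIndex    map[string]BlockRef // folder-wide index; nil when disabled
}

// locate returns a local source for the block hash, or false if the
// block must be requested over the network.
func (p *puller) locate(hash string) (BlockRef, bool) {
	// 1. Reuse from the old version of the same file (always possible).
	if ref, ok := p.oldFileBlocks[hash]; ok {
		return ref, true
	}
	// 2. Copy from any other file (requires the full block index).
	if p.blockIndex != nil {
		if ref, ok := p.blockIndex[hash]; ok {
			return ref, true
		}
	}
	// 3. Fall back to network transfer.
	return BlockRef{}, false
}

func main() {
	p := &puller{
		oldFileBlocks: map[string]BlockRef{"abc": {File: "doc.txt"}},
		blockIndex:    nil, // FullBlockIndex disabled
	}
	_, local := p.locate("abc")
	fmt.Println(local) // true: found in the old version of the file
	_, local = p.locate("def")
	fmt.Println(local) // false: must go over the network
}
```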
Maintaining the full block index is costly in time, I/O and database
size. With this PR, maintaining the full block index remains the default
for send-receive and receive-only folders, while it is disabled for
send-only and receive-encrypted folders. The block index is never useful
for encrypted folders, as blocks are encrypted separately for each file.
It is also not useful for send-only folders by themselves, though the
data in a send-only folder could be reused by other receive-type
folders if it were enabled.
For very large folders it may make sense to disable the full block index
regardless of folder type and just accept the resulting decrease in data
reuse.
Disabling or enabling the option in the GUI causes the index to be
destroyed or rebuilt accordingly.
https://github.com/syncthing/docs/pull/1005
---------
Signed-off-by: Jakob Borg <jakob@kastelo.net>
Register HTTP and HTTPS proxy dialers and implement CONNECT-based
tunneling for HTTP proxies.
The new dialer supports:
- Plain HTTP proxies using CONNECT
- HTTPS proxies by performing a TLS handshake before CONNECT
- Optional basic authentication via Proxy-Authorization (with a warning
when credentials are sent over cleartext HTTP)
This allows all_proxy to be set to http:// or https:// URLs, enabling
data transfer through HTTP(S) proxies.
### Purpose
Allow peers to connect using HTTP Proxies (CONNECT)
### Testing
Tested with both HTTP and HTTPS proxy connections, with no
authentication and with basic authentication.
### Screenshots
No visual change
### Documentation
https://github.com/syncthing/docs/pull/987
---------
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Jakob Borg <jakob@kastelo.net>
Co-authored-by: Jakob Borg <jakob@kastelo.net>
Hopefully deflakes TestHTTPLogin on Windows, where it currently often
times out, presumably in the config-saving stage after already having
started a shutdown of the API, while being CPU-constrained due to
password hashing.
---------
Signed-off-by: Jakob Borg <jakob@kastelo.net>
The test expected the stopped scanner to produce at most numHashers
additional results, but there's also the case where a directory is
encountered (which doesn't require hashing) and sent directly.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
There was a race condition where using IndexUpdate would trigger a pull,
which would sync the delete we are looking for, making the completion
100%. By doing the insert directly into the database we avoid
triggering these side effects and always get the expected completion
percentage.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
These have been flaky for a long time, seemingly because the multiple
connection code slightly changed the timing of cluster config sending by
moving it to the connection promotion loop. This adds some resiliency
to that, instead of assuming that the CCs will be immediately available
after adding the connection.
---------
Signed-off-by: Jakob Borg <jakob@kastelo.net>
Also adds a method to query the last database maintenance time.
Signed-off-by: Tommy van der Vorst <tommy@pixelspark.nl>
Co-authored-by: Jakob Borg <jakob@kastelo.net>
This change allows the periodic database maintenance to be disabled, while providing a way to programmatically start maintenance at a convenient moment.
Signed-off-by: Tommy van der Vorst <tommy@pixelspark.nl>
In #9701 there was a change that put the mutex used for `getExpireAdd` directly in `defaultRealCaser`, which is erroneous because multiple filesystems can share the same `caseCache`.
### Purpose
Fixes #9836 and [Slow sync sending files from Android](https://forum.syncthing.net/t/slow-sync-sending-files-from-android/24208?u=marbens). There may be other issues caused by `getExpireAdd` conflicting with itself, though.
### Testing
Unit tests pass and the case cache and conflict detection _seem_ to behave correctly.
Signed-off-by: Marcus B Spencer <marcus@marcusspencer.us>
This adds a new field to the file information we keep, the "previous
blocks hash". This is the hash of the file contents as it was in its
previous incarnation. That is, every scan that updates the blocks hash
will move the current hash to the "previous" field.
This enables an addition to the conflict detection algorithm: if the
file to be synced is in conflict with the current file on disk
(version-counter wise), but it indicates that it was based on the
precise contents we have (new.prevBlocksHash == current.blocksHash),
then it's not really a conflict.
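The check could look roughly like this; the struct and function names are illustrative rather than the actual protocol types:

```go
package main

import (
	"bytes"
	"fmt"
)

// FileInfo holds just the fields relevant to this check.
type FileInfo struct {
	BlocksHash     []byte
	PrevBlocksHash []byte
}

// isRealConflict reports whether an incoming file that conflicts
// version-counter wise actually diverges in content lineage. If the
// incoming change was based on exactly the contents we have on disk,
// syncing it is safe and no conflict copy is needed.
func isRealConflict(incoming, current FileInfo) bool {
	return !bytes.Equal(incoming.PrevBlocksHash, current.BlocksHash)
}

func main() {
	cur := FileInfo{BlocksHash: []byte{1, 2, 3}}
	inc := FileInfo{BlocksHash: []byte{4, 5, 6}, PrevBlocksHash: []byte{1, 2, 3}}
	fmt.Println(isRealConflict(inc, cur)) // false: based on our exact contents
}
```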
Signed-off-by: Jakob Borg <jakob@kastelo.net>
This changes the files table to use normalisation for the names and
versions. The idea is that these are often common between all remote
devices, and repeating an integer is more efficient than repeating a
long string. A new benchmark bears this out; for a database with 100k
files shared between 31 devices, with some worst-case assumptions on
version vector size, the database is reduced in size by 50% and the test
finishes quicker:
Current:
db_bench_test.go:322: Total size: 6263.70 MiB
--- PASS: TestBenchmarkSizeManyFilesRemotes (1084.89s)
New:
db_bench_test.go:326: Total size: 3049.95 MiB
--- PASS: TestBenchmarkSizeManyFilesRemotes (776.97s)
The other benchmarks end up about the same within the margin of
variability, with one possible exception being that RemoteNeed seems to
be a little slower on average:
                                    old files/s   new files/s
Update/n=RemoteNeed/size=1000-8          5.051k        4.654k
Update/n=RemoteNeed/size=2000-8          5.201k        4.384k
Update/n=RemoteNeed/size=4000-8          4.943k        4.242k
Update/n=RemoteNeed/size=8000-8          5.099k        3.527k
Update/n=RemoteNeed/size=16000-8         3.686k        3.847k
Update/n=RemoteNeed/size=30000-8         4.456k        3.482k
I'm not sure why, possibly that query can be optimised anyhow.
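The normalisation amounts to interning repeated strings behind small integer IDs, so each remote device's row stores an int instead of a long name or version string. A toy in-memory sketch of the idea (not the actual schema or code):

```go
package main

import "fmt"

// intern assigns a small integer ID to each distinct string, so that
// rows referencing the same name or version vector store only the ID.
type intern struct {
	ids  map[string]int
	vals []string
}

func newIntern() *intern {
	return &intern{ids: map[string]int{}}
}

// id returns the ID for s, allocating one on first sight.
func (in *intern) id(s string) int {
	if id, ok := in.ids[s]; ok {
		return id
	}
	in.vals = append(in.vals, s)
	in.ids[s] = len(in.vals) - 1
	return in.ids[s]
}

func main() {
	in := newIntern()
	// 31 devices all announcing the same file name share one entry.
	a := in.id("photos/2024/holiday/IMG_0001.jpg")
	b := in.id("photos/2024/holiday/IMG_0001.jpg")
	fmt.Println(a == b, len(in.vals)) // true 1
}
```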
Signed-off-by: Jakob Borg <jakob@kastelo.net>
We have a slightly naive io.ReadAll in the authentication handler, which
can result in unbounded memory consumption from an unauthenticated API
endpoint. Add a reasonable limit there.
Signed-off-by: Jakob Borg <jakob@kastelo.net>