We've had weak/rolling hashing in the code for quite a while. It was a
popular request for a while, based on the belief that rsync does this
and we should too. However, the benefit is quite small; we save on
average about 0.8% of transferred blocks over the population as a whole:
<img width="974" alt="Screenshot 2025-03-28 at 17 09 02"
src="https://github.com/user-attachments/assets/bbe10dea-f85e-4043-9823-7cef1220b4a2"
/>
This would be fine if the cost was comparably low, however the downside
of attempting rolling hash matching is that we (by default) do a
complete file read on the destination in order to look for matches
before we starting pulling blocks for the file. For any larger file this
means a sometimes long, I/O-intensive pause before the file starts
syncing, for usually no benefit.
I propose we simply rip off the bandaid and save the effort.
This adds a config to enable debug functions on the API server, which is
by default disabled. When enabled, the /rest/debug things become
available and become available without requiring a CSRF token (although
authentication is required if configured).
We also add a new endpoint /rest/debug/cpuprof?duration=15s (with the
duration being configurable, defaulting to 30s). This runs a CPU profile
for the duration and returns it as a file. It sets headers so that a
browser will save the file with an informative name.
The same is done for heap profiles, /rest/debug/heapprof, which does not
take any parameters.
The purpose of this is that any user can enable debugging under
advanced, then point their browser to the endpoint above and get a file
that contains a CPU or heap profile we can use, with the filename
telling us what version and architecture the profile is from.
On the command line, this becomes
curl -O -J http://localhost:8082/rest/debug/cpuprof?duration=5s
curl: Saved to filename
'syncthing-cpu-darwin-amd64-v0.14.3+4-g935bcc0-110307.pprof'
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3467