* feat(general): added internal/workshare package
This introduces a work-sharing utility for walking trees of things
(such as a filesystem) that allows up to N goroutines to be used
in parallel.
Whenever a goroutine is visiting the children of a node, it can share some
of that work with another idle goroutine in the pool (when available). If
no other goroutine is idle, we are already at capacity and the caller
simply does the work in its own goroutine.
The API introduced here is not the most beautiful, but it allows us to
avoid allocations in most cases, which is critical for high-performance
data processing.
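
The sharing rule described above can be sketched roughly as follows. This is
an illustration only and is not the `internal/workshare` API: the names
`node`, `pool`, `tryShare`, `walk`, `buildTree` and `countNodes` are all
hypothetical, and for brevity the sketch starts a goroutine on demand, so it
does not reproduce the allocation-avoiding design mentioned above.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// node is a hypothetical tree node standing in for a directory.
type node struct {
	children []*node
}

// pool holds one token per idle helper goroutine slot (assumed design).
type pool struct {
	idle chan struct{}
}

func newPool(extraWorkers int) *pool {
	p := &pool{idle: make(chan struct{}, extraWorkers)}
	for i := 0; i < extraWorkers; i++ {
		p.idle <- struct{}{}
	}
	return p
}

// tryShare runs fn on another goroutine if a slot is idle and returns true.
// If the pool is at capacity it returns false without running fn.
func (p *pool) tryShare(wg *sync.WaitGroup, fn func()) bool {
	select {
	case <-p.idle:
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { p.idle <- struct{}{} }() // give the slot back
			fn()
		}()
		return true
	default:
		return false
	}
}

// walk visits n and all of its descendants, counting each visited node.
func walk(p *pool, n *node, visited *int64) {
	atomic.AddInt64(visited, 1)
	var wg sync.WaitGroup
	for _, c := range n.children {
		c := c
		if !p.tryShare(&wg, func() { walk(p, c, visited) }) {
			walk(p, c, visited) // no idle goroutine: do the work ourselves
		}
	}
	wg.Wait() // wait only for the children this call handed off
}

// buildTree creates a full tree with the given depth and fanout.
func buildTree(depth, fanout int) *node {
	n := &node{}
	if depth > 0 {
		for i := 0; i < fanout; i++ {
			n.children = append(n.children, buildTree(depth-1, fanout))
		}
	}
	return n
}

// countNodes walks the tree using up to extraWorkers helper goroutines.
func countNodes(root *node, extraWorkers int) int64 {
	var visited int64
	walk(newPool(extraWorkers), root, &visited)
	return visited
}
```

Calling `countNodes(buildTree(3, 3), 4)` from a `main` function walks a
40-node tree with up to four helper goroutines. With `extraWorkers` set to 0
the walk degrades to a plain sequential recursion, which matches the
"at capacity" case described above.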
* feat(snapshots): speed up uploads by parallelizing directory traversal
Previously, directories were walked strictly sequentially, which meant
we could never upload data from multiple directories in parallel,
even if they had just a few files each.
This change switches to the new `workshare` utility, which improves
parallelism. It also reduces memory allocations, goroutine creations
and overall memory usage when taking large snapshots, while increasing
CPU utilization.
Tests on realistic directory structures show large speed-ups during cold
snapshots (without any metadata caching):
Photo library - 160 GB, files:41717 dirs:1350
Before: 3m11s
After: 1m50s
Total time reduction: 43%
Working code directory - 30.7 GB, files:194560 dirs:42455
Before: 55s
After: 25s
Total time reduction: 55%
* do not report multiple cancelation errors during parallel uploads
* pr feedback, clarified usage, added comments
* fixed flaky test