* fix(cli): Update Kingpin dependency to fix time.Duration type flags like --retention-period
* test(cli): add a test for duration parser to parse days, weeks
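For illustration, duration parsing that understands days and weeks can be sketched as follows. This is a minimal sketch and not Kingpin's actual parser: `parseExtendedDuration` is a hypothetical helper that assumes a single number followed by `d` or `w`, and defers everything else to the standard `time.ParseDuration`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseExtendedDuration is a hypothetical helper (NOT Kingpin's parser).
// It accepts a single number followed by "d" (days) or "w" (weeks) and
// passes every other input to time.ParseDuration unchanged.
func parseExtendedDuration(s string) (time.Duration, error) {
	for _, u := range []struct {
		suffix string
		unit   time.Duration
	}{
		{"d", 24 * time.Hour},
		{"w", 7 * 24 * time.Hour},
	} {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseFloat(strings.TrimSuffix(s, u.suffix), 64)
			if err != nil {
				return 0, fmt.Errorf("invalid duration %q: %w", s, err)
			}

			return time.Duration(n * float64(u.unit)), nil
		}
	}

	// None of the standard units (ns, us, ms, s, m, h) end in "d" or "w".
	return time.ParseDuration(s)
}

func main() {
	for _, s := range []string{"3d", "2w", "90m"} {
		d, err := parseExtendedDuration(s)
		fmt.Println(s, d, err)
	}
}
```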
New issues should all be created in the main (`kopia/kopia`) project,
as that project is the most watched. Keeping all issues in one
central location will help us track and address them more
efficiently. We can use issue labels to indicate that an issue
relates to the UI or another auxiliary project.
* Allow dynamic directory entries with virtualfs
* Tests for new virtualfs implementation
* Add escape hatch for estimator during upload
Some virtualfs.StreamingDirectory-s may not be able to (efficiently)
support iterating through entries multiple times. Make a way for the
estimator to ask if they support multiple iterations and skip the
directory if they do not.
* Expand Directory interface
Expand the Directory interface instead of making a new interface as it's
error-prone to ensure all wrapper types properly handle types that use
the new interface.
* Post-rebase fixes
* Make StreamingDirectory single iteration only
Simplify code and test slightly by not allowing users to declare a
StreamingDirectory that can be iterated through multiple times.
* Add better test for estimator ignoring stream dir
Previous test in uploader had a race condition, meaning it may not catch
all cases.
* Ignore atomic access in checklocks
The comparisons are known to happen after all additions to the variables in
question.
* Implement reviewer feedback
* Remove unused function parameter
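The estimator escape hatch and the single-iteration StreamingDirectory described above can be sketched roughly like this. The `directory` interface, its method names, and both implementations are simplified assumptions made for illustration; they are not the actual `fs.Directory` or `virtualfs` definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// directory is a hypothetical, simplified stand-in for a directory interface
// that has been expanded with a "can I iterate more than once?" query.
type directory interface {
	SupportsMultipleIterations() bool
	Iterate(cb func(name string) error) error
}

// staticDir keeps its entries in memory, so it can be iterated repeatedly.
type staticDir struct{ names []string }

func (d *staticDir) SupportsMultipleIterations() bool { return true }

func (d *staticDir) Iterate(cb func(name string) error) error {
	for _, n := range d.names {
		if err := cb(n); err != nil {
			return err
		}
	}

	return nil
}

var errAlreadyIterated = errors.New("streaming directory already iterated")

// streamingDir can be consumed exactly once (assumption for this sketch).
type streamingDir struct {
	names []string
	used  bool
}

func (d *streamingDir) SupportsMultipleIterations() bool { return false }

func (d *streamingDir) Iterate(cb func(name string) error) error {
	if d.used {
		return errAlreadyIterated
	}

	d.used = true

	for _, n := range d.names {
		if err := cb(n); err != nil {
			return err
		}
	}

	return nil
}

// estimate counts entries but skips directories that cannot be iterated
// again, leaving their single iteration available to the real upload.
func estimate(dirs []directory) (int, error) {
	total := 0

	for _, d := range dirs {
		if !d.SupportsMultipleIterations() {
			continue
		}

		if err := d.Iterate(func(string) error { total++; return nil }); err != nil {
			return 0, err
		}
	}

	return total, nil
}

func main() {
	dirs := []directory{
		&staticDir{names: []string{"a", "b"}},
		&streamingDir{names: []string{"x", "y", "z"}},
	}

	fmt.Println(estimate(dirs))
}
```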
* Unify sparse and normal IO output
This commit refactors the code paths that exercise normal and sparse
writing of restored content. The goal is to make sparsefile.Copy()
and iocopy.Copy() interchangeable, thereby allowing us to wrap
or transform their behavior more easily in the future.
* Introduce getStreamCopier()
* Pull ioCopy() into getStreamCopier()
* Fix small nit in E2E test
We should be getting the block size of the destination file, not
the source file.
* Call stat.GetBlockSize() once per FilesystemOutput
A tiny refactor to pull this call out of the generated stream copier,
as the block size should not change from one file to the next within
a restore entry.
NOTE: as a side effect, if the block size cannot be determined (an error
is returned), we return the default stream copier instead of
letting the sparse copier fail. A warning is logged, but the error
does not fail the restore; it proceeds with the default copier.
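The fallback described in the NOTE can be sketched as below. Only the name `getStreamCopier` comes from the commit message; its signature here, the `streamCopier` type, `defaultCopier`, `errNoBlockSize`, and the injected callbacks are all assumptions made to keep the sketch self-contained, and the sparse copier itself is left as a placeholder.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// streamCopier is a hypothetical common shape shared by both copy paths.
type streamCopier func(dst io.Writer, src io.Reader) (int64, error)

// errNoBlockSize is a hypothetical error used to simulate a failed lookup.
var errNoBlockSize = errors.New("block size unavailable")

// defaultCopier is the plain, non-sparse copy path.
func defaultCopier(dst io.Writer, src io.Reader) (int64, error) {
	return io.Copy(dst, src)
}

// getStreamCopier picks a copier once per output. The block size is looked
// up a single time; if the lookup fails, a warning is reported and the
// default copier is returned instead of failing the restore.
func getStreamCopier(
	sparse bool,
	getBlockSize func() (uint64, error),
	newSparse func(blockSize uint64) streamCopier,
	warn func(error),
) streamCopier {
	if !sparse {
		return defaultCopier
	}

	bs, err := getBlockSize()
	if err != nil {
		warn(err)
		return defaultCopier
	}

	return newSparse(bs)
}

func main() {
	failing := func() (uint64, error) { return 0, errNoBlockSize }
	// Placeholder: a real sparse copier would skip zero-filled blocks.
	newSparse := func(uint64) streamCopier { return defaultCopier }
	warn := func(err error) { fmt.Fprintln(os.Stderr, "warning:", err) }

	cp := getStreamCopier(true, failing, newSparse, warn)
	n, err := cp(os.Stdout, strings.NewReader("restored bytes\n"))
	fmt.Println(n, err)
}
```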
* refactor(repository): re-enabled parallel uploads of blobs
This fixes a performance regression where all blob writes were
serialized under a lock introduced by mistake in #1838.
Added regression test to prevent this in the future.
* added TestParallelUploadUploadsBlobsInParallel
* make linter happy again
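A rough sketch of the idea behind the fix and its regression test: hold the lock only for bookkeeping and perform the slow blob write outside of it. The `uploader` type and `maxConcurrentUploads` helper are hypothetical and do not mirror the repository code; the barrier only completes if all writes are in flight at the same time, so a write serialized under the lock would hang instead of passing.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// uploader is a hypothetical writer that tracks pending blobs under a mutex.
type uploader struct {
	mu      sync.Mutex
	pending map[string]bool
	put     func(id string) // slow blob write, injected for the sketch
}

// upload holds the lock only while touching shared state; the slow write
// itself runs unlocked so that several uploads can proceed in parallel.
func (u *uploader) upload(id string) {
	u.mu.Lock()
	u.pending[id] = true
	u.mu.Unlock()

	u.put(id)

	u.mu.Lock()
	delete(u.pending, id)
	u.mu.Unlock()
}

// maxConcurrentUploads starts n uploads and returns the peak number of
// writes observed in flight at once. Each write waits on a barrier until
// all n writes have started.
func maxConcurrentUploads(n int) int32 {
	var inFlight, peak int32

	var barrier sync.WaitGroup
	barrier.Add(n)

	u := &uploader{pending: map[string]bool{}}
	u.put = func(string) {
		cur := atomic.AddInt32(&inFlight, 1)

		// Record the highest in-flight count seen so far.
		for {
			old := atomic.LoadInt32(&peak)
			if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
				break
			}
		}

		barrier.Done()
		barrier.Wait()
		atomic.AddInt32(&inFlight, -1)
	}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)

		go func(i int) {
			defer wg.Done()
			u.upload(fmt.Sprintf("blob-%d", i))
		}(i)
	}

	wg.Wait()

	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println("peak concurrent uploads:", maxConcurrentUploads(4))
}
```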
Fixed regression caused by #1980
Switching to new mmap library uncovered a bug where we were not properly
closing indices that are no longer in use, since previous mmap had
a finalizer defined to do so.
This fixes regression introduced in #1960.
Tested by backing up Linux 5.14.8 source on M1 Mac (average of 15 runs):
Before: duration=6.7s avg_heap_objects=7411657 avg_heap_bytes=871794888
After: duration=5.6s (17% faster) avg_heap_objects=5947800 (20% less) avg_heap_bytes=795762120 (9% less)
* fix(snapshots): fixed random deadlock when Uploader results in a failure
The deadlock was caused by not properly waiting for all asynchronous
work to complete before closing the worker pool.
Introduced `workshare.AsyncGroup.Close()` and some assertions.
* fixed select race
* linter fix
* pr feedback
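The general pattern behind this fix, waiting for all asynchronous work before tearing anything down, can be sketched with a `sync.WaitGroup`. The `taskGroup` type below is a hypothetical illustration and is not the `workshare.AsyncGroup` API; it assumes `runAsync` and `close` are called from a single owning goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// taskGroup is a hypothetical helper that tracks asynchronous work so the
// owner can wait for all of it before releasing shared resources.
type taskGroup struct {
	wg     sync.WaitGroup
	closed bool
}

// runAsync starts f in a new goroutine. Scheduling work after close is a
// programming error, so it is guarded by an assertion.
func (g *taskGroup) runAsync(f func()) {
	if g.closed {
		panic("taskGroup: runAsync called after close")
	}

	g.wg.Add(1)

	go func() {
		defer g.wg.Done()
		f()
	}()
}

// close blocks until every task started with runAsync has returned.
func (g *taskGroup) close() {
	g.wg.Wait()
	g.closed = true
}

func main() {
	results := make([]int, 5)

	g := &taskGroup{}
	for i := range results {
		i := i // capture a per-iteration copy for older Go versions
		g.runAsync(func() { results[i] = i * i })
	}

	// Only after close returns is it safe to read results or shut down
	// whatever the tasks depended on.
	g.close()
	fmt.Println(results)
}
```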
* Remove remaining internal uses of Readdir
* Remove old helpers and interface functions.
* Update tests for updated fs.Directory interface
* Fix index out of range error in snapshot walker
Record one error if an error occurred and it's not limiting errors
* Use helper functions more; exit loops early
Follow up on reviewer comments and reduce code duplication, use more
targeted functions like Directory.Child, and exit directory iteration
early if possible.
* Remove fs.Entries type and unused functions
Leave some functions dealing with sorting and finding entries in fs
package. This retains tests for those functions while still allowing
mockfs to access them.
* Simplify function return
* test MinContentAgeSubjectToGC
* lint: move check for whether content is deleted to the caller to reduce gocycle complexity
* nit: add new lines before return
* log GC stats
* fix(maintenance): use a fixed time for protecting newly created content
Previously, the reference time used to determine whether a content had been
recently created would change throughout a snapshot GC execution. For
long-running GC tasks, this non-deterministically shrank the safety window
specified in `MinContentAgeSubjectToGC`.
Now, the snapshot GC starting time is used as a fixed reference for the
safety check.
* remove test entry point to avoid double execution of the test
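To see why a fixed reference time matters for the safety check, consider this small sketch. `isProtected` and `compareReferences` are hypothetical helpers, and the 24-hour minimum age, the timestamps, and the two-hour GC run are made-up values for illustration only.

```go
package main

import (
	"fmt"
	"time"
)

// isProtected reports whether content created at `created` is still inside
// the safety window of length minAge, measured back from `reference`.
func isProtected(created, reference time.Time, minAge time.Duration) bool {
	return reference.Sub(created) < minAge
}

// compareReferences evaluates the same content against a fixed reference
// (the GC start time) and a moving one (the time of the check, two hours
// into a long-running GC).
func compareReferences() (fixed, moving bool) {
	gcStart := time.Date(2022, 1, 1, 0, 0, 0, 0, time.UTC)
	created := gcStart.Add(-23 * time.Hour) // 23h old when GC starts
	minAge := 24 * time.Hour                // assumed safety window
	checkedAt := gcStart.Add(2 * time.Hour) // check happens 2h into the run

	fixed = isProtected(created, gcStart, minAge)
	moving = isProtected(created, checkedAt, minAge)

	return fixed, moving
}

func main() {
	fixed, moving := compareReferences()
	fmt.Println("protected with fixed reference: ", fixed)
	fmt.Println("protected with moving reference:", moving)
}
```

With the moving reference the content appears 25 hours old by the time it is checked, so it falls out of the window even though it was inside it when the GC began.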
The new `--enable-jaeger-collector` flag and the corresponding
`KOPIA_ENABLE_JAEGER_COLLECTOR` environment variable enable the Jaeger
exporter, which by default sends OTEL traces to the Jaeger collector at
http://localhost:14268/api/traces
To change this, use environment variables:
* `OTEL_EXPORTER_JAEGER_ENDPOINT`
* `OTEL_EXPORTER_JAEGER_USER`
* `OTEL_EXPORTER_JAEGER_PASSWORD`
When tracing is disabled, the impact on performance is negligible.
To see this in action:
1. Download latest Jaeger all-in-one from https://www.jaegertracing.io/download/
2. Run `jaeger-all-in-one` binary without any parameters.
3. Run `kopia --enable-jaeger-collector snapshot create ...`
4. Go to http://localhost:16686/search and search for traces
When enabled, metrics are pushed to the provided Prometheus Push
Gateway at the start and end of each command and periodically every
few seconds.
```
--metrics-push-addr=http://address:port
--metrics-push-interval=5s
--metrics-push-job=kopia
--metrics-push-grouping=a:b --metrics-push-grouping=c:d
--metrics-push-username=user
--metrics-push-password=pass
```
* feat(repository): switched to using go-mmap for indexes
The previous mmap implementation only exposed the io.ReaderAt API.
The new one allows direct access to the underlying bytes, which helps
remove a bunch of buffers, copying, and allocation in index parsing.
* let the compiler do bounds check
* removed one more unnecessary allocation
* pr feedback
When combined with #1963, it significantly reduces memory usage.
When backing up a Kopia enlistment containing various binaries
(2.8GB, files:74180 dirs:12322):
Before: max memory 440MB, time 5.8s
After: max memory 360MB, time 5.4s
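The difference between the two access styles can be sketched as follows. The helper names and the big-endian 4-byte field are assumptions for illustration and do not describe the real index format: reading through `io.ReaderAt` needs a scratch buffer and a copy per field, while a memory-mapped `[]byte` can be decoded in place, with the slice expression providing the bounds check.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readUint32At decodes a field through io.ReaderAt: the bytes are first
// copied into a scratch buffer and only then interpreted.
func readUint32At(r io.ReaderAt, off int64) (uint32, error) {
	var buf [4]byte

	if _, err := r.ReadAt(buf[:], off); err != nil {
		return 0, err
	}

	return binary.BigEndian.Uint32(buf[:]), nil
}

// uint32At decodes the same field directly from the underlying bytes.
// No copy is made; an out-of-range offset panics via the slice expression.
func uint32At(data []byte, off int) uint32 {
	return binary.BigEndian.Uint32(data[off : off+4])
}

func main() {
	// Stand-in for mapped index bytes (hypothetical layout).
	data := []byte{0, 0, 0, 1, 0, 0, 1, 0}

	viaReader, err := readUint32At(bytes.NewReader(data), 4)
	fmt.Println(viaReader, err)

	fmt.Println(uint32At(data, 4))
}
```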