This removes tons of boilerplate code around:
- retry loop
- connection management
- storage registration
* used generics in runInParallel
* introduced generics in freepool
* introduced strong typing for workshare.Pool and workshare.AsyncGroup
* fixed linter error on openbsd
Lack of generics support is blocking various dependency upgrades,
so this unblocks that.
Temporarily disabled `checklocks` linter until it is fixed upstream.
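
To illustrate what typed pooling with generics can look like, here is a minimal sketch. `Pool`, `NewPool`, `Take`, and `Return` are hypothetical names wrapping `sync.Pool`; they are not taken from the actual freepool package.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a hypothetical typed wrapper around sync.Pool.
// It only illustrates the idea and is not the freepool API.
type Pool[T any] struct {
	inner sync.Pool
	reset func(*T)
}

// NewPool returns a pool that allocates a zero-valued T on demand
// and calls reset on every item handed back to it.
func NewPool[T any](reset func(*T)) *Pool[T] {
	p := &Pool[T]{reset: reset}
	p.inner.New = func() any { return new(T) }
	return p
}

// Take returns a *T; callers no longer need their own type assertion.
func (p *Pool[T]) Take() *T {
	return p.inner.Get().(*T)
}

// Return resets the item and puts it back into the pool.
func (p *Pool[T]) Return(v *T) {
	p.reset(v)
	p.inner.Put(v)
}

type buffer struct{ data []byte }

func main() {
	pool := NewPool(func(b *buffer) { b.data = b.data[:0] })

	b := pool.Take()
	b.data = append(b.data, "hello"...)
	fmt.Println("bytes in use:", len(b.data))
	pool.Return(b)
}
```

The type assertion lives in one place inside `Take`, so call sites stay free of casts.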
* Update display on repository summary
* Apply throughout app
* Situate units_test
* Update Command Line documentation
* Envar cleanup
* Rename to BytesString
* Restore envar string available for test
* Remove extraneous empty check and restore UIPreferences field for frontend
* PR: config bool cleanup and missed `BaseEnv`s
* Fix lint and test
Benchmarked from a macOS client to a Linux server over a Wi-Fi connection:
(2-5ms latency)
linux 5.14.8 (1.1 GB) to a clean repository:
Before: 240s After: 27s (90% faster)
Fixes #2372
* feat(repository): added bigmap Set and Map backed by custom on-disk hashtable
* additional test coverage for corner cases
* profile flags
* exclude bigmapbench from code coverage
* refactored based on conversation with Julio
The concern was that values stored on disk are not encrypted.
* bigmap.Map - implements encryption of values
* bigmap.Set - stores keys only, so no encryption necessary
* bigmap.internalMap - used as backing structure for Map and Set
* implemented benchmark with values
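
The split between Map, Set, and the shared backing structure can be sketched as follows. This is an illustration under assumptions: an in-memory Go map stands in for the on-disk hashtable, AES-GCM with a random per-instance key is a cipher picked for the example, and the method names are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// internalMap stands in for the on-disk hashtable (in-memory here, by assumption).
type internalMap struct{ m map[string][]byte }

// put stores a copy of v under k and reports whether k was newly added.
func (im *internalMap) put(k, v []byte) bool {
	if _, ok := im.m[string(k)]; ok {
		return false
	}
	im.m[string(k)] = append([]byte(nil), v...)
	return true
}

func (im *internalMap) get(k []byte) ([]byte, bool) {
	v, ok := im.m[string(k)]
	return v, ok
}

// Set stores keys only, so there is nothing to encrypt.
type Set struct{ inner internalMap }

func NewSet() *Set                    { return &Set{inner: internalMap{m: map[string][]byte{}}} }
func (s *Set) Put(k []byte) bool      { return s.inner.put(k, nil) }
func (s *Set) Contains(k []byte) bool { _, ok := s.inner.get(k); return ok }

// Map encrypts every value before it reaches the backing structure.
type Map struct {
	inner internalMap
	aead  cipher.AEAD
}

func NewMap() (*Map, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &Map{inner: internalMap{m: map[string][]byte{}}, aead: aead}, nil
}

// Put stores nonce||ciphertext; the key is bound in as additional data.
func (m *Map) Put(k, v []byte) (bool, error) {
	nonce := make([]byte, m.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return false, err
	}
	return m.inner.put(k, m.aead.Seal(nonce, nonce, v, k)), nil
}

// Get decrypts the stored value, if present.
func (m *Map) Get(k []byte) ([]byte, bool, error) {
	stored, ok := m.inner.get(k)
	if !ok {
		return nil, false, nil
	}
	n := m.aead.NonceSize()
	v, err := m.aead.Open(nil, stored[:n], stored[n:], k)
	return v, err == nil, err
}

func main() {
	m, err := NewMap()
	if err != nil {
		panic(err)
	}
	if _, err := m.Put([]byte("k"), []byte("value")); err != nil {
		panic(err)
	}
	v, ok, err := m.Get([]byte("k"))
	fmt.Println(string(v), ok, err)
}
```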
* Make callback for upload file completion
Callback does not indicate that a file will be reachable immediately in
the resulting snapshot, but does indicate that the uploader is done
processing the file in some way (either via uploading data or finding a
previous version in the repo) and whether there was an error processing
the file.
* Tests for new FinishedFile callback
Ensure hadErr is properly populated and FinishedFile is called even if
the file was considered cached.
* Refine comment on interface function slightly
* Give callback error instead of bool about error
* Add locks around concurrent accesses in test
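
A minimal sketch of the callback shape described above, assuming a hypothetical `Progress` interface and a test recorder. The names `processAll` and `recorder` are illustrative only; the mutex mirrors the locking added for concurrent access in tests.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Progress is a hypothetical stand-in for the uploader's progress interface.
type Progress interface {
	// FinishedFile is called once the uploader is done with a file, whether
	// it was uploaded, found in a previous snapshot, or failed.
	// err is nil when processing succeeded.
	FinishedFile(path string, err error)
}

// recorder collects callback results; the mutex guards concurrent calls.
type recorder struct {
	mu      sync.Mutex
	results map[string]error
}

func newRecorder() *recorder { return &recorder{results: map[string]error{}} }

func (r *recorder) FinishedFile(path string, err error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.results[path] = err
}

var errUnreadable = errors.New("unreadable")

// processAll simulates parallel workers reporting each file exactly once.
// process is a hypothetical per-file function.
func processAll(paths []string, process func(string) error, p Progress) {
	var wg sync.WaitGroup
	for _, path := range paths {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			p.FinishedFile(path, process(path))
		}(path)
	}
	wg.Wait()
}

func main() {
	r := newRecorder()
	processAll([]string{"a.txt", "b.txt"}, func(p string) error {
		if p == "b.txt" {
			return errUnreadable
		}
		return nil
	}, r)
	fmt.Println("a.txt:", r.results["a.txt"], "b.txt:", r.results["b.txt"])
}
```

Passing the error itself rather than a bool lets the receiver log or classify the failure.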
* refactor(repository): moved format blob management to separate package
This is completely mechanical, no behavior changes, only:
- moved types and functions to a new package
- adjusted visibility where needed
- added missing godoc
- renamed some identifiers to align with current usage
- mechanically converted some top-level functions into member functions
- fixed some mis-named variables
* refactor(repository): moved content.FormattingOptions to format.ContentFormat
* feat(repository): added `required features` to the repository
This is intended for future compatibility, to reliably
stop old kopia clients from opening a repository when
the old code does not understand a new `required feature`.
Required features are checked on startup and periodically using the
same method as upgrade lock, where they will return errors during blob
operations.
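
The check itself can be sketched like this. The `Feature` and `RequiredFeature` types, the field names, and the feature strings are hypothetical and only illustrate the idea of refusing to proceed when a requirement is not understood.

```go
package main

import "fmt"

// Feature names a repository capability (hypothetical type).
type Feature string

// RequiredFeature is a hypothetical record stored with the repository format.
type RequiredFeature struct {
	Feature Feature
	// IfNotUnderstood is a message shown by clients that lack the feature.
	IfNotUnderstood string
}

// checkRequired returns an error for the first required feature that
// this client does not support.
func checkRequired(required []RequiredFeature, supported map[Feature]bool) error {
	for _, rf := range required {
		if !supported[rf.Feature] {
			return fmt.Errorf("unsupported required feature %q: %s", rf.Feature, rf.IfNotUnderstood)
		}
	}
	return nil
}

func main() {
	supported := map[Feature]bool{"example-feature-a": true}
	required := []RequiredFeature{
		{Feature: "example-feature-a"},
		{Feature: "example-feature-b", IfNotUnderstood: "please upgrade the client"},
	}
	fmt.Println(checkRequired(required, supported))
}
```

Running the same check on startup and again periodically lets a long-running old client notice a requirement that was added after it opened the repository.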
* pr feedback
Instead of passing static content.FormattingOptions (and caching it)
we now introduce an interface to provide its values.
This will allow the values to dynamically change at runtime in the
future to support cases like live migration.
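
As a rough sketch of the difference between a cached struct and a provider interface: `FormatProvider`, `MaxPackSize`, and `writer` below are hypothetical names, chosen only to show a consumer that re-reads the value on every use.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// FormatProvider is a hypothetical interface exposing format values
// through methods instead of a static struct.
type FormatProvider interface {
	MaxPackSize() int
}

// dynamicProvider lets the value change while the repository is open.
type dynamicProvider struct{ maxPackSize atomic.Int64 }

func (d *dynamicProvider) MaxPackSize() int { return int(d.maxPackSize.Load()) }

// writer asks the provider every time rather than caching the value.
type writer struct{ format FormatProvider }

func (w writer) shouldFlush(pendingBytes int) bool {
	return pendingBytes >= w.format.MaxPackSize()
}

func main() {
	d := &dynamicProvider{}
	d.maxPackSize.Store(100)
	w := writer{format: d}

	fmt.Println(w.shouldFlush(50))
	d.maxPackSize.Store(40) // simulated runtime change
	fmt.Println(w.shouldFlush(50))
}
```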
* kopia format upgrade lock
* Update cli/command_repository_set_parameters_test.go
Co-authored-by: Ali Dowair <adowair@umich.edu>
* Update cli/command_repository_upgrade.go
Co-authored-by: Ali Dowair <adowair@umich.edu>
* Update cli/command_repository_upgrade.go
Co-authored-by: Ali Dowair <adowair@umich.edu>
* pr feedback
* pr feedback
* add a min drain time check
* env var for io-drain-timeout
* fix: add more doctext around upgrade phases
* build: wrap with EnvName
* add experimental warning
* protect upgrade cli behind env variable
* fix conflicts after relocating the upgrade lock
* generalize the command args
* drop certain features as per feedback
* sub-divide the upgrade command into begin and rollback
* Update cli/command_repository_upgrade.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* Update cli/command_repository_upgrade.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* missing return
* rename force flag to allow-unsafe-upgrade
Co-authored-by: Shikhar Mall <shikhar@kasten.io>
Co-authored-by: Ali Dowair <adowair@umich.edu>
Co-authored-by: Shikhar Mall <small@kopia.io>
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* feat(infra): improved support for in-process testing
* support for killing of a running server using simulated Ctrl-C
* support for overriding os.Stdin
* migrated many tests from the exe runner to in-process runner
* added required indirection when defining Envar() so we can later override it in tests
* refactored CLI runners by moving environment overrides to CLITestEnv
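
One way to picture the indirection for environment lookups is a replaceable lookup function. The `App` type, `Getenv`, and `withEnv` below are hypothetical and do not reflect the actual `Envar()` helper.

```go
package main

import (
	"fmt"
	"os"
)

// App is a hypothetical application object that owns environment access.
type App struct {
	// lookupEnv defaults to os.LookupEnv and can be replaced in tests.
	lookupEnv func(name string) (string, bool)
}

func NewApp() *App { return &App{lookupEnv: os.LookupEnv} }

// Getenv returns the variable's value, or def when it is not set.
func (a *App) Getenv(name, def string) string {
	if v, ok := a.lookupEnv(name); ok {
		return v
	}
	return def
}

// withEnv returns an App whose environment is served from overrides only,
// which is what an in-process test would want.
func (a *App) withEnv(overrides map[string]string) *App {
	return &App{lookupEnv: func(name string) (string, bool) {
		v, ok := overrides[name]
		return v, ok
	}}
}

func main() {
	app := NewApp().withEnv(map[string]string{"EXAMPLE_VAR": "1"})
	fmt.Println(app.Getenv("EXAMPLE_VAR", "0"), app.Getenv("OTHER_VAR", "default"))
}
```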
Some compression algorithms are not recommended because they
allocate disproportionate amounts of memory. They are still
possible to use, just marked as NOT RECOMMENDED in the UI.
- removed a bunch of hacks and should improve the logging
performance by avoiding interfaces and data translation. This will
allow the use of de-sugared loggers in performance-critical
logging situations.
- this will also allow using features of ZAP more directly without
having to reimplement them.
- moved logging.Printf() to testlogging
- refactored `uitask` to store logs in a structural format and
present them as JSON only in the UI
- renamed printf_logger.go to printf.go so that fewer columns are used
in the logs
* Allow dynamic directory entries with virtualfs
* Tests for new virtualfs implementation
* Add escape hatch for estimator during upload
Some virtualfs.StreamingDirectory-s may not be able to (efficiently)
support iterating through entries multiple times. Make a way for the
estimator to ask if they support multiple iterations and skip the
directory if they do not.
* Expand Directory interface
Expand the Directory interface instead of making a new interface as it's
error-prone to ensure all wrapper types properly handle types that use
the new interface.
* Post-rebase fixes
* Make StreamingDirectory single iteration only
Simplify code and test slightly by not allowing users to declare a
StreamingDirectory that can be iterated through multiple times.
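
A simplified sketch of a single-iteration directory and an estimator that skips it. The interface below is a reduced, hypothetical version of a directory abstraction, and `streamingDir`, `fromSlice`, and `estimate` are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

type Entry struct{ Name string }

// Directory is a reduced, hypothetical directory interface.
type Directory interface {
	IterateEntries(cb func(Entry) error) error
	// SupportsMultipleIterations reports whether IterateEntries
	// may be called more than once.
	SupportsMultipleIterations() bool
}

var errAlreadyIterated = errors.New("streaming directory already iterated")

// streamingDir produces entries from a one-shot source.
type streamingDir struct {
	next func() (Entry, bool)
	used bool
}

func (d *streamingDir) IterateEntries(cb func(Entry) error) error {
	if d.used {
		return errAlreadyIterated
	}
	d.used = true
	for {
		e, ok := d.next()
		if !ok {
			return nil
		}
		if err := cb(e); err != nil {
			return err
		}
	}
}

func (d *streamingDir) SupportsMultipleIterations() bool { return false }

// fromSlice builds a one-shot entry source from fixed names.
func fromSlice(names ...string) func() (Entry, bool) {
	i := 0
	return func() (Entry, bool) {
		if i >= len(names) {
			return Entry{}, false
		}
		e := Entry{Name: names[i]}
		i++
		return e, true
	}
}

// estimate counts entries but skips directories that cannot be iterated
// again, leaving their single pass for the actual upload.
func estimate(d Directory) (count int, skipped bool, err error) {
	if !d.SupportsMultipleIterations() {
		return 0, true, nil
	}
	err = d.IterateEntries(func(Entry) error { count++; return nil })
	return count, false, err
}

func main() {
	d := &streamingDir{next: fromSlice("a", "b")}
	n, skipped, err := estimate(d)
	fmt.Println(n, skipped, err)
}
```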
* Add better test for estimator ignoring stream dir
Previous test in uploader had a race condition, meaning it may not catch
all cases.
* Ignore atomic access in checklocks
Comparisons known to be done after all additions to the variables in
question.
* Implement reviewer feedback
* Remove unused function parameter
* Unify sparse and normal IO output
This commit refactors the code paths that exercise normal and sparse
writing of restored content. The goal is to expose sparsefile.Copy()
and iocopy.Copy() to be interchangeable, thereby allowing us to wrap
or transform their behavior more easily in the future.
* Introduce getStreamCopier()
* Pull ioCopy() into getStreamCopier()
* Fix small nit in E2E test
We should be getting the block size of the destination file, not
the source file.
* Call stat.GetBlockSize() once per FilesystemOutput
A tiny refactor to pull this call out of the generated stream copier,
as the block size should not change from one file to the next within
a restore entry.
NOTE: as a side effect, if block size could not be found (an error
is returned), we will return the default stream copier instead of
letting the sparse copier fail. A warning will be logged, but this
error will not cause the restore to fail; it will proceed silently.
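
The fallback described in the note can be sketched as follows. The `streamCopier` signature, the placeholder `sparseCopier`, and the injected `getBlockSize` function are assumptions made for the example; a real sparse copier would skip zero-filled blocks by seeking.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"log"
	"strings"
)

// streamCopier copies r to w; the signature is assumed for illustration.
type streamCopier func(w io.Writer, r io.Reader) (int64, error)

func defaultCopy(w io.Writer, r io.Reader) (int64, error) { return io.Copy(w, r) }

// sparseCopier is a placeholder: it only fixes the buffer size, whereas a
// real implementation would seek over zero blocks. blockSize must be > 0.
func sparseCopier(blockSize uint64) streamCopier {
	return func(w io.Writer, r io.Reader) (int64, error) {
		return io.CopyBuffer(w, r, make([]byte, blockSize))
	}
}

// getStreamCopier picks the copier once. If the block size cannot be
// determined, it logs a warning and falls back to the default copier.
func getStreamCopier(sparse bool, getBlockSize func() (uint64, error)) streamCopier {
	if !sparse {
		return defaultCopy
	}
	bs, err := getBlockSize()
	if err != nil {
		log.Printf("sparse write disabled, could not get block size: %v", err)
		return defaultCopy
	}
	return sparseCopier(bs)
}

// failingBlockSize simulates a stat call that returns an error.
func failingBlockSize() (uint64, error) { return 0, errors.New("block size unavailable") }

// copyString runs a copier over an in-memory string.
func copyString(c streamCopier, s string) (string, error) {
	var sb strings.Builder
	_, err := c(&sb, strings.NewReader(s))
	return sb.String(), err
}

func main() {
	out, err := copyString(getStreamCopier(true, failingBlockSize), "restored data")
	fmt.Println(out, err)
}
```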
* fix(snapshots): fixed random deadlock when Uploader results in a failure
The deadlock was caused by not properly waiting for all asynchronous
work to complete before closing the worker pool.
Introduced `workshare.AsyncGroup.Close()` and some assertions.
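
To show the wait-before-close idea in isolation, here is a minimal sketch built on `sync.WaitGroup`. The `asyncGroup` type and its methods are a hypothetical simplification, not the actual `workshare.AsyncGroup`, and the group is assumed to be driven from a single owning goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// asyncGroup is a hypothetical, simplified group of asynchronous work items.
type asyncGroup struct {
	wg     sync.WaitGroup
	closed bool
}

// RunAsync starts work in a goroutine. Calling it after Close is a bug,
// which the assertion below turns into a panic.
func (g *asyncGroup) RunAsync(work func()) {
	if g.closed {
		panic("RunAsync called after Close")
	}
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		work()
	}()
}

// Close waits for all outstanding work. It must run on every path,
// including error paths, before shared resources are torn down.
func (g *asyncGroup) Close() {
	g.wg.Wait()
	g.closed = true
}

// sumSquares fans work out to the group and reads the result only after Close.
func sumSquares(n int) int {
	var (
		g     asyncGroup
		mu    sync.Mutex
		total int
	)
	for i := 1; i <= n; i++ {
		i := i // capture per-iteration value for older Go versions
		g.RunAsync(func() {
			mu.Lock()
			defer mu.Unlock()
			total += i * i
		})
	}
	g.Close()
	return total
}

func main() {
	fmt.Println(sumSquares(4))
}
```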
* fixed select race
* linter fix
* pr feedback
* Remove remaining internal uses of Readdir
* Remove old helpers and interface functions.
* Update tests for updated fs.Directory interface
* Fix index out of range error in snapshot walker
Record one error if an error occurred and it's not limiting errors
* Use helper functions more; exit loops early
Follow up on reviewer comments and reduce code duplication, use more
targeted functions like Directory.Child, and exit directory iteration
early if possible.
* Remove fs.Entries type and unused functions
Leave some functions dealing with sorting and finding entries in fs
package. This retains tests for those functions while still allowing
mockfs to access them.
* Simplify function return
* refactor(repository): ensure we always parse content.ID and object.ID
This changes the types to be incompatible with string to prevent direct
conversion to and from string.
This has the additional benefit of reducing number of memory allocations
and bytes for all IDs.
content.ID went from 2 allocations to 1:
typical case 32 characters + 16 bytes per-string overhead
worst-case 65 characters + 16 bytes per-string overhead
now: 34 bytes
object.ID went from 2 allocations to 1:
typical case 32 characters + 16 bytes per-string overhead
worst-case 65 characters + 16 bytes per-string overhead
now: 36 bytes
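
A sketch of an ID type that cannot be converted to or from `string` directly. The field layout, the prefix rule (a single letter from `g` to `z`), and the names `ID` and `ParseID` are assumptions for illustration; the layout merely shows how a fixed-size value of 34 bytes could be composed.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// ID is a hypothetical fixed-size identifier. Because it is a struct,
// string(id) and ID(s) do not compile, so callers must parse explicitly.
type ID struct {
	prefix byte     // optional one-letter prefix, 0 if absent
	length byte     // number of hash bytes in use
	data   [32]byte // hash bytes, stored inline with no separate allocation
}

// ParseID validates s and converts it to an ID.
func ParseID(s string) (ID, error) {
	var id ID
	if len(s)%2 == 1 {
		// Odd length means a one-character prefix precedes the hex hash.
		if s[0] < 'g' || s[0] > 'z' {
			return ID{}, errors.New("invalid prefix")
		}
		id.prefix = s[0]
		s = s[1:]
	}
	if len(s) > 2*len(id.data) {
		return ID{}, errors.New("hash too long")
	}
	n, err := hex.Decode(id.data[:], []byte(s))
	if err != nil {
		return ID{}, fmt.Errorf("invalid hash: %w", err)
	}
	id.length = byte(n)
	return id, nil
}

// String renders the ID back into its textual form.
func (id ID) String() string {
	h := hex.EncodeToString(id.data[:id.length])
	if id.prefix != 0 {
		return string(id.prefix) + h
	}
	return h
}

func main() {
	id, err := ParseID("k0123abcd")
	fmt.Println(id, err)
}
```

Since `ID` is comparable, it can still be used as a map key without converting to a string first.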
* move index.{ID,IDRange} methods to separate files
* replaced index.IDFromHash with content.IDFromHash externally
* minor tweaks and additional tests
* Update repo/content/index/id_test.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* Update repo/content/index/id_test.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* pr feedback
* post-merge fixes
* pr feedback
* pr feedback
* fixed subtle regression in sortedContents()
This was actually not producing invalid results because of how base36
works, just not sorting as efficiently as it could.
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* feat(cli): implementation for 'kopia snapshot fix'
This allows modifications and fixes to the snapshots after they have
been taken.
Supported are:
* `kopia snapshot fix remove-invalid-files [--verify-files-percent=X]`
Removes all directory entries where the underlying files cannot be
read based on index analysis (this does not read the files, only index
structures so is reasonably quick).
`--verify-files-percent=100` can be used to trigger full read for
all files.
* `kopia snapshot fix remove-files --object-id=<object-id>`
Removes the object with a given ID from the entire snapshot tree.
Useful when you accidentally snapshot a sensitive file.
* `kopia snapshot fix remove-files --filename=<wildcard>`
Removes the files with a given name from the entire snapshot tree.
Useful when you accidentally snapshot a sensitive file.
By default all snapshots are analyzed and rewritten. To limit the scope
use:
--source=user@host:/path
--manifest-id=manifestID
By default the rewrite operation writes new directory entries but
does not replace the manifests. To do that pass `--commit`.
Related #1906
Fixes #799
reorganized CLI per PR suggestion
* additional logging for diff command
* added Clone() method to snapshot manifest and directory entry
* added a comprehensive test, moved DirRewriter to separate file
* pr feedback
* more pr feedback
* improved logging output
* disable test in -race configuration since it's way too slow
* pr feedback
- expand command flag description for clarification
- include blob id in blob get error in the cache
- nit: remove unused BOTO_PATH
- nit: fix comment
- cleanup: remove unnecessary function declaration in interface
- leverage 'testify' to simplify test
* New interface method to iterate over dir entries
* Fix build and test failures from interface
* Fix entry iteration for StaticDirectory
* Make utility function for directory iteration
* Fix lint errors
* No wrapcheck on fs.ReaddirToIterate
* Be consistent for IterateEntry implementations
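
An adapter from a read-everything call to callback-style iteration might look roughly like this. The function name `readdirToIterate`, its signature, and the `Entry` type are hypothetical and not necessarily those of `fs.ReaddirToIterate`.

```go
package main

import (
	"errors"
	"fmt"
)

type Entry struct{ Name string }

// errStop is a hypothetical sentinel a callback can return to stop early.
var errStop = errors.New("stop iteration")

// readdirToIterate adapts a function that returns all entries at once to
// callback-style iteration. Errors from readdir or from cb are returned
// unchanged and end the iteration.
func readdirToIterate(readdir func() ([]Entry, error), cb func(Entry) error) error {
	entries, err := readdir()
	if err != nil {
		return err
	}
	for _, e := range entries {
		if err := cb(e); err != nil {
			return err
		}
	}
	return nil
}

// staticEntries returns a readdir function over a fixed list.
func staticEntries(names ...string) func() ([]Entry, error) {
	return func() ([]Entry, error) {
		out := make([]Entry, 0, len(names))
		for _, n := range names {
			out = append(out, Entry{Name: n})
		}
		return out, nil
	}
}

func main() {
	err := readdirToIterate(staticEntries("a", "b", "c"), func(e Entry) error {
		fmt.Println("visiting", e.Name)
		if e.Name == "b" {
			return errStop // exit the loop early
		}
		return nil
	})
	fmt.Println("result:", err)
}
```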