Protect filesystem subtrees from concurrent manipulation during critical sections
if engine actions are called asynchronously. This change provides coordination
between the `Snapshotter` and the `FileWriter`. For example, the `FileWriter`
should be blocked from perturbing the same directory tree if a
Gather-Snapshot is taking place along that tree simultaneously.
This will ensure the fingerprint data accumulated during the `Gather` phase
will correspond unambiguously to the data included in the snapshot.
Extend build flags to kopia snapshotter
This package now imports fswalker which can only be built for
darwin,amd64 or linux,amd64
* linter: upgraded to 1.33, disabled some linters
* lint: fixed 'errorlint' errors
This ensures that all error comparisons use errors.Is() or errors.As().
We will be wrapping more errors going forward so it's important that
error checks are not strict everywhere.
Verified that there are no exceptions for errorlint linter which
guarantees that.
* lint: fixed or suppressed wrapcheck errors
* lint: nolintlint and misc cleanups
Co-authored-by: Julio López <julio+gh@kasten.io>
Remove unnecessary intermediate variables.
Send SIGTERM instead of SIGKILL to terminate child kopia server process.
Set Pdeathsig on Linux for child kopia server process.
Trivial: reduce scope of hostFioDataPathStr variable.
Trivial: rename local variable.
Trivial: Use log.Fatalln instead of log + exit(1).
Improve error message in robustness test to tell apart failure cause.
* Robustness engine actions with stats and logging
- Add actions to robustness engine
- Actions wrap other functional behavior and serve as a common interface for collecting stats
- Add stats for the engine, both per run and cumulative over time
- Add a log for actions that the engine has executed
- Add recovery logic to re-sync snapshot metadata after a possible failed engine run (e.g. if metadata wasn't properly persisted).
Current built-in actions:
- snapshot root directory
- restore random snapshot ID into a target restore path
- delete a random snapshot ID
- run GC
- write random files to the local data directory
- delete a random subdirectory under the local data directory
- delete files in a directory
- restore a snapshot ID into the local data directory
Actions are executed according to a set of options, which dictate the relative probabilities of picking a given action, along with ranges for action-specific parameters that can be randomized.
* [Robustness] Add command line parameters for kopia snapshotter
Add flags for:
- no-progress
- parallel
- cache sizes
- no update check
Add an integration test to validate snapshotter expected output
against a kopia executable.
Followup on recent PR #529, some suggestions and discussion after it was merged:
- Express probability as float in range [0,1]
- Add a unit test for DeleteContentsAtDepth
- Add a comment on writeFilesAtDepth explaining depth vs branchDepth
- Refactor pickRandSubdirPath for easier readability and understanding
Upon some reflection, I decided to refactor pickRandSubdirPath() to gather indexes and pick randomly from them instead of the previous reservoir sampling approach. I think this is easier to understand going forward without extra explanation, doesn't have much additional memory overhead, and reduces the number of rand calls to 1.
* [Robustness] Add additional fio workloads
Add more fio workloads to write files at different depths in random
branches of the generated file system tree.
- Write files at depth
- Write files at a specified depth, creating a new directory branch at
a random depth
- Delete a random directory at a given depth
- Delete some or all of the contents of a random directory at
a specified depth
* [Robustness] Fix for kopia runner and custom work dir
Apply fix similar to #293 for the robustness kopia runner.
Add control for runner working directory.
Fix fswalker to ignore hostname to allow reporting
on walks done across different hosts. Also prevent
Before and After walk data from printing to reduce log size.
* fixed a number of cases where misaligned data was causing panics on armv7 (but not armv8)
* travis: enable arm64
* test: reduce compressed data sizes when running on arm
* arm: wait longer for snapshots
Add two tests:
- TestManySmallFiles: writes 100k files size 4k to a directory. Snapshots the data tree, restores and validates data.
- TestModifyWorkload: Loops over a simple randomized workload. Performs a series of random file writes to some random sub-directories, then takes a snapshot of the data tree. All snapshots taken during this test are restore-verified at the end.
A global test engine is instantiated in main_test.go, to be used in the robustness test suite across tests (saves time loading/saving metadata once per run instead of per test).
New usage:
```
kopia snapshot delete manifestID... [--delete]
kopia snapshot delete rootObjectID... [--delete]
```
Fixes#435
cli: added --unsafe-ignore-source as alias for `--delete`
This is a hidden flag for backwards compatibility. It will be removed.
Snapshotter interface describes an entity that can create,
restore, and delete snapshots, as well as manage a repository.
Add kopia implementation of the snapshotter interface.
Update the fio runner to use a docker image if the appropriate environment variable is set. Docker image is built via a makefile target and used in the robustness tool tests.
Add the walk policy flag WalkCrossDevice to the fswalker Walk calls. This will avoid potential issues when running in a docker container where a FS tree is made of many overlays. Without the flag set, the Walk operation skips over files on devices that do not match the device at the base path.
Add comparer interface which gathers data on a path and
compares that data to a new path, returning error if the path
differs in any way from the input data. The details of what
constitutes a difference is left to the implementation.
FSWalker implementation uses Walk and Report to do the data
gathering and comparison. Filters are applied to sort out any
differences that might be expected (e.g. ctime, atime, mtime,
rename of root directory after restore).
Temporary workaround for compile issues on MacOS and Windows due
to upstream fswalker bug. Only build the reporter and walker
packages as GOOS=linux for now.
Adds a wrapper around `Walk` that takes a Policy (protobuf definition) and performs a walk using it as configuration. The resulting Walk struct pointer is returned. The only exported functionality is unfortunately to read the Policy as a protobuf text file, so the implementation creates a temporary policy file whose lifetime is the duration of the call.
Adds a wrapper around the the FSWalker reporter `Compare` functionality. Takes a config file and two Walk pointers and compares the walks, returning the pb-defined Report struct. Again, the only exported functionality for reading config information is to read it as a protobuf text file. Creates a temporary config file, whose lifetime is the duration of the call, to pass in to the fswalker function.
Adding a helper library that wraps fio execution. This is the basic initial check-in that implements the runners, configs, and a single WriteFiles helper. It should be enough to unblock subsequent tasks that will use fio to generate data sets for kopia snapshot verification. More helper workloads can be added as needed.
In this implementation the tests will all skip from test main if the `FIO_EXE` env variable is not set. Adding fio to the CI environment will be addressed as a separate PR.
Tracking progress in issue https://github.com/kopia/kopia/issues/179