Commit Graph

131 Commits

Author SHA1 Message Date
Jarek Kowalski
40acf238f3 Fixed arm and arm64 build. (#506)
* fixed a number of cases where misaligned data was causing panics on armv7 (but not armv8)
* travis: enable arm64
* test: reduce compressed data sizes when running on arm
* arm: wait longer for snapshots
2020-07-30 17:31:28 -07:00
Jarek Kowalski
8ead49b779 restore: support for zip, tar and tar.gz restore outputs (#482)
* restore: support for zip, tar and tar.gz restore outputs

Moved restore functionality to its own package.

* Fix enum values in the 'mode' flag

Co-authored-by: Julio López <julio+gh@kasten.io>
2020-07-22 22:56:11 -07:00
Nick
ce5e6dcd13 [Robustness] Add first robustness tests
Add two tests:
- TestManySmallFiles: writes 100k files size 4k to a directory. Snapshots the data tree, restores and validates data.
- TestModifyWorkload: Loops over a simple randomized workload. Performs a series of random file writes to some random sub-directories, then takes a snapshot of the data tree. All snapshots taken during this test are restore-verified at the end.

A global test engine is instantiated in main_test.go, to be used in the robustness test suite across tests (saves time loading/saving metadata once per run instead of per test).
2020-07-14 22:37:11 -07:00
Nick
82a2fa0ea5 [Robustness] Add test engine to manage snapshot verification testing (#468)
* Add test engine to manage snapshot verification testing

Test engine manages the test and metadata repositories, snapshot
checker, metadata storage persistence, and file writer. It is
the high level helper that will be invoked in the snapshot
verification testing suite.

- modify data directory file structure
- issue snapshot/restore/delete to the data directory
- accumulate metadata over the course of the test suite
- flush accumulated metadata to the metadata repository
- load historical metadata from the repository on initialization
- perform automatic data integrity verification on snap restore

This change corresponds to the robustness execution engine component from the design documentation.
2020-06-27 23:46:37 -07:00
Jarek Kowalski
79757672ca server: implemented 'flush' and 'refresh' API
Added test that verifies that when client performs Flush (which happens
at the end of each snapshot and when repository is closed), the
server writes new blobs to the storage.

Fixes #464
2020-06-07 19:38:13 -07:00
Nick
b30da511e7 Wire up snapshot-store-compare behavior
Connect the snapshotter, the storer, and the comparer. Invoke
the snapshotter to take/restore/delete snapshots on the repo,
the comparer to gather metadata before the snapshot and
after the restore, and the storer to save metadata for later
lookup when verifying restores.
2020-05-31 21:12:31 -07:00
Jarek Kowalski
d68273a576 Improvements for dealing with eventually-consistent stores (S3) (#437)
* content: added support for cache of own writes

Thi keeps track of which blobs (n and m) have been written by the
local repository client, so that even if the storage listing
is eventually consistent (as in S3), we get somewhat sane behavior.

Note that this is still assumming read-after-create semantics, which
S3 also guarantees, otherwise it's very hard to do anything useful.

* compaction: support for compaction logs

Instead of compaction immediately deleting source index blobs, we now
write log entries (with `m` prefix) which are merged on reads
and applied only if the blob list includes all inputs and outputs, in
which case the inputs are discarded since they are known to have been
superseded by the outputs.

This addresses eventual consistency issues in stores such as S3,
which don't guarantee list-after-put or list-after-delete. With such
stores the repository is ultimately eventually consistent and there's
not much that can be done about it, unless we use second strongly
consistent storage (such as GCS) for the index only.

* content: updated list cache to cache both `n` and `m`

* repo: fixed cache clear on windows

Clearing cache requires closing repository first, as Windows is holding
the files locked.

This requires ability to close the repository twice.

* content: refactored index blob management into indexBlobManager

* testing: fixed blobtesting.Map storage to allow overwrites

* blob: added debug output String() to blob.Metadata

* testing: added indexBlobManager stress test

This works by using N parallel "actors", each repeatedly performing
operations on indexBlobManagers all sharing single eventually consistent
storage.

Each actor runs in a loop and randomly selects between:

- *reading* all contents in indexes and verifying that it includes
  all contents written by the actor so far and that contents are
  correctly marked as deleted
- *creating* new contents
- *deleting* one of previously-created contents (by the same actor)
- *compacting* all index files into one

The test runs on accelerated time (every read of time moves it by 0.1
seconds) and simulates several hours of running.

In case of a failure, the log should provide enough debugging
information to trace the exact sequence of events leading up to the
failure - each log line is prefixed with actorID and all storage
access is logged.

* makefile: increase test timeout

* content: fixed index blob manager race

The race is where if we delete compaction log too early, it may lead to
previously deleted contents becoming temporarily live again to an
outside observer.

Added test case that reproduces the issue, verified that it fails
without the fix and passed with one.

* testing: improvements to TestIndexBlobManagerStress test

- better logging to be able to trace the root cause in case of a failure
- prevented concurrent compaction which is unsafe:

The sequence:

1. A creates contentA1 in INDEX-1
2. B creates contentB1 in INDEX-2
3. A deletes contentA1 in INDEX-3
4. B does compaction, but is not seeing INDEX-3 (due to EC or simply
   because B started read before #3 completed), so it writes
   INDEX-4==merge(INDEX-1,INDEX-2)
   * INDEX-4 has contentA1 as active
5. A does compaction but it's not seeing INDEX-4 yet (due to EC
   or because read started before #4), so it drops contentA1, writes
   INDEX-5=merge(INDEX-1,INDEX-2,INDEX-3)
   * INDEX-5 does not have contentA1
7. C sees INDEX-5 and INDEX-5 and merge(INDEX-4,INDEX-5)
   contains contentA1 which is wrong, because A has been deleted
   (and there's no record of it anywhere in the system)

* content: when building pack index ensure index bytes are different each time by adding 32 random bytes
2020-05-31 17:11:20 -07:00
Julio López
0875939d56 dep: upgrade protobuf dependents (#442)
Upgrade cloud.google.com/go/storage to v1.8.0 from version 1.6.0

Change logs:
- https://github.com/googleapis/google-cloud-go/releases/tag/storage%2Fv1.8.0
- https://github.com/googleapis/google-cloud-go/releases/tag/storage%2Fv1.7.0

Protobuf from 1.3.5 to 1.4.2
- https://github.com/golang/protobuf/releases
- https://github.com/golang/protobuf/releases/tag/v1.4.2

Use google.golang.org/protobuf version 1.23.0
Instead of github.com/golang/protobuf/proto which has been superseded
- https://github.com/protocolbuffers/protobuf-go/releases

cloud.google.com/go from 0.54.0 to 0.57.0
- https://github.com/googleapis/google-cloud-go/releases/tag/v0.57.0
- https://github.com/googleapis/google-cloud-go/releases/tag/v0.56.0
- https://github.com/googleapis/google-cloud-go/releases/tag/v0.55.0

google.golang.org/api from 0.20 to 0.25.0
- https://github.com/googleapis/google-api-go-client/releases

github.com/prometheus/client_golang to 1.6.0
- https://github.com/prometheus/client_golang/releases

Required changes:
- Fix import paths for protobuf imports
- Add linter exception
- Use prototext package to marshal to text
2020-05-21 13:22:59 -07:00
Julio López
23c12125d8 Build test/tools in Darwin as well (#447)
Upgrade github.com/google/fswalker to get fixes for Darwin/macOS

#212 
google/fswalker#25
2020-05-21 12:02:27 -07:00
Jarek Kowalski
ca28469706 cli: improved 'snapshot delete' usage (#436)
New usage:

```
kopia snapshot delete manifestID... [--delete]
kopia snapshot delete rootObjectID... [--delete]
```

Fixes #435

cli: added --unsafe-ignore-source as alias for `--delete`
This is a hidden flag for backwards compatibility. It will be removed.
2020-05-13 23:43:45 -07:00
Jarek Kowalski
d657415817 testing: added blob.Storage wrapper that simulates eventual consistency (#434)
This is done by introducing N unsynchronized caches, which simulate
what frontend of a cloud storage system might do, that causes eventual
consistency behavior.
2020-05-09 12:19:32 -07:00
Jarek Kowalski
be4b897579 Support for remote repository (#427)
Support for remote content repository where all contents and
manifests are fetched over HTTP(S) instead of locally
manipulating blob storage

* server: implement content and manifest access APIs
* apiclient: moved Kopia API client to separate package
* content: exposed content.ValidatePrefix()
* manifest: added JSON serialization attributes to EntryMetadata
* repo: changed repo.Open() to return Repository instead of *DirectRepository
* repo: added apiServerRepository
* cli: added 'kopia repository connect server'
  This sets up repository connection via the API server instead of
  directly-manipulated storage.
* server: add support for specifying a list of usernames/password via --htpasswd-file
* tests: added API server repository E2E test
* server: only return manifests (policies and snapshots) belonging to authenticated user
2020-05-02 21:41:49 -07:00
Jarek Kowalski
4b4628a21e Repository maintenance support (#411)
Maintenance: support for automatic GC

Moved maintenance algorithms from 'cli' to 'repo/maintenance' package

Added support for CLI commands:

kopia gc - performs quick maintenance
kopia gc --full- perform full maintenance

Full maintenance performs snapshot gc, but it's not safe to do this automatically possibly in parallel to snapshots being taken. This will be addressed ~0.7 timeframe.
2020-04-14 00:11:41 -07:00
Jarek Kowalski
057c2789d8 Kopia UI: support for multiple repositories + portability (#398)
* server: when serving HTML UI, prefix the title with string from KOPIA_UI_TITLE_PREFIX envar

* kopia-ui: support for multiple repositories + portability

This is a major rewrite of the app/ codebase which changes
how configuration for repositories is maintained and how it flows
through the component hierarchy.

Portable mode is enabled by creating 'repositories' subdirectory before
launching the app.

on macOS:
  <parent>/KopiaUI.app
  <parent>/repositories/

On Windows, option #1 - nested directory
  <parent>\KopiaUI.exe
  <parent>\repositories\

On Windows, option #2 - parallel directory
  <parent>\some-dir\KopiaUI.exe
  <parent>\repositories\

In portable mode, repositories will have 'cache' and 'logs' nested
in it.
2020-04-04 17:18:37 -07:00
Jarek Kowalski
6cb9b8fa4f repo: refactored public API (#318)
* This is 99% mechanical:

Extracted repo.Repository interface that only exposes high-level object and manifest management methods, but not blob nor content management.

Renamed old *repo.Repository to *repo.DirectRepository

Reviewed codebase to only depend on repo.Repository as much as possible, but added way for low-level CLI commands to use DirectRepository.

* PR fixes
2020-03-26 08:04:01 -07:00
Seb Patane
6789f8e64c cli: allow override of snapshot start time and end time 2020-03-21 09:27:32 -07:00
Jarek Kowalski
239d809075 performance: introduced buf.Pool which helps reuse memory buffers (#345)
* performance: added buf.Pool which can be used to manage ephemeral buffers for encryption and compression
* repo: switched object writer to buf.Pool
* content: switched encryption to use buf.Pool
* object: switched compression to use buf.Pool
* testing: added missing content manager Close()
2020-03-18 20:42:16 -07:00
Nick
78a921ab2c Rename Simple mutex 2020-03-14 12:00:22 -07:00
Nick
60d1cc821a [robustness testing] Implement metadata store for test metadata using kopia
Add an implementation of the metadata store using kopia snapshots and restores to manage persistence of walk metadata. Metadata are stored to and retrieved from the store, and a mechanism for persisting the store is implemented using kopia snapshots.
2020-03-14 12:00:22 -07:00
Jarek Kowalski
6b04b42794 tests: added smoke test that exercises all combinations of encryption and hashing 2020-03-13 20:42:12 -07:00
Jarek Kowalski
e80f5536c3 performance: plumbed through output buffer to encryption and hashing,… (#333)
* performance: plumbed through output buffer to encryption and hashing, so that the caller can pre-allocate/reuse it

* testing: fixed how we do comparison of byte slices to account for possible nils, which can be returned from encryption
2020-03-12 08:27:44 -07:00
Jarek Kowalski
514df69afa performance: added wrapper around io.Copy()
this pools copy buffers so they can be reused instead of throwing away
after each io.Copy()
2020-03-10 21:52:30 -07:00
Nick
05852322da Add snapshotter interface and kopia implementation
Snapshotter interface describes an entity that can create,
restore, and delete snapshots, as well as manage a repository.

Add kopia implementation of the snapshotter interface.
2020-03-10 07:32:14 -07:00
Julio López
d9ce3d0ad6 Inject time in Kopia components (#314)
Motivation: Allow time injection for (unit) tests, to more easily test and
verify time-dependent invariants.

Add time injection support for:

* repo.Manager
* manifest.Manager
* snapshot.Uploader

Then, wire up to these components. The content.Manager already had support for
time injection, but was not wired up from the time function passed to repo creation.

Add an internal/faketime package for testing. Mainly code movement from testing
code in the repo/content package. Motivation: make it available to other packages
outside content Also, add simple tests for faketime functions.
2020-03-10 00:42:10 -07:00
Jarek Kowalski
cd35e3bab5 cli: 'snapshot migrate' improvements to help with data migration
- cleaned up migration progress output
- fixed migration idempotency
- added migration of policies
- renamed --parallelism to --parallel
- improved e2e test
- do not prompt for password to source repository if persisted
2020-03-07 21:47:32 -08:00
Jarek Kowalski
fb181257bf cli: implemented update check, fixes #119 2020-03-04 22:06:05 -08:00
Nick
2c72fbd514 Remove FIO_USE_DOCKER env 2020-03-03 20:36:43 -08:00
Nick
b1e8773c1e Update tests/tools/fio/fio.go
Co-Authored-By: Julio López <julio+gh@kasten.io>
2020-03-03 20:36:43 -08:00
Nick
72aa2f2a97 Update tests/tools/fio/workload.go
Co-Authored-By: Julio López <julio+gh@kasten.io>
2020-03-03 20:36:43 -08:00
Nick
b98236a535 Use fio image from dockerhub
Changing image to ljishen/fio instead of building
an image in kopia.
2020-03-03 20:36:43 -08:00
Nick
c5d8c9a271 Using docker to wrap fio execution to add robustness tool tests to Travis
Update the fio runner to use a docker image if the appropriate environment variable is set. Docker image is built via a makefile target and used in the robustness tool tests.
2020-03-03 20:36:43 -08:00
Nick
27d61c038e Parallelizing snapshot fail test to decrease test time
Make each test case combination a sub-test, and running independently it in parallel. This gives a 3-4x speedup vs master in local testing.
2020-03-03 20:34:38 -08:00
Nick
8e0167027d Use bytes buffer to capture stderr instead of reading pipe
Give a `*bytes.Buffer` to the command and let `package exec` read from the pipe into the buffer for us.

The current use of the `StderrPipe()` method was problematic; the documentation for os/exec states:

> Wait will close the pipe after seeing the command exit, so most callers need not close the pipe themselves. It is thus incorrect to call Wait before all reads from the pipe have completed. For the same reason, it is incorrect to use Run when using StderrPipe.
2020-03-02 17:04:46 -08:00
Jarek Kowalski
faa7225c23 testing: better fix for ignoring ErrClosed 2020-03-01 10:53:52 -08:00
Jarek Kowalski
5e1b03dcba test: ignore ErrClose when reading from stderr to fix text flake 2020-02-29 20:07:45 -08:00
Nick
835d0b259c Fix error clobbering in kopia cli runner
Error variable was being shared between c.Output and
ioutil.ReadAll. The two could race to set err, and if
Output() finishes first with an error, ReadAll could overwrite
it with nil. Fix is to delegate an independent error for
the pipe read, and fail the test if it is non-nil.
2020-02-29 13:58:10 -08:00
Jarek Kowalski
e3854f7773 BREAKING: changed how hostname/username are handled
The hostname/username are now persisted when connecting to repository
in a local config file.

This prevents weird behavior changes when hostname is suddenly changed,
such as when moving between networks.

repo.Repository will now expose Hostname/Username properties which
are always guarnateed to be set, and are used throughout.

Removed --hostname/--username overrides when taking snapshot et.al.
2020-02-25 20:40:23 -08:00
Jarek Kowalski
c8fcae93aa logging: refactored logging
This is mostly mechanical and changes how loggers are instantiated.

Logger is now associated with a context, passed around all methods,
(most methods had ctx, but had to add it in a few missing places).

By default Kopia does not produce any logs, but it can be overridden,
either locally for a nested context, by calling

ctx = logging.WithLogger(ctx, newLoggerFunc)

To override logs globally, call logging.SetDefaultLogger(newLoggerFunc)

This refactoring allowed removing dependency from Kopia repo
and go-logging library (the CLI still uses it, though).

It is now also possible to have all test methods emit logs using
t.Logf() so that they show up in failure reports, which should make
debugging of test failures suck less.
2020-02-25 17:24:44 -08:00
Jarek Kowalski
985fc0ad12 server: fixed /objects/ path mapping, added tests 2020-02-22 19:27:10 -08:00
Jarek Kowalski
3e58911cf3 tests: de-parallelized server tests 2020-02-22 19:27:10 -08:00
Jarek Kowalski
9b50a6e891 test: increased e2e test timeout
Added linear retry support when waiting for snapshots
2020-02-22 19:27:10 -08:00
Jarek Kowalski
fde2f2e0e6 server: additional status code from CreateSnapshotSource, more tests 2020-02-22 19:27:10 -08:00
Jarek Kowalski
ab2c906f2c server: implemented remaining server API methods
CreateSnapshotSource API for ensuring source exists
Upload - starts upload on a given source or matching sources
Cancel - cancels upload on a given source or matching sources
2020-02-22 19:27:10 -08:00
Jarek Kowalski
4cb898927c server: new APIs and error codes to support UI flow for connecting to repository 2020-02-16 22:43:36 -08:00
Julio López
a9670762fd Generate shorter file names, and thus paths, in E2E tests (#229)
- Reduces name lengths by ~ 1/2
- Generate file names up to maxNameLength

Co-Authored-By: Nick <nick@kasten.io>
2020-02-14 22:25:50 -08:00
Nick
3a5d3179e7 Add WalkCrossDevice option to walk policy
Add the walk policy flag WalkCrossDevice to the fswalker Walk calls. This will avoid potential issues when running in a docker container where a FS tree is made of many overlays. Without the flag set, the Walk operation skips over files on devices that do not match the device at the base path.
2020-02-14 21:18:41 -08:00
Jarek Kowalski
8bdd5caf7e server: added retry loop when launching server 2020-02-13 17:23:50 -08:00
Jarek Kowalski
0f79279f5e server: added support for new verbs in the API
/api/v1/repo/create
/api/v1/repo/connect
/api/v1/repo/disconnect

Refactored server code and fixed a number of outstanding robustness
issues. Tweaked the API responses a bit to make more sense when consumed
by the UI.
2020-02-13 17:23:50 -08:00
Nick
3cb7e37fc1 Add comparer interface and fswalker implementation (#226)
Add comparer interface which gathers data on a path and
compares that data to a new path, returning error if the path
differs in any way from the input data. The details of what
constitutes a difference is left to the implementation.

FSWalker implementation uses Walk and Report to do the data
gathering and comparison. Filters are applied to sort out any
differences that might be expected (e.g. ctime, atime, mtime,
rename of root directory after restore).
2020-02-13 17:07:18 -08:00
Jarek Kowalski
4736e9037e revamped progress output and cleaned up logging
See https://asciinema.org/a/ykx6uzEhKY3451fWEnX9nm9uo
2020-02-10 19:08:35 -08:00