
174 Commits

Author SHA1 Message Date
Jarek Kowalski
f517703079 Preliminary support for sessions (#752)
* content: fixed time-based auto-flush behavior to behave like Flush()

Previously, a content whose write started before a time-based flush
could sometimes finish writing afterwards (and it would be included in
the new index).

Refactored the code so that the time-based flush happens before the
WriteContent write and behaves exactly the same way as a real Flush(), so
all writes started before it will be awaited during the flush.

Also, the previous regression test was incorrect because it was mocking
the wrong blob method.

* content: refactored index blob manager crypto to separate file

This will be reused for encrypting session info.

* content: added support for session markers

The session marker (`s` blob) is written BEFORE the first data blob
(`p` or `q`) that belongs to a new index segment (which ends when its
`n` blob is written).

The session marker is removed AFTER the index blob (`n`) has been written.

All pack and index blobs belonging to a session will have the session
ID as their suffix, so that if a reader can see an `s<sessionID>` blob,
it will ignore any `p` and `q` blobs with the same suffix.

* maintenance: ignore blobs belonging to active sessions when running blob garbage collection

* cli: added 'sessions list' for listing active sessions

* content: added retrying of previously failed blob writes before writing a new one
2021-01-14 00:25:51 -08:00
Peter Palotas
ae37719e51 Improved .kopiaignore pattern matching (#773)
* Improved .kopiaignore pattern matching

.kopiaignore pattern matching now (hopefully) conforms to the .gitignore specification (https://git-scm.com/docs/gitignore)

Replaced the old "ignore" package with a newly written "wcmatch" package that handles the globbing. This should support all the patterns that .gitignore supports.

Made some changes in ignorefs to how the patterns are matched.

This fixes #571

* Fixed invalid matching of non-rooted patterns that contained a slash.

According to the .gitignore spec, a pattern that contains a slash in the middle should only match relative to the location of the .gitignore file, i.e. the same as if it started with a '/'.

Example:
"foo/bar" matches "/foo/bar", but not "/other/foo/bar",
whereas "bar" matches both "/bar" and "/foo/bar".

* Uncommented previously failing tests.

* Fixed problem with matching "nested" .kopiaignore files.

Ignore-patterns must be applied from the root .kopiaignore down the hierarchy, so that an ignore file in a subdirectory can negate a pattern from a parent directory.

* Uncommented tests that should now work.
2021-01-08 08:13:18 -08:00
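The slash-anchoring rule described above can be sketched with Go's standard `path.Match`, whose `*` (like gitignore's) does not match `/`. This is a hypothetical helper, not the `wcmatch` package's API. It models only the anchoring rule: `**`, negation and nested ignore files are out of scope, and a trailing directory-only slash is simply dropped.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matches reports whether relPath (slash-separated and rooted at the
// directory holding the ignore file, e.g. "/foo/bar") matches pattern.
func matches(pattern, relPath string) bool {
	p := strings.TrimSuffix(pattern, "/")
	if strings.Contains(p, "/") {
		// A leading or middle slash anchors the pattern to the ignore file's directory.
		ok, _ := path.Match("/"+strings.TrimPrefix(p, "/"), relPath)
		return ok
	}
	// No slash: the pattern may match at any depth, so compare the last element only.
	ok, _ := path.Match(p, path.Base(relPath))
	return ok
}

func main() {
	fmt.Println(matches("foo/bar", "/foo/bar"))       // true
	fmt.Println(matches("foo/bar", "/other/foo/bar")) // false
	fmt.Println(matches("bar", "/bar"))               // true
	fmt.Println(matches("bar", "/foo/bar"))           // true
}
```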
Peter Palotas
cd8f3e81b8 Created end-to-end tests verifying .kopiaignore behavior. (#774)
* Created end-to-end tests verifying .kopiaignore behavior.

This is related to #571 and #773, but provided as a separate PR to include tests that did not work before PR #773.

* Commented failing tests.

These tests will be re-enabled when #773 is done.

* Added additional commented tests of .kopiaignore

These will be uncommented in #773.
2021-01-08 07:39:59 -08:00
Jarek Kowalski
8e17edcdf6 content: mechanical refactoring of content manager to extract CommittedReadManager (#771)
* content: introduced ContentReadManager
  This only introduces a new type and reassigns methods
* content: moved all CommittedReadManager methods to one file
* content: more code movement
* content: refactored read manager setup
* content: refactored test hook that allowed cleaner passing of custom own writes cache
2021-01-07 00:39:20 -08:00
Jarek Kowalski
f1b471d7e6 Fixes for test flakes (#770)
* testing: prevented spurious test flakes caused by kopia subprocesses messing with stderr

This was not causing actual failures, but it was causing error messages to be misreported.

* testing: ensure random names are always unique by adding a counter
2021-01-05 21:37:23 -08:00
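A counter-suffixed name generator along the lines of the second change is sketched below. `randomName` and its output format are hypothetical; the point is that the atomic counter guarantees uniqueness within a process even if the random part happens to collide.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// nameCounter is incremented for every generated name.
var nameCounter int64

// randomName combines a random component with an atomically incremented
// counter, so two calls never return the same name.
func randomName(prefix string) string {
	n := atomic.AddInt64(&nameCounter, 1)
	return fmt.Sprintf("%s-%x-%d", prefix, rand.Int63(), n)
}

func main() {
	fmt.Println(randomName("dir"), randomName("dir"))
}
```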
Jarek Kowalski
207009939f cli: only fetch the persisted password from keychain if one was not provided on the command line (#744)
This also fixed a test bug where the test was passing the password
via an environment variable and (incorrectly) expected it to be
ignored.

The password is determined in the following order:

- flag/environment variable (highest priority)
- persistent storage
- asking user (lowest priority)
2020-12-24 22:39:02 -08:00
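The resolution order above can be expressed as a small function. This is a hypothetical sketch: `passwordSources` and `resolvePassword` are invented names, and the keychain lookup and interactive prompt are modeled as injected functions. The property this change is about is that persistent storage is consulted only when no flag or environment value was given.

```go
package main

import (
	"errors"
	"fmt"
)

// passwordSources models the three places a password can come from.
type passwordSources struct {
	flagOrEnv string                 // value from a flag or environment variable
	persisted func() (string, bool)  // e.g. keychain lookup
	prompt    func() (string, error) // interactive prompt
}

// resolvePassword returns the first available password, highest priority first.
func resolvePassword(s passwordSources) (string, error) {
	if s.flagOrEnv != "" {
		return s.flagOrEnv, nil // persistent storage is not touched at all
	}
	if s.persisted != nil {
		if p, ok := s.persisted(); ok {
			return p, nil
		}
	}
	if s.prompt != nil {
		return s.prompt()
	}
	return "", errors.New("no password available")
}

func main() {
	p, err := resolvePassword(passwordSources{
		flagOrEnv: "from-flag",
		persisted: func() (string, bool) { return "from-keychain", true },
	})
	fmt.Println(p, err)
}
```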
Jarek Kowalski
e03971fc59 Upgraded linter to v1.33.0 (#734)
* linter: upgraded to 1.33, disabled some linters

* lint: fixed 'errorlint' errors

This ensures that all error comparisons use errors.Is() or errors.As().
We will be wrapping more errors going forward, so it's important that
error checks do not rely on strict equality comparisons.

Verified that there are no exclusions for the errorlint linter, which
guarantees that.

* lint: fixed or suppressed wrapcheck errors

* lint: nolintlint and misc cleanups

Co-authored-by: Julio López <julio+gh@kasten.io>
2020-12-21 22:39:22 -08:00
Jarek Kowalski
246dcf80ba testing: added 'snapshot verify' test 2020-12-21 20:02:53 -08:00
Jarek Kowalski
d7ca543356 cli: improvements to 'snapshot verify'
* When running against a direct repository, it will verify that all
  backing blobs exist based on the results of listing.
* Deprecated the annoying --all-sources flag; verifying all sources is
  now the default if no sources are provided.
2020-12-21 20:02:53 -08:00
Jarek Kowalski
eecd9d13c9 actions: Added --enable-actions flag (#737)
This can be specified at `repo create` or `repo connect` to enable
actions. By default actions are disabled to avoid security risks
associated with executing code.

Alternatively, `--force-enable-actions` or `--force-disable-actions`
can be specified during `snapshot create`.
2020-12-21 18:05:25 -08:00
Jarek Kowalski
4f7d211f72 Added support for actions that run before&after snapshot roots and before/after specific folders (#722)
* policy: add actions
* fs: added LocalFilesystemPath() which can optionally return local filesystem
  path (if entry is local)
* cli: added support for setting policy actions
* upload: support for executing actions before/after folder (non-inheritable)
  and before/after snapshots (inheritable)
* testing: end-to-end test for actions
* additional tests for actions with embedded scripts
2020-12-21 15:53:21 -08:00
Julio López
3795ffc6f9 robustness: minor cleanups (#726)
Remove unnecessary intermediate variables.
Send SIGTERM instead of SIGKILL to terminate child kopia server process.
Set Pdeathsig on Linux for child kopia server process.
Trivial: reduce scope of hostFioDataPathStr variable.
Trivial: rename local variable.
Trivial: Use log.Fatalln instead of log + exit(1).
Improve error message in robustness test to tell apart failure cause.
2020-12-16 12:49:54 -08:00
Julio López
35346863d2 robustness: add support for kopia server storage repo (#720)
Adds single client to server robustness tests.

Co-authored-by: Rahul M Chheda <rchheda@infracloud.io>
2020-12-15 17:56:30 -08:00
Jarek Kowalski
ad4b222939 cli: added support for copying (or moving) snapshot history (#703)
Both source and destination can be specified using user@host,
@host or user@host:/path, where destination values override the
corresponding parts of the source, so both targeted
and mass copying are supported.

Supported combinations are:

Source              Destination         Behavior
---------------------------------------------------
@host1              @host2              copy snapshots from all users of host1
user1@host1         @host2              copy all snapshots to user1@host2
user1@host1         user2@host2         copy all snapshots to user2@host2
user1@host1:/path1  @host2              copy to user1@host2:/path1
user1@host1:/path1  user2@host2         copy to user2@host2:/path1
user1@host1:/path1  user2@host2:/path2  copy snapshots from single path

When --move is specified, the matching source snapshots are also deleted.

* cli: upgraded kingpin to latest version (not tagged)

This allows using `EnableFileExpansion` to disable treating
arguments prefixed with "@" as file includes.
2020-12-04 16:34:55 -08:00
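The override rule (each part given in the destination replaces the corresponding part of every matching source snapshot) can be sketched as below. The `sourceInfo` type, `parse` and `destinationFor` are hypothetical illustrations rather than kopia's own API; the parser is naive and ignores edge cases such as Windows drive letters, and selecting which snapshots match the source is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// sourceInfo holds the user@host:/path parts; empty fields mean "unspecified".
type sourceInfo struct {
	User, Host, Path string
}

// parse splits "user@host:/path", "user@host" or "@host".
func parse(s string) sourceInfo {
	var si sourceInfo
	if i := strings.Index(s, ":"); i >= 0 {
		si.Path = s[i+1:]
		s = s[:i]
	}
	if i := strings.Index(s, "@"); i >= 0 {
		si.User, si.Host = s[:i], s[i+1:]
	}
	return si
}

func (si sourceInfo) String() string {
	s := si.User + "@" + si.Host
	if si.Path != "" {
		s += ":" + si.Path
	}
	return s
}

// destinationFor maps one matching source snapshot to its destination:
// every non-empty part of dst overrides the same part of src.
func destinationFor(src, dst sourceInfo) sourceInfo {
	out := src
	if dst.User != "" {
		out.User = dst.User
	}
	if dst.Host != "" {
		out.Host = dst.Host
	}
	if dst.Path != "" {
		out.Path = dst.Path
	}
	return out
}

func main() {
	src := parse("user1@host1:/path1")
	for _, d := range []string{"@host2", "user2@host2", "user2@host2:/path2"} {
		fmt.Println(src, "->", destinationFor(src, parse(d)))
	}
}
```

The three destinations reproduce the last three rows of the table: `user1@host2:/path1`, `user2@host2:/path1` and `user2@host2:/path2`.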
Jarek Kowalski
e8714cb2a1 s3: upgraded to minio v7 (#707) 2020-12-02 19:46:05 -08:00
Nick
71dcbcf2e3 Robustness engine actions with stats and log (#685)
* Robustness engine actions with stats and logging

- Add actions to robustness engine
- Actions wrap other functional behavior and serve as a common interface for collecting stats
- Add stats for the engine, both per run and cumulative over time
- Add a log for actions that the engine has executed
- Add recovery logic to re-sync snapshot metadata after a possible failed engine run (e.g. if metadata wasn't properly persisted).

Current built-in actions:
- snapshot root directory
- restore random snapshot ID into a target restore path
- delete a random snapshot ID
- run GC
- write random files to the local data directory
- delete a random subdirectory under the local data directory
- delete files in a directory
- restore a snapshot ID into the local data directory

Actions are executed according to a set of options, which dictate the relative probabilities of picking a given action, along with ranges for action-specific parameters that can be randomized.
2020-11-17 01:07:04 -08:00
Jarek Kowalski
4d7f0cb6cd Fixed symlink restore behavior on macOS (#673)
* restore: use symlink-specific APIs instead of chmod, chown and chtimes

* upload: fix updating directory modtime for symlinks

* cli: plumbed through flags to restore to control new behaviors

* localfs: use Lstat() instead of Stat() in Child() method

* testing: added restore tests for new flags
2020-10-10 11:03:35 -07:00
Jarek Kowalski
1962882aa8 testing: use shorter RSA keys to speed up server tests (#665) 2020-10-04 22:06:59 -07:00
Jarek Kowalski
ae38fa3917 Speed up integration tests (#653)
* testing: don't use expensive scrypt-65536-8-1 in integration tests

* testing: use platform-specific encryption and hashing for arm and arm64 to speed up tests

* testing: manually manage log directory to be able to analyze integration test failures

* testing: snapshot_gc_test was too quick

* Makefile: renamed target building integration test binary
2020-09-30 22:01:16 -07:00
Jarek Kowalski
0758a92c58 restore: improved user experience (#644)
* restore: improved user experience

* 'snapshot restore' is now the same as 'restore' and both will
  support restoring by manifest ID, root ID or root ID + subdirectory

* added support for restoring individual files

* implemented PR feedback and refactored object ID parsing

Moving helpers inside the snapshot/ package helped clean up the code
a lot.
2020-09-28 22:57:24 -07:00
Jarek Kowalski
c9c8d27c8d Repro and fix for zero-sized snapshot bug (#641)
* server: repro for zero-sized snapshot bug

As described in https://kopia.discourse.group/t/kopia-0-7-0-not-backing-up-any-files-repro-needed/136/5

* server: fixed zero-sized snapshots after repository is connected via API

The root cause was that the source manager was inheriting the HTTP call
context, which was canceled as soon as the 'connect' RPC returned, thus
silently killing all uploads.
2020-09-23 20:15:36 -07:00
Julio López
ae6a960080 Prefer t.TempDir() over makeScratchDir(t) (#612)
Prefer t.TempDir() over makeScratchDir(t)
Remove unused randomString
Leverage T.TempDir() in CLITest env
2020-09-22 22:16:39 -07:00
Jarek Kowalski
6bdcb81712 ignorefs: fixed arm-specific linter warning (#637)
* ignorefs: fixed arm-specific linter warning

* testing: TestServerStart fixes for armhf
2020-09-22 19:04:05 -07:00
Jarek Kowalski
fce9497375 restore: support for symlinks (experimental) (#621) 2020-09-18 10:29:20 -07:00
Nick
7f61dc6637 [Robustness] Add command line parameters for kopia snapshotter (#576)
* [Robustness] Add command line parameters for kopia snapshotter

Add flags for:
- no-progress
- parallel
- cache sizes
- no update check

Add an integration test to validate snapshotter expected output
against a kopia executable.
2020-09-18 01:15:19 -07:00
Jarek Kowalski
f2cf71d914 logging: revamped logs from content manager to be machine parseable (#617)
* logging: revamped logs from content manager to be machine parseable

Logs from the content manager (except reads) are sent to a separate log
file that is always free from personally-identifiable information
(e.g. no file names, just content IDs and blob IDs).

Also moved CLI logs to a subdirectory (cli-logs) and put content logs
in a parallel directory (content-logs).

Also, the log file name will now include the type of the command that
was invoked:

   kopia-20200913-134157-16110-snapshot-create.log

Fixes #588

* tests: moved all logs from tests to a separate directory
2020-09-16 20:04:26 -07:00
Jarek Kowalski
c7be3a0c87 testing: added performance benchmark (#618)
The benchmark creates 20 GB of files in each of three configurations:

* 10 x 2 GB files
* 100 x 200 MB files
* 1000 x 20 MB files

and backs them up to a local filesystem repository measuring time,
CPU and RAM usage.

The benchmarking script uses a GCP instance (n1-standard-8) with fast NVMe
flash to eliminate local filesystem latency.

Current performance numbers show a major latency improvement in
0.7.0-rc1 due to the splitter throughput optimization (#606).
2020-09-15 21:30:08 -07:00
Jarek Kowalski
6a14ac8a2a cli: ensure advanced commands are not accidentally used (#611)
* cli: ensure advanced commands are not accidentally used

This prints an error when a dangerous command is used without
first setting KOPIA_ADVANCED_COMMANDS=enabled environment variable.

Co-authored-by: Julio López <julio+gh@kasten.io>
2020-09-12 20:31:25 -07:00
Julio López
64b6018140 Test for directory reuse after GC (#601)
content: allow returning deleted content in GetContent
maintenance: check deleted contents as well
maintenance: test for when a directory content is reused after deletion

testing: add support for repo open options in repotesting
* Allow passing repo options to MustReopen
* Add repotesting.Environment.MustConnectOpenAnother
* Remove kopia.config.mlock file
* snapshot create helper
* Fix content delete related and e2e tests
2020-09-12 19:28:52 -07:00
Jarek Kowalski
29ce1819cb Added support for setting and changing repository client options (description, read-only, hostname, username) (#589)
* repo: refactored client-specific options (hostname,username,description,readonly) into new struct that is JSON-compatible with current config

* cli: added 'repository set-client' to configure parameters of connected repository

* cli: cleaned up 'repository status' output
2020-09-04 13:57:15 -07:00
Jarek Kowalski
ded1ecf936 implemented Cache Directory Tagging Specification + CLI + UI (#565)
Fixes #564

cli: added 'kopia policy set --ignore-cache-dirs' option to control
whether to ignore caches (global default=true)

ui: added checkbox to control 'Ignore Cache Dirs' in policy editor

ignorefs: moved ignoring cache directories to ignorefs layer

Co-authored-by: Julio López <julio+gh@kasten.io>
2020-08-31 21:35:26 -07:00
Jarek Kowalski
c242235a32 blob: added SetTime() method which may be optionally implemented by blob.Storage (#575)
cli: added --times option to 'repository sync'
2020-08-31 19:50:15 -07:00
Jarek Kowalski
965160dba1 cli: ignore trailing / in repository server URL (#569)
Fixes #557
2020-08-30 16:10:26 -07:00
Jarek Kowalski
1a8fcb086c Added endurance test which tests kopia over long time scale (#558)
Globally replaced all use of time with internal 'clock' package
which provides indirection to time.Now()

Added support for faking clock in Kopia via KOPIA_FAKE_CLOCK_ENDPOINT

logfile: squelch annoying log message

testenv: added faketimeserver which serves time over HTTP

testing: added endurance test which tests kopia over long time scale

This creates a kopia repository and simulates usage of Kopia over multiple
months (using accelerated fake time) to trigger effects that are only
visible after a long time has passed (maintenance, compactions, expirations).

The test is not part of any test suite yet, but will run in
post-submit mode only, preferably 24/7.

testing: refactored internal/clock to only support injection when
'testing' build tag is present
2020-08-26 23:03:46 -07:00
Nick
e7675f2d01 Address additional suggestions from fio workload PR #529 (#550)
Follow-up on recent PR #529, addressing some suggestions and discussion that came after it was merged:

- Express probability as float in range [0,1]
- Add a unit test for DeleteContentsAtDepth
- Add a comment on writeFilesAtDepth explaining depth vs branchDepth
- Refactor pickRandSubdirPath for easier readability and understanding

Upon some reflection, I decided to refactor pickRandSubdirPath() to gather indexes and pick randomly from them instead of the previous reservoir sampling approach. I think this is easier to understand going forward without extra explanation, doesn't have much additional memory overhead, and reduces the number of rand calls to 1.
2020-08-20 21:10:56 -07:00
Nick
da6b933542 [Robustness] Add additional fio workloads and fix fio runner (#529)
* [Robustness] Add additional fio workloads

Add more fio workloads to write files at different depths in random
branches of the generated file system tree.

- Write files at depth
- Write files at a specified depth, creating a new directory branch at
a random depth
- Delete a random directory at a given depth
- Delete some or all of the contents of a random directory at
a specified depth
2020-08-14 21:54:52 -07:00
Nick
14d50aaa50 [Robustness] Fix for kopia runner and custom working directory (#533)
* [Robustness] Fix for kopia runner and custom work dir

Apply fix similar to #293 for the robustness kopia runner.
Add control for runner working directory.
2020-08-14 17:32:45 -07:00
Nick
0c3ab1337e [Robustness] Fswalker should ignore host name (#531)
Fix fswalker to ignore hostname to allow reporting
on walks done across different hosts. Also prevent
Before and After walk data from printing to reduce log size.
2020-08-13 16:40:23 -07:00
Nick
7da4022cfd [refactor] Move robustness and engine packages (#528)
Perform minor refactor by moving robustness and engine packages
in preparation for later PRs

Fix import path
2020-08-13 14:56:44 -07:00
Jarek Kowalski
9a6dea898b Linter upgrade to v1.30.0 (#526)
* fixed godot linter errors
* reformatted source with gofumpt
* disabled some linters
* fixed nolintlint warnings
* fixed gci warnings
* lint: fixed 'nestif' warnings
* lint: fixed 'exhaustive' warnings
* lint: fixed 'gocritic' warnings
* lint: fixed 'noctx' warnings
* lint: fixed 'wsl' warnings
* lint: fixed 'goerr113' warnings
* lint: fixed 'gosec' warnings
* lint: upgraded linter to 1.30.0
* lint: more 'exhaustive' warnings

Co-authored-by: Nick <nick@kasten.io>
2020-08-12 19:28:53 -07:00
Jarek Kowalski
bb2434dc28 upload: auto-ignore kopia cache directories when creating snapshots (#524)
This creates a marker file named `.kopia-cache` in the directory
that is the root of cache. When the uploader finds this file, it will
treat the entire directory as if it were empty.

This allows excluding cache directories when snapshotting entire home
and root directories.
2020-08-10 11:19:54 -07:00
Jarek Kowalski
505ab92e21 Support for repository sync (#522)
* blob: added DisplayName() method to blob.Storage

* cli: added 'kopia repo sync-to <provider>' which replicates BLOBs

Usage demo: https://asciinema.org/a/352299

Fixes #509

* implemented suggestion by Ciantic to fail sync if the destination repository is not compatible with the source

* cli: added 'kopia repo sync --must-exist'

This ensures that the target repository is not empty; otherwise syncing
to an accidentally unmounted filesystem directory might copy everything
again.
2020-08-09 12:36:41 -07:00
Nick
7867513732 [Fix #519] Fix pack blob parse in TestRestoreFail (#520)
Fix parsing of pack blob ID by using a specific regex
instead of a strings.Contains. This prevents the test from
misidentifying other blob IDs as pack blobs, such as
`kopia.maintenance`.
2020-08-05 18:31:48 -07:00
Jarek Kowalski
40acf238f3 Fixed arm and arm64 build. (#506)
* fixed a number of cases where misaligned data was causing panics on armv7 (but not armv8)
* travis: enable arm64
* test: reduce compressed data sizes when running on arm
* arm: wait longer for snapshots
2020-07-30 17:31:28 -07:00
Jarek Kowalski
8ead49b779 restore: support for zip, tar and tar.gz restore outputs (#482)
* restore: support for zip, tar and tar.gz restore outputs

Moved restore functionality to its own package.

* Fix enum values in the 'mode' flag

Co-authored-by: Julio López <julio+gh@kasten.io>
2020-07-22 22:56:11 -07:00
Nick
ce5e6dcd13 [Robustness] Add first robustness tests
Add two tests:
- TestManySmallFiles: writes 100k files of size 4k to a directory, snapshots the data tree, then restores and validates the data.
- TestModifyWorkload: Loops over a simple randomized workload. Performs a series of random file writes to some random sub-directories, then takes a snapshot of the data tree. All snapshots taken during this test are restore-verified at the end.

A global test engine is instantiated in main_test.go, to be used in the robustness test suite across tests (saves time loading/saving metadata once per run instead of per test).
2020-07-14 22:37:11 -07:00
Nick
82a2fa0ea5 [Robustness] Add test engine to manage snapshot verification testing (#468)
* Add test engine to manage snapshot verification testing

Test engine manages the test and metadata repositories, snapshot
checker, metadata storage persistence, and file writer. It is
the high level helper that will be invoked in the snapshot
verification testing suite.

- modify data directory file structure
- issue snapshot/restore/delete to the data directory
- accumulate metadata over the course of the test suite
- flush accumulated metadata to the metadata repository
- load historical metadata from the repository on initialization
- perform automatic data integrity verification on snap restore

This change corresponds to the robustness execution engine component from the design documentation.
2020-06-27 23:46:37 -07:00
Jarek Kowalski
79757672ca server: implemented 'flush' and 'refresh' API
Added test that verifies that when client performs Flush (which happens
at the end of each snapshot and when repository is closed), the
server writes new blobs to the storage.

Fixes #464
2020-06-07 19:38:13 -07:00
Nick
b30da511e7 Wire up snapshot-store-compare behavior
Connect the snapshotter, the storer, and the comparer. Invoke
the snapshotter to take/restore/delete snapshots on the repo,
the comparer to gather metadata before the snapshot and
after the restore, and the storer to save metadata for later
lookup when verifying restores.
2020-05-31 21:12:31 -07:00
Jarek Kowalski
d68273a576 Improvements for dealing with eventually-consistent stores (S3) (#437)
* content: added support for cache of own writes

This keeps track of which blobs (n and m) have been written by the
local repository client, so that even if the storage listing
is eventually consistent (as in S3), we get somewhat sane behavior.

Note that this still assumes read-after-create semantics, which
S3 also guarantees, otherwise it's very hard to do anything useful.

* compaction: support for compaction logs

Instead of compaction immediately deleting source index blobs, we now
write log entries (with `m` prefix) which are merged on reads
and applied only if the blob list includes all inputs and outputs, in
which case the inputs are discarded since they are known to have been
superseded by the outputs.

This addresses eventual consistency issues in stores such as S3,
which don't guarantee list-after-put or list-after-delete. With such
stores the repository is ultimately eventually consistent and there's
not much that can be done about it, unless we use a second, strongly
consistent storage (such as GCS) for the index only.

* content: updated list cache to cache both `n` and `m`

* repo: fixed cache clear on windows

Clearing the cache requires closing the repository first, as Windows
keeps the files locked.

This requires the ability to close the repository twice.

* content: refactored index blob management into indexBlobManager

* testing: fixed blobtesting.Map storage to allow overwrites

* blob: added debug output String() to blob.Metadata

* testing: added indexBlobManager stress test

This works by using N parallel "actors", each repeatedly performing
operations on indexBlobManagers all sharing single eventually consistent
storage.

Each actor runs in a loop and randomly selects between:

- *reading* all contents in indexes and verifying that it includes
  all contents written by the actor so far and that contents are
  correctly marked as deleted
- *creating* new contents
- *deleting* one of previously-created contents (by the same actor)
- *compacting* all index files into one

The test runs on accelerated time (every read of time moves it by 0.1
seconds) and simulates several hours of running.

In case of a failure, the log should provide enough debugging
information to trace the exact sequence of events leading up to the
failure - each log line is prefixed with actorID and all storage
access is logged.

* makefile: increase test timeout

* content: fixed index blob manager race

The race is where if we delete compaction log too early, it may lead to
previously deleted contents becoming temporarily live again to an
outside observer.

Added a test case that reproduces the issue, verified that it fails
without the fix and passes with it.

* testing: improvements to TestIndexBlobManagerStress test

- better logging to be able to trace the root cause in case of a failure
- prevented concurrent compaction which is unsafe:

The sequence:

1. A creates contentA1 in INDEX-1
2. B creates contentB1 in INDEX-2
3. A deletes contentA1 in INDEX-3
4. B does compaction, but is not seeing INDEX-3 (due to EC or simply
   because B started read before #3 completed), so it writes
   INDEX-4=merge(INDEX-1,INDEX-2)
   * INDEX-4 has contentA1 as active
5. A does compaction but it's not seeing INDEX-4 yet (due to EC
   or because read started before #4), so it drops contentA1 and writes
   INDEX-5=merge(INDEX-1,INDEX-2,INDEX-3)
   * INDEX-5 does not have contentA1
6. C sees INDEX-4 and INDEX-5, and merge(INDEX-4,INDEX-5)
   contains contentA1, which is wrong because contentA1 has been deleted
   (and there's no record of the deletion anywhere in the system)

* content: when building pack index ensure index bytes are different each time by adding 32 random bytes
2020-05-31 17:11:20 -07:00