* content: fixed time-based auto-flush behavior to behave like Flush()
Previously it would sometimes be possible for a content whose write
started before time-based flush to finish writing afterwards (and it
would be included in the new index).
Refactored the code so that time-based flush happens before WriteContent
write and behaves exactly the same was as real Flush() so all writes
started before it will be awaited during the flush.
Also previous regression test was incorrect since it was mocking the
wrong blob method.
* content: refactored index blob manager crypto to separate file
This will be reused for encrypting session info.
* content: added support for session markers
Session marker (`s` blob) is written BEFORE the first data blob
(`p` or `q`) that belongs to new index segment (`n` is written).
Session marker is removed AFTER the index blob (`n`) has been written.
All pack and index blobs belonging to a session will have the session
ID as its suffix, so that if a reader can see `s<sessionID>` blob, they
will ignore any `p` and `q` blobs with the same suffix.
* maintenance: ignore blobs belonging to active sessions when running blob garbage collection
* cli: added 'sessions list' for listing active sessions
* content: added retrying writing previously failed blobs before writing new one
* Improved .kopiaignore pattern matching
.kopiaignore pattern matching now (hopefully) conforms to the .gitignore specification (https://git-scm.com/docs/gitignore)
Replaced old package "ignore" with a newly written "wcmatch" that manages the globbing. This should support all the patterns that .gitignore supports.
Some changes in ignorefs that dealt with how the patterns were matched.
This fixes#571
* Fixed invalid matching of non-rooted patterns that contained a slash.
If a pattern contains a slash in the middle of the pattern this should only match relative to the .gitignore file, i.e. the same as if it started with a '/' according to the .gitignore spec.
Example:
foo/bar should match "/foo/bar", but not "/other/foo/bar".
whereas
"bar" matches both "/bar" and "/foo/bar"
* Uncommented previously failing tests.
* Fixed problem with matching "nested" .kopiaignore files.
Ignore-patterns must be applied from the root .kopiaignore down the hierarchy, so that an ignore file in a subdirectory can negate a pattern from a parent directory.
* Uncommented tests that should now work.
* trivial: move CachingOptions out of content.Manager, where it's not needed
* trivial: removed newManagerWithOptions which was the same as NewManager
also moved one-time initialization to newReadManager()
* linter: upgraded to 1.33, disabled some linters
* lint: fixed 'errorlint' errors
This ensures that all error comparisons use errors.Is() or errors.As().
We will be wrapping more errors going forward so it's important that
error checks are not strict everywhere.
Verified that there are no exceptions for errorlint linter which
guarantees that.
* lint: fixed or suppressed wrapcheck errors
* lint: nolintlint and misc cleanups
Co-authored-by: Julio López <julio+gh@kasten.io>
* policy: add actions
* fs: added LocalFilesystemPath() which can optionally return local filesystem
path (if entry is local)
* cli: added support for setting policy actions
* upload: support for executing actions before/after folder (non-inheritable)
and before/after snapshots (inheritable)
* testing: end-to-end test for actions
* additional tests for actions with embedded scripts
Both source and destination can be specified using user@host,
@host or user@host:/path where destination values override the
corresponding parts of the source, so both targeted
and mass copying is supported.
Supported combinations are:
Source: Destination Behavior
---------------------------------------------------
@host1 @host2 copy snapshots from all users of host1
user1@host1 @host2 copy all snapshots to user1@host2
user1@host1 user2@host2 copy all snapshots to user2@host2
user1@host1:/path1 @host2 copy to user1@host2:/path1
user1@host1:/path1 user2@host2 copy to user2@host2:/path1
user1@host1:/path1 user2@host2:/path2 copy snapshots from single path
When --move is specified, the matching source snapshots are also deleted.
* cli: upgraded kingpin to latest version (not tagged)
This allows using `EnableFileExpansion` to disable treating
arguments prefixed with "@" as file includes.
The new files policy oneFileSystem ignores files that are mounted to
other filesystems similarly to tar's --one-file-system switch. For
example, if this is enabled, backing up / should now automatically
ignore /dev, /proc, etc, so the directory entries themselves don't
appear in the backup. The value of the policy is 'false' by default.
This is implemented by adding a non-windows-field Device (of type
DeviceInfo, reflecting the implementation of Owner) to the Entry
interface. DeviceInfo holds the dev and rdev acquired with stat (same
way as with Owner), but in addition to that it also holds the same
values for the parent directory. It would seem that doing this in some
other way, ie. in ReadDir, would require modifying the ReadDir
interface which seems a too large modification for a feature this
small.
This change introduces a duplication of 'stat' call to the files, as
the Owner feature already does a separate call. I doubt the
performance implications are noticeable, though with some refactoring
both Owner and Device fields could be filled in in one go.
Filling in the field has been placed in fs/localfs/localfs.go where
entryFromChildFileInfo has acquired a third parameter giving the the
parent entry. From that information the Device of the parent is
retrieved, to be passed off to platformSpecificDeviceInfo which does
the rest of the paperwork. Other fs implementations just put in the
default values.
The Dev and Rdev fields returned by the 'stat' call have different
sizes on different platforms, but for convenience they are internally
handled the same. The conversion is done with local_fs_32bit.go and
local_fs_64bit.go which are conditionally compiled on different
platforms.
Finally the actual check of the condition is in ignorefs.go function
shouldIncludeByDevice which is analoguous to the other similarly named
functions.
Co-authored-by: Erkki Seppälä <flux@inside.org>
* logging: cleaned up stderr logging
- do not show module
- do not show timestamps by default (enable with --console-timestamps)
* logging: replaced most printStderr() with log.Info
* cli: additional logging cleanup
By default Kopia will emit colored output if the output is a non-dumb
terminal. You can use --force-color (or set environment
variable to KOPIA_FORCE_COLOR=true) to override it and emit ANSI color
sequences, which is useful for example when piping through 'less'.
Conversely --disable-color (or KOPIA_DISABLE_COLOR=true)
will prevent color output.
* object: fixed race condition between Result() and Checkpoint()
This would sometimes result in indirect objects having empty object IDs.
Fixes#648
* upload: ensure checkpoints never containt empty object IDs.
* testing: reduce armhf test weight
* restore: improved user experience
* 'snapshot restore' is now the same as 'restore' and both will
support restoring by manifest ID, root ID or root ID + subdirectory
* added support for restoring individual files
* implemented PR feedback and refactored object ID parsing
Moving helpers inside the snapshot/ package helped clean up the code
a lot.
* server: repro for zero-sized snapshot bug
As described in https://kopia.discourse.group/t/kopia-0-7-0-not-backing-up-any-files-repro-needed/136/5
* server: fixed zero-sized snapshots after repository is connected via API
The root cause was that source manager was inheriting HTTP call context
which was immediately closed after the 'connect' RPC returned thus
silently killing all uploads.
Changed file read implementation from ReadAll() to a Handle to avoid OOMing
We don't have automated tests for this but I verified this by restoring
13GB file over fuse and memory usage never exceeded 400MB.
* logging: revamped logs from content manager to be machine parseable
Logs from the content manager (except reads) are sent to separate log
file that is always free from personally-identifiable information
(e.g. no file names, just content IDs and blob IDs).
Also moved CLI logs to a subdirectory (cli-logs) and put content logs
in a parallel directory (content-logs)
Also, the log file name will now include the type of the command that
was invoked:
kopia-20200913-134157-16110-snapshot-create.log
Fixes#588
* tests: moved all logs from tests to a separate directory
content:Allow returning deleted content in GetContent
maintenance: check deleted contents as well
maintenance: test for when a directory content is reused after deletion
testing: add support for repo open options in repotesting
* Allow passing repo options to MustReopen
* Add repotesting.Environment.MustConnectOpenAnother
* Remove kopia.config.mlock file
* snapshot create helper
* Fix content delete related and e2e tests
* cli: added --tls-print-server-cert flag
This prints complete server certificate that is base64 and PEM-encoded.
It is needed for Electron to securely connect to the server outside of
the browser, since there's no way to trust certificate by fingerprint.
* server: added repo/exists API
* server: added ClientOptions to create and connect API
* server: exposed current-user API
* server: API to change description of a repository
* htmlui: refactored connect/create flow
This cleaned up the code a lot and made UX more obvious.
* kopia-ui: simplified repository management UX
Removed repository configuration window which was confusing due to
the notion of 'server'.
Now KopiaUI will automatically launch 'kopia server --ui' for each
config found in the kopia config directory and shut it down every
time repository is disconnected.
See https://youtu.be/P4Ll_LR4UVM for a quick demo.
Fixes#583
* repo: refactored client-specific options (hostname,username,description,readonly) into new struct that is JSON-compatible with current config
* cli: added 'repository set-client' to configure parameters of connected repository
* cli: cleaned up 'repository status' output
* cli: simplified mount command
See https://youtu.be/1Nt_HIl-NWQ
It will always use WebDAV on Windows and FUSE on Unix. Removed
confusing options.
New usage:
$ kopia mount [--browse]
Mounts all snapshots in a temporary filesystem directory
(both Unix and Windows).
$ kopia mount <object> [--browse]
Mounts given object in a temporary filesystem directory
(both Unix and Windows).
$ kopia mount <object> z: [--browse]
Mounts given object as a given drive letter in Windows (using
temporary WebDAV mount).
$ kopia mount <object> * [--browse]
Mounts given object as a random drive letter in Windows.
$ kopia mount <object> /mount/path [--browse]
Mounts given object in given path in Unix.
<object> can be the ID of a directory 'k<hash>' or 'all'
Optional --browse automatically opens OS-native file browser.
* htmlui: added UI for mounting directories
See https://youtu.be/T-9SshVa1d8 for a quick demo.
Also replaced some UI text with icons.
* lint: windows-specific fix
Globally replaced all use of time with internal 'clock' package
which provides indirection to time.Now()
Added support for faking clock in Kopia via KOPIA_FAKE_CLOCK_ENDPOINT
logfile: squelch annoying log message
testenv: added faketimeserver which serves time over HTTP
testing: added endurance test which tests kopia over long time scale
This creates kopia repository and simulates usage of Kopia over multiple
months (using accelerated fake time) to trigger effects that are only
visible after long time passage (maintenance, compactions, expirations).
The test is not used part of any test suite yet but will run in
post-submit mode only, preferably 24/7.
testing: refactored internal/clock to only support injection when
'testing' build tag is present
This will launch 'rclone webdav server' passing random TLS
certificate and username/password and serve predefined rclone
remote path.
This is very experimental, use with caution.
Fixes#313.
Additional / required changes:
* blob: (experimental) support for rclone provider
* server: refactored TLS utilities to separate package
* webdav: add support for specifying trusted TLS certificate fingerprint
* kopia-ui: added rclone support
* kopia-ui: added ability to connect to kopia server
* kopia-ui: update status page to show some data for repositories connected to API server
* kopia-ui: hide user@host selection dropdown for kopia server repositories
* blob: added DisplayName() method to blob.Storage
* cli: added 'kopia repo sync-to <provider>' which replicates BLOBs
Usage demo: https://asciinema.org/a/352299Fixes#509
* implemented suggestion by Ciantic to fail sync if the destination repository is not compatible with the source
* cli: added 'kopia repo sync --must-exist'
This ensures that target repository is not empty, otherwise syncing to
an accidentally unmounted filesystem directory might copy everything
again.
* fixed a number of cases where misaligned data was causing panics on armv7 (but not armv8)
* travis: enable arm64
* test: reduce compressed data sizes when running on arm
* arm: wait longer for snapshots
Added test that verifies that when client performs Flush (which happens
at the end of each snapshot and when repository is closed), the
server writes new blobs to the storage.
Fixes#464
instead moved to run as part of maintenance ('kopia maintenance run')
added 'kopia maintenance run --force' flag which runs maintenance even
if not owned
* content: added support for cache of own writes
Thi keeps track of which blobs (n and m) have been written by the
local repository client, so that even if the storage listing
is eventually consistent (as in S3), we get somewhat sane behavior.
Note that this is still assumming read-after-create semantics, which
S3 also guarantees, otherwise it's very hard to do anything useful.
* compaction: support for compaction logs
Instead of compaction immediately deleting source index blobs, we now
write log entries (with `m` prefix) which are merged on reads
and applied only if the blob list includes all inputs and outputs, in
which case the inputs are discarded since they are known to have been
superseded by the outputs.
This addresses eventual consistency issues in stores such as S3,
which don't guarantee list-after-put or list-after-delete. With such
stores the repository is ultimately eventually consistent and there's
not much that can be done about it, unless we use second strongly
consistent storage (such as GCS) for the index only.
* content: updated list cache to cache both `n` and `m`
* repo: fixed cache clear on windows
Clearing cache requires closing repository first, as Windows is holding
the files locked.
This requires ability to close the repository twice.
* content: refactored index blob management into indexBlobManager
* testing: fixed blobtesting.Map storage to allow overwrites
* blob: added debug output String() to blob.Metadata
* testing: added indexBlobManager stress test
This works by using N parallel "actors", each repeatedly performing
operations on indexBlobManagers all sharing single eventually consistent
storage.
Each actor runs in a loop and randomly selects between:
- *reading* all contents in indexes and verifying that it includes
all contents written by the actor so far and that contents are
correctly marked as deleted
- *creating* new contents
- *deleting* one of previously-created contents (by the same actor)
- *compacting* all index files into one
The test runs on accelerated time (every read of time moves it by 0.1
seconds) and simulates several hours of running.
In case of a failure, the log should provide enough debugging
information to trace the exact sequence of events leading up to the
failure - each log line is prefixed with actorID and all storage
access is logged.
* makefile: increase test timeout
* content: fixed index blob manager race
The race is where if we delete compaction log too early, it may lead to
previously deleted contents becoming temporarily live again to an
outside observer.
Added test case that reproduces the issue, verified that it fails
without the fix and passed with one.
* testing: improvements to TestIndexBlobManagerStress test
- better logging to be able to trace the root cause in case of a failure
- prevented concurrent compaction which is unsafe:
The sequence:
1. A creates contentA1 in INDEX-1
2. B creates contentB1 in INDEX-2
3. A deletes contentA1 in INDEX-3
4. B does compaction, but is not seeing INDEX-3 (due to EC or simply
because B started read before #3 completed), so it writes
INDEX-4==merge(INDEX-1,INDEX-2)
* INDEX-4 has contentA1 as active
5. A does compaction but it's not seeing INDEX-4 yet (due to EC
or because read started before #4), so it drops contentA1, writes
INDEX-5=merge(INDEX-1,INDEX-2,INDEX-3)
* INDEX-5 does not have contentA1
7. C sees INDEX-5 and INDEX-5 and merge(INDEX-4,INDEX-5)
contains contentA1 which is wrong, because A has been deleted
(and there's no record of it anywhere in the system)
* content: when building pack index ensure index bytes are different each time by adding 32 random bytes