* sftp: support for external SSH command and host verification improvements
- removed custom parsing of hostnames and verification and replaced with
standard 'knownhosts' implementation.
- added option to launch external SSH command which supports
aliases, agent, etc.
NOTE, we're still not supporting any cases where password needs to be
entered on the command line, since that would be incompatible with
the UI which uses client-server model.
Fixes #500
Fixes #414
* site: updated SFTP repository connection instructions
Fixes #590
* cli: ensure advanced commands are not accidentally used
This prints an error when a dangerous command is used without
first setting KOPIA_ADVANCED_COMMANDS=enabled environment variable.
Co-authored-by: Julio López <julio+gh@kasten.io>
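A minimal sketch of the opt-in check described above, assuming the guard simply compares the environment variable against the literal value "enabled". The function name checkAdvancedCommand, the error text and the injected getenv parameter are hypothetical and are not taken from the Kopia code base.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// advancedCommandsEnvVar is the variable named in the commit message.
const advancedCommandsEnvVar = "KOPIA_ADVANCED_COMMANDS"

// errAdvancedDisabled is a hypothetical error value used for illustration.
var errAdvancedDisabled = errors.New(
	"dangerous command: set " + advancedCommandsEnvVar + "=enabled to run it")

// checkAdvancedCommand returns an error unless the opt-in variable is set
// to exactly "enabled". getenv is injected so the check is easy to test.
func checkAdvancedCommand(getenv func(string) string) error {
	if getenv(advancedCommandsEnvVar) != "enabled" {
		return errAdvancedDisabled
	}
	return nil
}

func main() {
	if err := checkAdvancedCommand(os.Getenv); err != nil {
		fmt.Fprintln(os.Stderr, "ERROR:", err)
		return
	}
	fmt.Println("advanced command allowed")
}
```

Passing os.Getenv in production and a stub in tests keeps the guard free of global state.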
* object: refactored writer to detect split points before writing
This introduces new primitive that will be moved into splitters
themselves in subsequent commits. I'm doing this in small steps to
ensure we don't regress at any time.
* splitter: refactored TestSplitters test
This uses both slow (byte-by-byte) and fast (nextSplitPoint) methods of
determining split points.
Note nextSplitPoint is not implemented by splitters yet, but this
verifies that the test is expecting the right thing.
* object: splitter refactoring - replaced ShouldSplit() with NextSplitPoint() everywhere, still not optimized
* splitter: added additional dimension to splitter_test
We split either in large chunks or one byte at a time to catch
the corner cases in the splitter implementation.
* splitter: optimized splitters using NextSplitPoint primitive
This improves splitter performance by about 40% (buzhash) and makes
it virtually free for FIXED splitter.
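The difference between the two primitives can be sketched for a fixed-size splitter. This is an illustration only: the type fixedSplitter, the helper functions and the convention that NextSplitPoint returns -1 when the buffer contains no split point are assumptions, not the actual Kopia signatures.

```go
package main

import "fmt"

// fixedSplitter splits after every chunkSize bytes (hypothetical type).
type fixedSplitter struct {
	chunkSize int // desired chunk length in bytes, must be > 0
	cur       int // bytes consumed since the last split point
}

// ShouldSplit is the slow primitive: it is fed one byte at a time and
// reports whether a split point falls right after that byte.
func (s *fixedSplitter) ShouldSplit(_ byte) bool {
	s.cur++
	if s.cur >= s.chunkSize {
		s.cur = 0
		return true
	}
	return false
}

// NextSplitPoint is the fast primitive: it returns how many bytes of buf
// belong to the current chunk, or -1 if no split point falls inside buf,
// in which case all of buf is counted as consumed.
func (s *fixedSplitter) NextSplitPoint(buf []byte) int {
	remaining := s.chunkSize - s.cur
	if len(buf) < remaining {
		s.cur += len(buf)
		return -1
	}
	s.cur = 0
	return remaining
}

// splitSlow returns split offsets found by the byte-by-byte method.
func splitSlow(s *fixedSplitter, data []byte) []int {
	var offsets []int
	for i, b := range data {
		if s.ShouldSplit(b) {
			offsets = append(offsets, i+1)
		}
	}
	return offsets
}

// splitFast returns split offsets found with NextSplitPoint.
func splitFast(s *fixedSplitter, data []byte) []int {
	var offsets []int
	pos := 0
	for pos < len(data) {
		n := s.NextSplitPoint(data[pos:])
		if n < 0 {
			break
		}
		pos += n
		offsets = append(offsets, pos)
	}
	return offsets
}

func main() {
	data := make([]byte, 10)
	fmt.Println(splitSlow(&fixedSplitter{chunkSize: 4}, data))
	fmt.Println(splitFast(&fixedSplitter{chunkSize: 4}, data))
}
```

For a fixed splitter the fast path is plain arithmetic on the buffer length, which is consistent with the claim that splitting becomes virtually free.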
* cli: added --tls-print-server-cert flag
This prints complete server certificate that is base64 and PEM-encoded.
It is needed for Electron to securely connect to the server outside of
the browser, since there's no way to trust certificate by fingerprint.
* server: added repo/exists API
* server: added ClientOptions to create and connect API
* server: exposed current-user API
* server: API to change description of a repository
* htmlui: refactored connect/create flow
This cleaned up the code a lot and made UX more obvious.
* kopia-ui: simplified repository management UX
Removed repository configuration window which was confusing due to
the notion of 'server'.
Now KopiaUI will automatically launch 'kopia server --ui' for each
config found in the kopia config directory and shut it down every
time repository is disconnected.
See https://youtu.be/P4Ll_LR4UVM for a quick demo.
Fixes #583
* repo: refactored client-specific options (hostname,username,description,readonly) into new struct that is JSON-compatible with current config
* cli: added 'repository set-client' to configure parameters of connected repository
* cli: cleaned up 'repository status' output
* cli: simplified mount command
See https://youtu.be/1Nt_HIl-NWQ
It will always use WebDAV on Windows and FUSE on Unix. Removed
confusing options.
New usage:
$ kopia mount [--browse]
Mounts all snapshots in a temporary filesystem directory
(both Unix and Windows).
$ kopia mount <object> [--browse]
Mounts given object in a temporary filesystem directory
(both Unix and Windows).
$ kopia mount <object> z: [--browse]
Mounts given object as a given drive letter in Windows (using
temporary WebDAV mount).
$ kopia mount <object> * [--browse]
Mounts given object as a random drive letter in Windows.
$ kopia mount <object> /mount/path [--browse]
Mounts given object in given path in Unix.
<object> can be the ID of a directory 'k<hash>' or 'all'
Optional --browse automatically opens OS-native file browser.
* htmlui: added UI for mounting directories
See https://youtu.be/T-9SshVa1d8 for a quick demo.
Also replaced some UI text with icons.
* lint: windows-specific fix
Fixes #564
cli: added 'kopia policy set --ignore-cache-dirs' option to control
whether to ignore caches (global default=true)
ui: added checkbox to control 'Ignore Cache Dirs' in policy editor
ignorefs: moved ignoring cache directories to ignorefs layer
Co-authored-by: Julio López <julio+gh@kasten.io>
Globally replaced all use of time with internal 'clock' package
which provides indirection to time.Now()
Added support for faking clock in Kopia via KOPIA_FAKE_CLOCK_ENDPOINT
logfile: squelch annoying log message
testenv: added faketimeserver which serves time over HTTP
testing: added endurance test which tests kopia over long time scale
This creates kopia repository and simulates usage of Kopia over multiple
months (using accelerated fake time) to trigger effects that are only
visible after long time passage (maintenance, compactions, expirations).
The test is not used as part of any test suite yet but will run in
post-submit mode only, preferably 24/7.
testing: refactored internal/clock to only support injection when
'testing' build tag is present
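The indirection to time.Now() can be sketched as a package-level function variable. This is a simplified illustration: the names now, Now, setFakeNow and exampleTime are hypothetical, and the sketch shows neither the 'testing' build tag gating nor the KOPIA_FAKE_CLOCK_ENDPOINT mechanism mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

// now is the single indirection point; by default it is the real clock.
var now = time.Now

// Now is what the rest of the code would call instead of time.Now().
func Now() time.Time { return now() }

// exampleTime is an arbitrary fixed instant used by the demo below.
var exampleTime = time.Date(2020, 8, 1, 12, 0, 0, 0, time.UTC)

// setFakeNow is a hypothetical test-only hook that pins the clock to t
// and returns a function that restores the previous clock.
func setFakeNow(t time.Time) (restore func()) {
	old := now
	now = func() time.Time { return t }
	return func() { now = old }
}

func main() {
	restore := setFakeNow(exampleTime)
	fmt.Println("fake:", Now())
	restore()
	fmt.Println("real:", Now())
}
```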
* cli: added 'index inspect' which can dump contents of index blob or local file
* repo: added read-only option when connecting to a repo which prevents any mutations
Co-authored-by: Julio Lopez <julio+gh@k....io>
This will launch 'rclone webdav server' passing random TLS
certificate and username/password and serve predefined rclone
remote path.
This is very experimental, use with caution.
Fixes #313.
Additional / required changes:
* blob: (experimental) support for rclone provider
* server: refactored TLS utilities to separate package
* webdav: add support for specifying trusted TLS certificate fingerprint
* kopia-ui: added rclone support
* cli: small tweaks to kopia server mode
* print SHA256 certificate thumbprint for auto-generated certs.
* client will accept both upper- and lowercase thumbprint values
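A case-insensitive thumbprint comparison could look like the sketch below. It assumes the thumbprint is the hex-encoded SHA256 digest of the DER-encoded certificate bytes; the function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// fingerprint returns the lowercase hex SHA256 digest of the given
// certificate bytes (assumed to be DER-encoded).
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// matchesFingerprint reports whether want, written in upper- or
// lowercase hex, matches the digest of the certificate bytes.
func matchesFingerprint(der []byte, want string) bool {
	return strings.EqualFold(fingerprint(der), want)
}

func main() {
	der := []byte("not a real certificate")
	fp := fingerprint(der)
	fmt.Println(matchesFingerprint(der, fp))
	fmt.Println(matchesFingerprint(der, strings.ToUpper(fp)))
}
```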
* site: updated documentation for v0.6.0 release
Co-authored-by: Julio López <julio+gh@kasten.io>
* blob: added DisplayName() method to blob.Storage
* cli: added 'kopia repo sync-to <provider>' which replicates BLOBs
Usage demo: https://asciinema.org/a/352299
Fixes #509
* implemented suggestion by Ciantic to fail sync if the destination repository is not compatible with the source
* cli: added 'kopia repo sync --must-exist'
This ensures that target repository is not empty, otherwise syncing to
an accidentally unmounted filesystem directory might copy everything
again.
* cli: fixed 'kopia policy rm' deleting global when passed policy ID
* policy: additional unit test coverage for policy manager
* fixed path parsing logic to avoid the use of the filepath package, which is platform-dependent; added more tests
* restore: support for zip, tar and tar.gz restore outputs
Moved restore functionality to its own package.
* Fix enum values in the 'mode' flag
Co-authored-by: Julio López <julio+gh@kasten.io>
* content: ensure that cleanup blobs have unique contents to prevent situation where they keep getting rewritten and thus never deleted
* cli: added '--decrypt' option to 'kopia blob show'
- run maintenance even if the command is about to return an error
(otherwise if folks have persistent error causing snapshots to fail
they will never run maintenance)
- disable progress output after snapshotting so that
'kopia snapshot --all' output is clean
instead moved to run as part of maintenance ('kopia maintenance run')
added 'kopia maintenance run --force' flag which runs maintenance even
if not owned
* content: added support for cache of own writes
This keeps track of which blobs (n and m) have been written by the
local repository client, so that even if the storage listing
is eventually consistent (as in S3), we get somewhat sane behavior.
Note that this is still assuming read-after-create semantics, which
S3 also guarantees, otherwise it's very hard to do anything useful.
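One way to picture the own-writes cache is as a union of the (possibly stale) storage listing with the blob IDs this client remembers writing. The sketch below assumes exactly that; the function name and the choice to return a sorted slice are hypothetical, and expiration of remembered writes is not modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeOwnWrites returns the sorted union of blob IDs reported by a
// possibly stale listing and blob IDs the local client has written.
func mergeOwnWrites(listed, ownWrites []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, ids := range [][]string{listed, ownWrites} {
		for _, id := range ids {
			if !seen[id] {
				seen[id] = true
				out = append(out, id)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	listed := []string{"n1"}          // the listing has not caught up yet
	ownWrites := []string{"n1", "n2"} // but we know we wrote n2
	fmt.Println(mergeOwnWrites(listed, ownWrites))
}
```

Reading a blob that appears only in the own-writes set relies on the read-after-create guarantee mentioned above.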
* compaction: support for compaction logs
Instead of compaction immediately deleting source index blobs, we now
write log entries (with `m` prefix) which are merged on reads
and applied only if the blob list includes all inputs and outputs, in
which case the inputs are discarded since they are known to have been
superseded by the outputs.
This addresses eventual consistency issues in stores such as S3,
which don't guarantee list-after-put or list-after-delete. With such
stores the repository is ultimately eventually consistent and there's
not much that can be done about it, unless we use second strongly
consistent storage (such as GCS) for the index only.
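The rule for applying compaction logs on read can be sketched as follows. The struct compactionLogEntry and the function effectiveIndexBlobs are hypothetical; the sketch only models the rule stated above: an entry takes effect when every input and output is present in the listing, and then its inputs are ignored.

```go
package main

import "fmt"

// compactionLogEntry records that Outputs supersede Inputs (hypothetical).
type compactionLogEntry struct {
	Inputs  []string
	Outputs []string
}

// allPresent reports whether every ID is in the present set.
func allPresent(present map[string]bool, ids []string) bool {
	for _, id := range ids {
		if !present[id] {
			return false
		}
	}
	return true
}

// effectiveIndexBlobs returns the listed index blobs minus the inputs of
// every log entry whose inputs and outputs are all visible in the listing.
func effectiveIndexBlobs(listed []string, logs []compactionLogEntry) []string {
	present := map[string]bool{}
	for _, id := range listed {
		present[id] = true
	}
	superseded := map[string]bool{}
	for _, e := range logs {
		if !allPresent(present, e.Inputs) || !allPresent(present, e.Outputs) {
			continue // listing is incomplete, so the entry is not applied
		}
		for _, in := range e.Inputs {
			superseded[in] = true
		}
	}
	var out []string
	for _, id := range listed {
		if !superseded[id] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	logs := []compactionLogEntry{{Inputs: []string{"n1", "n2"}, Outputs: []string{"n3"}}}
	fmt.Println(effectiveIndexBlobs([]string{"n1", "n2", "n3"}, logs))
	fmt.Println(effectiveIndexBlobs([]string{"n1", "n2"}, logs))
}
```

When the output is not yet visible, the reader keeps using the inputs, so it never loses index data because of a stale listing.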
* content: updated list cache to cache both `n` and `m`
* repo: fixed cache clear on windows
Clearing cache requires closing repository first, as Windows is holding
the files locked.
This requires ability to close the repository twice.
* content: refactored index blob management into indexBlobManager
* testing: fixed blobtesting.Map storage to allow overwrites
* blob: added debug output String() to blob.Metadata
* testing: added indexBlobManager stress test
This works by using N parallel "actors", each repeatedly performing
operations on indexBlobManagers all sharing single eventually consistent
storage.
Each actor runs in a loop and randomly selects between:
- *reading* all contents in indexes and verifying that it includes
all contents written by the actor so far and that contents are
correctly marked as deleted
- *creating* new contents
- *deleting* one of previously-created contents (by the same actor)
- *compacting* all index files into one
The test runs on accelerated time (every read of time moves it by 0.1
seconds) and simulates several hours of running.
In case of a failure, the log should provide enough debugging
information to trace the exact sequence of events leading up to the
failure - each log line is prefixed with actorID and all storage
access is logged.
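The accelerated time used by the stress test can be sketched as a clock that advances by a fixed step on every read. The type and constructor names are hypothetical, and whether the real test advances time before or after returning the value is not stated above; this sketch advances first.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// acceleratedClock moves forward by step each time it is read.
type acceleratedClock struct {
	mu   sync.Mutex
	cur  time.Time
	step time.Duration
}

// newAcceleratedClock starts at an arbitrary instant with a 0.1s step.
func newAcceleratedClock() *acceleratedClock {
	return &acceleratedClock{
		cur:  time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC),
		step: 100 * time.Millisecond,
	}
}

// Now advances the clock by one step and returns the new time. It is
// safe to call from multiple goroutines (the parallel actors).
func (c *acceleratedClock) Now() time.Time {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cur = c.cur.Add(c.step)
	return c.cur
}

func main() {
	c := newAcceleratedClock()
	start := c.Now()
	for i := 0; i < 36000; i++ {
		c.Now() // 36,000 reads correspond to one simulated hour
	}
	fmt.Println("simulated elapsed time:", c.Now().Sub(start))
}
```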
* makefile: increase test timeout
* content: fixed index blob manager race
The race occurs when the compaction log is deleted too early, which may
lead to previously deleted contents becoming temporarily live again to
an outside observer.
Added test case that reproduces the issue, verified that it fails
without the fix and passes with it.
* testing: improvements to TestIndexBlobManagerStress test
- better logging to be able to trace the root cause in case of a failure
- prevented concurrent compaction which is unsafe:
The sequence:
1. A creates contentA1 in INDEX-1
2. B creates contentB1 in INDEX-2
3. A deletes contentA1 in INDEX-3
4. B does compaction, but is not seeing INDEX-3 (due to EC or simply
because B started read before #3 completed), so it writes
INDEX-4==merge(INDEX-1,INDEX-2)
* INDEX-4 has contentA1 as active
5. A does compaction but it's not seeing INDEX-4 yet (due to EC
or because read started before #4), so it drops contentA1, writes
INDEX-5=merge(INDEX-1,INDEX-2,INDEX-3)
* INDEX-5 does not have contentA1
6. C sees INDEX-4 and INDEX-5, and merge(INDEX-4,INDEX-5)
contains contentA1 which is wrong, because contentA1 has been deleted
(and there's no record of it anywhere in the system)
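The sequence above can be replayed with a toy index model. This is a simplified sketch, not the real index format: each entry carries a logical sequence number, the newest entry per content wins on merge, and compaction is assumed to drop deleted entries entirely (as in step 5). All type and function names are hypothetical.

```go
package main

import "fmt"

// entry is a simplified index entry for one content.
type entry struct {
	seq     int  // logical write time; the highest value wins on merge
	deleted bool // true for a deletion marker
}

// index maps content ID to its entry.
type index map[string]entry

// merge combines indexes, keeping the newest entry for each content.
func merge(indexes ...index) index {
	out := index{}
	for _, ix := range indexes {
		for id, e := range ix {
			if cur, ok := out[id]; !ok || e.seq > cur.seq {
				out[id] = e
			}
		}
	}
	return out
}

// compact merges the indexes it can see and drops deletion markers.
func compact(indexes ...index) index {
	out := merge(indexes...)
	for id, e := range out {
		if e.deleted {
			delete(out, id)
		}
	}
	return out
}

// isLive reports whether the content is present and not deleted.
func isLive(ix index, id string) bool {
	e, ok := ix[id]
	return ok && !e.deleted
}

// raceResurrectsContent replays the sequence and reports whether the
// deleted contentA1 looks live to an observer who sees INDEX-4 and INDEX-5.
func raceResurrectsContent() bool {
	index1 := index{"contentA1": {seq: 1}}
	index2 := index{"contentB1": {seq: 2}}
	index3 := index{"contentA1": {seq: 3, deleted: true}}
	index4 := compact(index1, index2)         // B does not see INDEX-3
	index5 := compact(index1, index2, index3) // A does not see INDEX-4
	return isLive(merge(index4, index5), "contentA1")
}

func main() {
	fmt.Println("contentA1 resurrected:", raceResurrectsContent())
}
```

In this model the deletion marker is gone after A's compaction, so nothing in INDEX-5 can override the stale live entry in INDEX-4.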
* content: when building pack index ensure index bytes are different each time by adding 32 random bytes
New usage:
```
kopia snapshot delete manifestID... [--delete]
kopia snapshot delete rootObjectID... [--delete]
```
Fixes #435
cli: added --unsafe-ignore-source as alias for `--delete`
This is a hidden flag for backwards compatibility. It will be removed.
Support for remote content repository where all contents and
manifests are fetched over HTTP(S) instead of locally
manipulating blob storage
* server: implement content and manifest access APIs
* apiclient: moved Kopia API client to separate package
* content: exposed content.ValidatePrefix()
* manifest: added JSON serialization attributes to EntryMetadata
* repo: changed repo.Open() to return Repository instead of *DirectRepository
* repo: added apiServerRepository
* cli: added 'kopia repository connect server'
This sets up repository connection via the API server instead of
directly-manipulated storage.
* server: add support for specifying a list of usernames/password via --htpasswd-file
* tests: added API server repository E2E test
* server: only return manifests (policies and snapshots) belonging to authenticated user
* maintenance: encrypt maintenance schedule block
* maintenance: created snapshotmaintenance package that wraps maintenance and performs snapshot GC + regular maintenance in one shot, used in CLI and server
* PR feedback.
* mechanical rename of package snapshot/gc => snapshot/snapshotgc
* maintenance: record maintenance run times and statuses
Also stopped dropping deleted contents during quick maintenance, since
doing this safely requires coordinating with snapshot GC which is
part of full maintenance.
* cli: 'maintenance info' outputs maintenance run history
* maintenance: only drop index entries when it's safe to do so
This is based on the timestamp of previous successful GC that's old
enough to resolve all race conditions between snapshot creation and GC.
* maintenance: added internal flush to RewriteContents() to better measure its time
Unlike regular cache, which caches segments of blobs on a per-content
basis, metadata cache will fetch and store the entire metadata blob (q)
when any of the contents in it is accessed.
Given that there are relatively few metadata blobs compared to data (p)
blobs, this will reduce the traffic to the underlying store and improve
performance of Snapshot GC which only relies on metadata contents.
Maintenance: support for automatic GC
Moved maintenance algorithms from 'cli' to 'repo/maintenance' package
Added support for CLI commands:
kopia gc - performs quick maintenance
kopia gc --full - performs full maintenance
Full maintenance performs snapshot gc, but it's not safe to do this automatically possibly in parallel to snapshots being taken. This will be addressed ~0.7 timeframe.
* snapshot: support for periodic checkpointing of snapshots in progress
For each snapshot that takes longer than 45 minutes, we trigger
internal cancellation, save the manifest and restart the snapshot
at which point all files will be cached.
This helps ensure the property that no file or directory objects
in the repository remain unreachable from a snapshot root for more than
one hour, which is important from GC perspective.
* nit: unified spelling 'cancelled' => 'canceled'
They now uniformly support 3 flags:
--prefix=P selects contents with the specified prefix
--prefixed selects contents with ANY prefix
--non-prefixed selects non-prefixed contents
Also changed content manager iteration API to support ranges.
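The three selection flags can be modeled with a small predicate. This sketch makes several assumptions that are not stated above: a content prefix is a single letter in the range g..z (so it cannot be mistaken for a hex digit), --prefix takes precedence over the other two flags, and no flags means everything is selected. The type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// contentSelector mirrors the three flags described above.
type contentSelector struct {
	prefix      string // --prefix=P
	prefixed    bool   // --prefixed
	nonPrefixed bool   // --non-prefixed
}

// hasPrefix assumes a prefix is one letter between 'g' and 'z'.
func hasPrefix(id string) bool {
	return id != "" && id[0] >= 'g' && id[0] <= 'z'
}

// matches reports whether the content ID is selected.
func (s contentSelector) matches(id string) bool {
	switch {
	case s.prefix != "":
		return strings.HasPrefix(id, s.prefix)
	case s.prefixed:
		return hasPrefix(id)
	case s.nonPrefixed:
		return !hasPrefix(id)
	default:
		return true
	}
}

func main() {
	ids := []string{"k0123abcd", "m4567ef01", "89abcdef"}
	sel := contentSelector{prefixed: true}
	for _, id := range ids {
		fmt.Println(id, sel.matches(id))
	}
}
```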
cli: add --prefix to 'blob gc' and 'blob stats'
* server: when serving HTML UI, prefix the title with string from KOPIA_UI_TITLE_PREFIX envar
* kopia-ui: support for multiple repositories + portability
This is a major rewrite of the app/ codebase which changes
how configuration for repositories is maintained and how it flows
through the component hierarchy.
Portable mode is enabled by creating 'repositories' subdirectory before
launching the app.
on macOS:
<parent>/KopiaUI.app
<parent>/repositories/
On Windows, option #1 - nested directory
<parent>\KopiaUI.exe
<parent>\repositories\
On Windows, option #2 - parallel directory
<parent>\some-dir\KopiaUI.exe
<parent>\repositories\
In portable mode, repositories will have 'cache' and 'logs' nested
in it.
* This is 99% mechanical:
Extracted repo.Repository interface that only exposes high-level object and manifest management methods, but not blob nor content management.
Renamed old *repo.Repository to *repo.DirectRepository
Reviewed codebase to only depend on repo.Repository as much as possible, but added way for low-level CLI commands to use DirectRepository.
* PR fixes
, where blob.Storage.PutBlob gets a list of slices and writes them sequentially
* performance: added gather.Bytes and gather.WriteBuffer
They are similar to bytes.Buffer but instead of managing a single
byte slice, they maintain a list of slices, and when they run out of
space they allocate a new fixed-size slice from a free list.
This helps keep memory allocations completely under control regardless
of the size of data written.
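The idea behind the gather buffer can be sketched as follows. This is not the actual gather package: the pool and writeBuffer types are hypothetical, the free list is not safe for concurrent use, and the slice size is deliberately tiny so the behavior is easy to see.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// pool is a minimal free list of fixed-size slices.
type pool struct {
	free [][]byte
	size int
}

// get reuses a slice from the free list or allocates a new one.
func (p *pool) get() []byte {
	if n := len(p.free); n > 0 {
		s := p.free[n-1]
		p.free = p.free[:n-1]
		return s[:0]
	}
	return make([]byte, 0, p.size)
}

// put returns a slice to the free list.
func (p *pool) put(s []byte) { p.free = append(p.free, s) }

// writeBuffer stores data as a list of fixed-size slices.
type writeBuffer struct {
	pool   *pool
	slices [][]byte
}

// Write appends data, taking a new slice whenever the last one is full.
func (b *writeBuffer) Write(data []byte) (int, error) {
	written := len(data)
	for len(data) > 0 {
		n := len(b.slices)
		if n == 0 || len(b.slices[n-1]) == cap(b.slices[n-1]) {
			b.slices = append(b.slices, b.pool.get())
			n++
		}
		last := b.slices[n-1]
		room := cap(last) - len(last)
		if room > len(data) {
			room = len(data)
		}
		b.slices[n-1] = append(last, data[:room]...) // never reallocates
		data = data[room:]
	}
	return written, nil
}

// Len returns the total number of bytes stored.
func (b *writeBuffer) Len() int {
	total := 0
	for _, s := range b.slices {
		total += len(s)
	}
	return total
}

// WriteTo is a gather write: slices are written one after another
// without first being concatenated into a single buffer.
func (b *writeBuffer) WriteTo(w io.Writer) (int64, error) {
	var total int64
	for _, s := range b.slices {
		n, err := w.Write(s)
		total += int64(n)
		if err != nil {
			return total, err
		}
	}
	return total, nil
}

// Reset returns all slices to the free list.
func (b *writeBuffer) Reset() {
	for _, s := range b.slices {
		b.pool.put(s)
	}
	b.slices = nil
}

func main() {
	b := &writeBuffer{pool: &pool{size: 4}}
	b.Write([]byte("hello world"))
	fmt.Println("slices:", len(b.slices), "bytes:", b.Len())
	b.WriteTo(os.Stdout)
	fmt.Println()
	b.Reset()
}
```

Because every slice has the same capacity, memory grows in predictable steps and released slices can be reused by any later writer.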
* switch from byte slices and bytes.Buffer to gather.Bytes.
This is mostly mechanical, the only cases where it's not involve blob
storage providers, where we leverage the fact that we don't need to
ever concatenate the slices into one and instead we can do gather
writes.
* PR feedback