Commit Graph

349 Commits

Author SHA1 Message Date
Julio López
e493177236 Remove legacy flags from snapshot create command (#441)
Remove `--hostname` and `--username` flags
2020-05-18 22:48:18 -07:00
Jarek Kowalski
ca28469706 cli: improved 'snapshot delete' usage (#436)
New usage:

```
kopia snapshot delete manifestID... [--delete]
kopia snapshot delete rootObjectID... [--delete]
```

Fixes #435

cli: added --unsafe-ignore-source as alias for `--delete`
This is a hidden flag for backwards compatibility. It will be removed.
2020-05-13 23:43:45 -07:00
Jarek Kowalski
be4b897579 Support for remote repository (#427)
Support for remote content repository where all contents and
manifests are fetched over HTTP(S) instead of locally
manipulating blob storage

* server: implement content and manifest access APIs
* apiclient: moved Kopia API client to separate package
* content: exposed content.ValidatePrefix()
* manifest: added JSON serialization attributes to EntryMetadata
* repo: changed repo.Open() to return Repository instead of *DirectRepository
* repo: added apiServerRepository
* cli: added 'kopia repository connect server'
  This sets up repository connection via the API server instead of
  directly-manipulated storage.
* server: add support for specifying a list of usernames/password via --htpasswd-file
* tests: added API server repository E2E test
* server: only return manifests (policies and snapshots) belonging to authenticated user
2020-05-02 21:41:49 -07:00
Jarek Kowalski
1377d057e4 Maintenance changes (#423)
* maintenance: encrypt maintenance schedule block

* maintenance: created snapshotmaintenance package that wraps maintenance and performs snapshot GC + regular maintenance in one shot, used in CLI and server

* PR feedback.
2020-05-02 20:40:16 -07:00
Jarek Kowalski
4f6dd94766 Maintenance: report and record record maintenance run timings (#422)
* mechanical rename of package snapshot/gc => snapshot/snapshotgc

* maintenance: record maintenance run times and statuses

Also stopped dropping deleted contents during quick maintenance, since
doing this safely requires coordinating with snapshot GC which is
part of full maintenance.

* cli: 'maintenance info' outputs maintenance run history

* maintenance: only drop index entries when it's safe to do so

This is based on the timestamp of previous successful GC that's old
enough to resolve all race conditions between snapshot creation and GC.

* maintenance: added internal flush to RewriteContents() to better measure its time
2020-04-20 08:30:03 -07:00
Jarek Kowalski
a462798b28 content: added blob-level metadata cache (#421)
Unlike regular cache, which caches segments of blobs on a per-content
basis, metadata cache will fetch and store the entire metadata blob (q)
when any of the contents in it is accessed.

Given that there are relatively few metadata blobs compared to data (p)
blobs, this will reduce the traffic to the underlying store and improve
performance of Snapshot GC which only relies on metadata contents.
2020-04-20 02:18:43 -07:00
Jarek Kowalski
4b4628a21e Repository maintenance support (#411)
Maintenance: support for automatic GC

Moved maintenance algorithms from 'cli' to 'repo/maintenance' package

Added support for CLI commands:

kopia gc - performs quick maintenance
kopia gc --full- perform full maintenance

Full maintenance performs snapshot gc, but it's not safe to do this automatically possibly in parallel to snapshots being taken. This will be addressed ~0.7 timeframe.
2020-04-14 00:11:41 -07:00
Jarek Kowalski
b671bda152 cli: add missing options for configuring sftp known_hosts
Possibly the cause of #414
2020-04-12 14:59:24 -07:00
Jarek Kowalski
1f1682b2cc Snapshot checkpointing (#410)
* snapshot: support for periodic checkpointing of snapshots in progress

For each snapshot that takes longer than 45 minutes, we trigger
internal cancellation, save the manifest and restart the snapshot
at which point all files will be cached.

This helps ensure the property that no file or directory objects
in the repository remain unreachable from a snapshot root for more than
one hour, which is important from GC perspective.

* nit: unified spelling 'cancelled' => 'canceled'
2020-04-07 17:54:21 -07:00
Jarek Kowalski
70d4c8764a cli: improvements to content selection for list/rewrite/stats/verify (#409)
They now uniformly support 3 flags:

--prefix=P       selects contents with the specified prefix
--prefixed       selects contents with ANY prefix
--non-prefixed   selects non-prefixed contents

Also changed content manager iteration API to support ranges.

cli: add --prefix to 'blob gc' and 'blob stats'
2020-04-06 18:43:41 -07:00
Jarek Kowalski
057c2789d8 Kopia UI: support for multiple repositories + portability (#398)
* server: when serving HTML UI, prefix the title with string from KOPIA_UI_TITLE_PREFIX envar

* kopia-ui: support for multiple repositories + portability

This is a major rewrite of the app/ codebase which changes
how configuration for repositories is maintained and how it flows
through the component hierarchy.

Portable mode is enabled by creating 'repositories' subdirectory before
launching the app.

on macOS:
  <parent>/KopiaUI.app
  <parent>/repositories/

On Windows, option #1 - nested directory
  <parent>\KopiaUI.exe
  <parent>\repositories\

On Windows, option #2 - parallel directory
  <parent>\some-dir\KopiaUI.exe
  <parent>\repositories\

In portable mode, repositories will have 'cache' and 'logs' nested
in it.
2020-04-04 17:18:37 -07:00
Andreas Schneider
4bed45114d implemented b2 backend 2020-03-28 09:07:44 -07:00
Julio López
3924fb247a Minor cleanup for snapshot time override (#392)
* Simplify cli.parseTimestamp
* nit: move duration var closer to where it is used
2020-03-27 00:19:16 -07:00
Jarek Kowalski
6cb9b8fa4f repo: refactored public API (#318)
* This is 99% mechanical:

Extracted repo.Repository interface that only exposes high-level object and manifest management methods, but not blob nor content management.

Renamed old *repo.Repository to *repo.DirectRepository

Reviewed codebase to only depend on repo.Repository as much as possible, but added way for low-level CLI commands to use DirectRepository.

* PR fixes
2020-03-26 08:04:01 -07:00
Jarek Kowalski
10bb492926 repo: deprecated NONE algorithm, will not be available for new repositories (#395)
* repo: deprecated NONE algorithm, will not be available for new repositories

Co-authored-by: Julio López <julio+gh@kasten.io>
2020-03-24 23:19:20 -07:00
Jarek Kowalski
60977812f0 Support for gather writes (#373)
, where blob.Storage.PutBlob gets a list of slices and writes them sequentially 
* performance: added gather.Bytes and gather.WriteBuffer

They are similar to bytes.Buffer but instead of managing a single
byte slice, they maintain a list of slices that and when they run out of
space they allocate new fixed-size slice from a free list.

This helps keep memory allocations completely under control regardless
of the size of data written.

* switch from byte slices and bytes.Buffer to gather.Bytes.

This is mostly mechanical, the only cases where it's not involve blob
storage providers, where we leverage the fact that we don't need to
ever concatenate the slices into one and instead we can do gather
writes.

* PR feedback
2020-03-24 15:05:52 -07:00
Nick
393d273e5a Policy set: fix nil pointer dereference #385 (#387)
* Policy set: fix nil pointer dereference #385
2020-03-23 18:32:36 -07:00
Jarek Kowalski
9b68a631e6 Highlight snapshot errors in the UI and CLI (#376)
* upload: exposed numFailed and failedEntries on directory summary

* cli: better present snapshot errors

* htmlui: display snapshot errors
2020-03-22 14:18:47 -07:00
Seb Patane
6789f8e64c cli: allow override of snapshot start time and end time 2020-03-21 09:27:32 -07:00
Onkar Bhat
a1f7a068de Add CLI option for disabling tls verification while connecting to s3 (#370)
* Add CLI option for disabling tls verification while connecting to s3
2020-03-19 17:21:47 -07:00
Jarek Kowalski
d930f06a27 cli: added flags to control progress output
--no-progress - disables progress completely
--progress-update-interval=T - controls how frequently progress is updated

Fixes #344
2020-03-17 05:11:42 -07:00
Jarek Kowalski
331ff076bb cli: added number of files that are in the process of being hashed 2020-03-14 15:56:09 -07:00
Jarek Kowalski
8d452a8285 performance: improvements to object manager (#336)
- added pooled splitters and ability to reset them without having to recreate
- added support for caller-provided compressor output to be able to pool it
- added pooling of compressor instances, since those are costly
2020-03-13 08:56:18 -07:00
Jarek Kowalski
526445a9c8 CLI: pre-allocated buffers for crypto benchmark (#343)
non-optimized (0.5.0)
  0. BLAKE2B-256-128      AES256-GCM-HMAC-SHA256 644.9 MiB / second

before this change:
  0. BLAKE2B-256-128      AES256-GCM-HMAC-SHA256 655.9 MiB / second

after (this change):
  0. BLAKE2B-256-128      AES256-GCM-HMAC-SHA256 781.5 MiB / second
2020-03-12 12:11:20 -07:00
Jarek Kowalski
e80f5536c3 performance: plumbed through output buffer to encryption and hashing,… (#333)
* performance: plumbed through output buffer to encryption and hashing, so that the caller can pre-allocate/reuse it

* testing: fixed how we do comparison of byte slices to account for possible nils, which can be returned from encryption
2020-03-12 08:27:44 -07:00
Julio López
89c0c6bac4 Refactor CLI stats (#341)
* Helper package internal/stats
* Use internal/stats for blob gc stats
* Use internal/stats for content list stats
* Refactor gc stats
  - Leverages internal/stats package
  - Return GC stats
  - nit: error message formatting
  - Refactor block in gc.Run.
     Simplifies and reduces a level of indentation
2020-03-11 22:16:07 -07:00
Jarek Kowalski
6ce8410a29 Added OpenCensus (#339)
* repo: added some initial metrics using OpenCensus

* cli: added flags to expose Prometheus metrics on a local endpoint

`--metrics-listen-addr=localhost:X` exposes prometheus metrics on
   http://localhost:X/metrics

Also, kopia server will automatically expose /metrics endpoint on the
same port it runs as, without authentication.
2020-03-11 22:07:31 -07:00
Julio López
edc87fcce8 Refactor content stats (#340)
* Remove unused fields from content.Stats
* Refactor content.Stats
2020-03-11 21:47:05 -07:00
Jarek Kowalski
514df69afa performance: added wrapper around io.Copy()
this pools copy buffers so they can be reused instead of throwing away
after each io.Copy()
2020-03-10 21:52:30 -07:00
Jarek Kowalski
cd35e3bab5 cli: 'snapshot migrate' improvements to help with data migration
- cleaned up migration progress output
- fixed migration idempotency
- added migration of policies
- renamed --parallelism to --parallel
- improved e2e test
- do not prompt for password to source repository if persisted
2020-03-07 21:47:32 -08:00
Jarek Kowalski
d526843124 cli: fixed metadata cache size on connect/create 2020-03-07 21:47:32 -08:00
Jarek Kowalski
c5cf95fdf6 cli: improved 'content verify'
Now you can quickly verify that all contents are correctly backed
by existing blob without downloading much.

You can still use '--full' to cause full download and decryption.
2020-03-05 17:33:56 -08:00
Jarek Kowalski
889f2ead59 cli: improved 'blob gc'
- do not remove blobs younger than 4h when performing blob GC,
  because they may have just been written
- parallelize deletes
- clean up console output
2020-03-05 17:33:56 -08:00
Jarek Kowalski
6f68b726a7 cli: improved 'content rewrite'
- removed confusing '--prefixed' option
- print timestamp of contents as they are rewritten
2020-03-05 17:33:56 -08:00
Jarek Kowalski
a4fad4ca5a cli: added 'blob stats' command 2020-03-05 17:33:56 -08:00
Jarek Kowalski
fb181257bf cli: implemented update check, fixes #119 2020-03-04 22:06:05 -08:00
Jarek Kowalski
d95e6a3d09 sftp: Fixed issues in SFTP provider, Fixes #216
- did not work on windows due to use of filepath which uses backslash
  instead of slash
- added support for embedding SFTP key
- fixed UI controls
- misc fixes for KopiaUI
- added progress reporting
2020-03-01 18:56:06 -08:00
Jarek Kowalski
38862a7bf9 kopia-ui: fixed redirection to /repo not working (404) 2020-02-29 21:55:06 -08:00
Jarek Kowalski
ddd267accc crypto: deprecated crypto algorithms and replaced with better alternatives
New ciphers are using authenticated encryption with associated data
(AEAD) and per-content key derived using HMAC-SHA256:

* AES256-GCM-HMAC-SHA256
* CHACHA20-POLY1305-HMAC-SHA256

They support content IDs of arbitrary length and are quite fast:

On my 2019 MBP:

- BLAKE2B-256 + AES256-GCM-HMAC-SHA256 - 648.7 MiB / second
- BLAKE2B-256 + CHACHA20-POLY1305-HMAC-SHA256 - 597.1 MiB / second
- HMAC-SHA256 + AES256-GCM-HMAC-SHA256 351 MiB / second
- HMAC-SHA256 + CHACHA20-POLY1305-HMAC-SHA256 316.2 MiB / second

Previous ciphers had several subtle issues:

* SALSA20 encryption, used weak nonce (64 bit prefix of content ID),
  which means that for any two contents, whose IDs that have the same
  64-bit prefix, their plaintext can be decoded from the ciphertext
  alone.

* AES-{128,192,256}-CTR were not authenticated, so we were
  required to hash plaintext after decryption to validate. This is not
  recommended due to possibility of subtle timing attacks if an attacker
  controls the ciphertext.

* SALSA20-HMAC was only validating checksum and not that the ciphertext
  was for the correct content ID.

New repositories cannot be created using deprecated ciphers, but they
will still be supported for existing repositories, until at least 0.6.0.

The users are encouraged to migrate to one of new ciphers when 0.5.0 is
out.
2020-02-29 20:50:50 -08:00
Jarek Kowalski
d181403284 crypto: refactored encryption, hashing and splitter into separate packages (#274)
Added some tests, deleted XSALSA20 which never worked E2E
2020-02-27 12:36:49 -08:00
Jarek Kowalski
765bff8e0b cli: restored snapshot create --hostname and --username flags 2020-02-25 20:40:23 -08:00
Jarek Kowalski
e3854f7773 BREAKING: changed how hostname/username are handled
The hostname/username are now persisted when connecting to repository
in a local config file.

This prevents weird behavior changes when hostname is suddenly changed,
such as when moving between networks.

repo.Repository will now expose Hostname/Username properties which
are always guarnateed to be set, and are used throughout.

Removed --hostname/--username overrides when taking snapshot et.al.
2020-02-25 20:40:23 -08:00
Jarek Kowalski
c8fcae93aa logging: refactored logging
This is mostly mechanical and changes how loggers are instantiated.

Logger is now associated with a context, passed around all methods,
(most methods had ctx, but had to add it in a few missing places).

By default Kopia does not produce any logs, but it can be overridden,
either locally for a nested context, by calling

ctx = logging.WithLogger(ctx, newLoggerFunc)

To override logs globally, call logging.SetDefaultLogger(newLoggerFunc)

This refactoring allowed removing dependency from Kopia repo
and go-logging library (the CLI still uses it, though).

It is now also possible to have all test methods emit logs using
t.Logf() so that they show up in failure reports, which should make
debugging of test failures suck less.
2020-02-25 17:24:44 -08:00
Prasad Ghangal
c682fffdf2 Support for Azure blob storage (#271)
Signed-off-by: Prasad Ghangal <prasad.ghangal@gmail.com>
2020-02-25 16:32:26 -08:00
Jarek Kowalski
897483299f Kopia UI & CLI: support for progress indicator (#268)
Percentage based on last-known snapshot size

* server: exposed last completed snapshot size in the API
* cli: added support for progress indicator (percentage based on last-known snapshot size)
* htmlui: added progress indicator in the UI (percentage based on last-known snapshot size)
2020-02-24 17:55:02 -08:00
Jarek Kowalski
f8006f8ce0 cli: removed flags for configuring global policy on repository creation 2020-02-18 12:21:11 -08:00
Jarek Kowalski
a21da7b960 cli: fixed double-close of repository during 'server start' 2020-02-16 22:43:36 -08:00
Jarek Kowalski
0f79279f5e server: added support for new verbs in the API
/api/v1/repo/create
/api/v1/repo/connect
/api/v1/repo/disconnect

Refactored server code and fixed a number of outstanding robustness
issues. Tweaked the API responses a bit to make more sense when consumed
by the UI.
2020-02-13 17:23:50 -08:00
Prasad Ghangal
1c3858a906 Add AWS Session Token support to Kopia
Signed-off-by: Prasad Ghangal <prasad.ghangal@gmail.com>
2020-02-12 08:19:24 -08:00
Jarek Kowalski
a8503a007c server: allow starting server on randomly-allocated port by specifing port 0 (#224)
allow starting server on randomly-allocated port by specifing port 0
2020-02-11 17:08:50 -08:00