Commit Graph

599 Commits

Author SHA1 Message Date
Jarek Kowalski
70e24106ee refactor(general): unified logging.Logger with *zap.SugaredLogger (#2090)
- removed a bunch of hacks, which should improve logging
performance by avoiding interfaces and data translation. This will
allow the use of de-sugared loggers in performance-critical
logging situations.

- this will also allow using features of ZAP more directly without
having to reimplement them.

- moved logging.Printf() to testlogging

- refactored `uitask` to store logs in a structural format and
present them as JSON only in the UI

- renamed printf_logger.go to printf.go so that fewer columns are used
in the logs
2022-06-26 05:11:52 +00:00
Jarek Kowalski
68b8afd43f feat(snapshots): improved performance when uploading huge files (#2064)
* feat(snapshots): improved performance when uploading huge files

This is controlled by an upload policy which specifies the size
threshold above which individual files are uploaded in parts
and concatenated.

This allows multiple threads to run splitting, hashing, compression
and encryption in parallel, which was previously only possible across
multiple files, but not when a single file was being uploaded.

The default is 2GiB for now, so this feature only kicks in for very
large files. In the future we may lower this.

Benchmark involved uploading a single 42.1 GB file which was a VM disk
snapshot of fresh Ubuntu installation (fresh EXT4 partition with lots
of zero bytes) to a brand-new filesystem repository on local SSD of
M1 Pro Macbook Pro 2021.

* before: 59-63s (~700 MB/s)
* after: 15-17s  (~2.6 GB/s)

* additional test to ensure files are really e2e readable
2022-06-24 07:38:07 +00:00
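The part-wise upload described in 68b8afd43f can be pictured with a minimal sketch. This is not Kopia's uploader: `splitParts`, `processPart` and `uploadInParts` are hypothetical names, the per-part work is reduced to hashing, and the threshold and part size are plain parameters rather than an upload policy.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// splitParts cuts data into consecutive parts of at most partSize bytes.
// partSize must be greater than zero.
func splitParts(data []byte, partSize int) [][]byte {
	var parts [][]byte
	for len(data) > 0 {
		n := partSize
		if n > len(data) {
			n = len(data)
		}
		parts = append(parts, data[:n])
		data = data[n:]
	}
	return parts
}

// processPart stands in for the per-part work (splitting, hashing,
// compression, encryption). In this sketch it only hashes the part.
func processPart(p []byte) [sha256.Size]byte {
	return sha256.Sum256(p)
}

// uploadInParts processes a file as a single unit when it is not larger than
// threshold. Larger files are cut into parts that are processed in parallel,
// and the results are returned in the original order so that they can be
// concatenated afterwards.
func uploadInParts(data []byte, threshold, partSize int) [][sha256.Size]byte {
	if len(data) <= threshold {
		return [][sha256.Size]byte{processPart(data)}
	}

	parts := splitParts(data, partSize)
	results := make([][sha256.Size]byte, len(parts))

	var wg sync.WaitGroup
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p []byte) {
			defer wg.Done()
			results[i] = processPart(p) // each goroutine writes its own slot
		}(i, p)
	}
	wg.Wait()

	return results
}

func main() {
	data := make([]byte, 10)
	fmt.Println("parts:", len(uploadInParts(data, 4, 3)))
}
```

Writing each result into its own slice slot keeps the parts ordered without extra locking, which is what makes the later concatenation straightforward.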
Shlok Chaudhari
06c8de08de test(cli): add separate test case for days in --retention-period flag (#2057) 2022-06-16 11:10:23 -07:00
Shlok Chaudhari
493adba9cf fix(cli): update kingpin version to fix --retention-period and other time.Duration type flags (#2054)
* fix(cli): Update Kingpin dependency to fix time.Duration type flags like --retention-period
* test(cli): add a test for duration parser to parse days, weeks
2022-06-15 11:27:43 -07:00
ashmrtn
61e651d30c feat(snapshots): Allow users to dynamically create entries in a directory during an upload (#1996)
* Allow dynamic directory entries with virtualfs

* Tests for new virtualfs implementation

* Add escape hatch for estimator during upload

Some virtualfs.StreamingDirectory-s may not be able to (efficiently)
support iterating through entries multiple times. Make a way for the
estimator to ask if they support multiple iterations and skip the
directory if they do not.

* Expand Directory interface

Expand the Directory interface instead of making a new interface as it's
error-prone to ensure all wrapper types properly handle types that use
the new interface.

* Post-rebase fixes

* Make StreamingDirectory single iteration only

Simplify code and test slightly by not allowing users to declare a
StreamingDirectory that can be iterated through multiple times.

* Add better test for estimator ignoring stream dir

Previous test in uploader had a race condition, meaning it may not catch
all cases.

* Ignore atomic access in checklocks

Comparisons known to be done after all additions to the variables in
question.

* Implement reviewer feedback

* Remove unused function parameter
2022-06-14 19:08:49 -07:00
Ali Dowair
eddde91f2d chore(snapshots): unify sparse and normal FS output paths (#1981)
* Unify sparse and normal IO output

This commit refactors the code paths that exercise normal and sparse
writing of restored content. The goal is to expose sparsefile.Copy()
and iocopy.Copy() to be interchangeable, thereby allowing us to wrap
or transform their behavior more easily in the future.

* Introduce getStreamCopier()

* Pull ioCopy() into getStreamCopier()

* Fix small nit in E2E test

We should be getting the block size of the destination file, not
the source file.

* Call stat.GetBlockSize() once per FilesystemOutput

A tiny refactor to pull this call out of the generated stream copier,
as the block size should not change from one file to the next within
a restore entry.

NOTE: as a side effect, if block size could not be found (an error
is returned), we will return the default stream copier instead of
letting the sparse copier fail. A warning will be logged, but this
error will not cause the restore to fail; it will proceed silently.
2022-06-14 18:09:45 +00:00
basldfalksjdf
a78dacc0d1 feat(snapshots): add option to ignore empty snapshots being saved (#2036)
* Add policy to ignore empty snapshots being saved

* Update command_snapshot_create.go

* Update source_manager.go

* Update command_snapshot_create.go

* Update source_manager.go

* fixes

* Update source_manager.go

* Update command_snapshot_create.go

* fix

* fix

Co-authored-by: Mehak  Satija <mehaksatija@Mehaks-MacBook-Pro.local>
Co-authored-by: Mehak  Satija <mehaksatija@Mehaks-MBP.hitronhub.home>
2022-06-13 03:41:08 +00:00
ashmrtn
ef8828a072 refactor(snapshots): Remove remaining internal uses of Readdir (#1986)
* Remove remaining internal uses of Readdir

* Remove old helpers and interface functions.

* Update tests for updated fs.Directory interface

* Fix index out of range error in snapshot walker

Record one error if an error occurred and it's not limiting errors

* Use helper functions more; exit loops early

Follow up on reviewer comments and reduce code duplication, use more
targeted functions like Directory.Child, and exit directory iteration
early if possible.

* Remove fs.Entries type and unused functions

Leave some functions dealing with sorting and finding entries in fs
package. This retains tests for those functions while still allowing
mockfs to access them.

* Simplify function return
2022-06-04 06:36:25 -07:00
Julio Lopez
511f4aa65d chore(cli): minor metrics-related cleanups (#1995)
* stop ticker to release resources
* nit: fix typo
* nit: add new line at EOF
2022-05-31 14:04:01 -07:00
Julio Lopez
99fb50118f fix(cli): add retention to JSON output (#1992)
refactor(cli): snapshot list JSON functionality.
Defines SnapshotManifest struct for the snapshot list JSON output.

test(cli): `snapshot list --json`
2022-05-31 13:43:42 -07:00
Jarek Kowalski
17c74e6386 feat(cli): added open telemetry tracing support (#1988)
New flag `--enable-jaeger-collector` and the corresponding
`KOPIA_ENABLE_JAEGER_COLLECTOR` environment variable enables Jaeger
exporter, which by default sends OTEL traces to Jaeger collector on
http://localhost:14268/api/traces

To change this, use environment variables:

* `OTEL_EXPORTER_JAEGER_ENDPOINT`
* `OTEL_EXPORTER_JAEGER_USER`
* `OTEL_EXPORTER_JAEGER_PASSWORD`

When tracing is disabled, the impact on performance is negligible.

To see this in action:

1. Download latest Jaeger all-in-one from https://www.jaegertracing.io/download/
2. Run `jaeger-all-in-one` binary without any parameters.
3. Run `kopia --enable-jaeger-collector snapshot create ...`
4. Go to http://localhost:16686/search and search for traces
2022-05-28 10:39:00 -07:00
Jarek Kowalski
e8c1cfe142 feat(cli): added flags for pushing kopia metrics (#1983)
When enabled, metrics are pushed to the provided Prometheus Push
Gateway at the start and end of each command and periodically every
few seconds.

```
--metrics-push-addr=http://address:port
--metrics-push-interval=5s
--metrics-push-job=kopia
--metrics-push-grouping=a:b --metrics-push-grouping=c:d
--metrics-push-username=user
--metrics-push-password=pass
```
2022-05-28 07:44:59 -07:00
Julio Lopez
e9ca80dd27 refactor(cli): deprecate snapshot gc command (#1973)
* refactor(cli): deprecate `snapshot gc` command
* replace `snapshot gc` in tests with `maintenance run`
2022-05-26 05:51:36 +00:00
Jarek Kowalski
9bf9cac7fb refactor(repository): ensure we always parse content.ID and object.ID (#1960)
* refactor(repository): ensure we always parse content.ID and object.ID

This changes the types to be incompatible with string to prevent direct
conversion to and from string.

This has the additional benefit of reducing number of memory allocations
and bytes for all IDs.

content.ID went from 2 allocations to 1:
   typical case 32 characters + 16 bytes per-string overhead
   worst-case 65 characters + 16 bytes per-string overhead
   now: 34 bytes

object.ID went from 2 allocations to 1:
   typical case 32 characters + 16 bytes per-string overhead
   worst-case 65 characters + 16 bytes per-string overhead
   now: 36 bytes

* move index.{ID,IDRange} methods to separate files

* replaced index.IDFromHash with content.IDFromHash externally

* minor tweaks and additional tests

* Update repo/content/index/id_test.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* Update repo/content/index/id_test.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* pr feedback

* post-merge fixes

* pr feedback

* pr feedback

* fixed subtle regression in sortedContents()

This was actually not producing invalid results because of how base36
works, just not sorting as efficiently as it could.

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
2022-05-25 14:15:56 +00:00
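The idea behind 9bf9cac7fb, an ID type that cannot be converted directly to or from `string` and therefore always has to be parsed, can be sketched like this. The layout is an assumption chosen to match the 34-byte figure quoted in the commit message, and the rule that an odd-length ID carries a one-letter prefix in the range `g` to `z` is likewise an assumption of this sketch, not a statement about Kopia's actual `content.ID`.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"unsafe"
)

// ID is a hypothetical fixed-size identifier. Because it is a struct, the
// compiler rejects ID("...") and string(id), so callers must go through
// ParseID and String.
type ID struct {
	data   [32]byte // raw hash bytes
	prefix byte     // optional one-letter prefix, 0 when absent
	length byte     // number of bytes of data in use
}

// ParseID validates s and converts it into an ID.
func ParseID(s string) (ID, error) {
	var id ID

	if len(s)%2 == 1 {
		// Assumption: an odd length means a prefix precedes the hex digits.
		if s[0] < 'g' || s[0] > 'z' {
			return ID{}, errors.New("invalid ID prefix")
		}
		id.prefix = s[0]
		s = s[1:]
	}

	if len(s) > 2*len(id.data) {
		return ID{}, errors.New("ID too long")
	}

	n, err := hex.Decode(id.data[:], []byte(s))
	if err != nil {
		return ID{}, fmt.Errorf("invalid ID: %w", err)
	}
	id.length = byte(n)

	return id, nil
}

// String renders the ID back into its textual form.
func (id ID) String() string {
	s := hex.EncodeToString(id.data[:id.length])
	if id.prefix != 0 {
		return string(id.prefix) + s
	}
	return s
}

func main() {
	id, err := ParseID("k0123abcd")
	fmt.Println(id, err)
	fmt.Println("size in bytes:", unsafe.Sizeof(ID{}))
}
```

Since the struct holds no pointers, a value of this type lives inline wherever it is stored, which is one plausible way to arrive at the reduced allocation count the commit message reports.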
Jarek Kowalski
f49bcdd883 feat(cli): implementation for 'kopia snapshot fix' (#1930)
* feat(cli): implementation for 'kopia snapshot fix'

This allows modifications and fixes to the snapshots after they have
been taken.

Supported are:

* `kopia snapshot fix remove-invalid-files [--verify-files-percent=X]`

Removes all directory entries where the underlying files cannot be
read based on index analysis (this does not read the files, only index
structures so is reasonably quick).

`--verify-files-percent=100` can be used to trigger full read for
all files.

* `kopia snapshot fix remove-files --object-id=<object-id>`

Removes the object with a given ID from the entire snapshot tree.
Useful when you accidentally snapshot a sensitive file.

* `kopia snapshot fix remove-files --filename=<wildcard>`

Removes the files with a given name from the entire snapshot tree.
Useful when you accidentally snapshot a sensitive file.

By default all snapshots are analyzed and rewritten. To limit the scope
use:

--source=user@host:/path
--manifest-id=manifestID

By default the rewrite operation writes new directory entries but
does not replace the manifests. To do that pass `--commit`.

Related #1906
Fixes #799

reorganized CLI per PR suggestion

* additional logging for diff command

* added Clone() method to snapshot manifest and directory entry

* added a comprehensive test, moved DirRewriter to separate file

* pr feedback

* more pr feedback

* improved logging output

* disable test in -race configuration since it's way too slow

* pr feedback
2022-05-25 01:17:55 +00:00
Julio Lopez
3d1de6f27a chore(general): minor cleanups (#1959)
- expand command flag description for clarification
- include blob id in blob get error in the cache
- nit: remove unused BOTO_PATH
- nit: fix comment
- cleanup: remove unnecessary function declaration in interface
- leverage 'testify' to simplify test
2022-05-23 15:16:25 -07:00
Jarek Kowalski
99eeb3c063 feat(cli): added CLI for controlling throttler (#1956)
Supported are:

```
$ kopia throttle set \
    --download-bytes-per-second=N | unlimited
    --upload-bytes-per-second=N | unlimited
    --read-requests-per-second=N | unlimited
    --write-requests-per-second=N | unlimited
    --list-requests-per-second=N | unlimited
    --concurrent-reads=N | unlimited
    --concurrent-writes=N | unlimited
```

To change parameters of a running server use:

```
$ kopia server throttle set \
    --address=<server-url> \
    --server-control-password=<password> \
    --download-bytes-per-second=N | unlimited
    --upload-bytes-per-second=N | unlimited
    --read-requests-per-second=N | unlimited
    --write-requests-per-second=N | unlimited
    --list-requests-per-second=N | unlimited
    --concurrent-reads=N | unlimited
    --concurrent-writes=N | unlimited
```
2022-05-18 01:27:06 -07:00
Jarek Kowalski
a61927e089 chore(infra): added more leak checks to tests (#1953) 2022-05-16 06:37:57 +00:00
Jarek Kowalski
1ae6c6df03 fix(repository): fixed slow goroutine leak from indexBlobCache, added tests (#1950) 2022-05-16 01:21:30 +00:00
Jarek Kowalski
81c0580a01 feat(cli): REVERT added 'content delete --forget' flag (#1932) (#1940)
This reverts commit d990af4dc2.
2022-05-10 03:45:25 +00:00
Jarek Kowalski
d990af4dc2 feat(cli): added 'content delete --forget' flag (#1932)
* feat(cli): added 'content delete --forget' flag

This allows low-level hiding of entries in the index, which makes
them completely invisible.

For #1906

* improved code coverage

* pr feedback
2022-05-06 21:16:12 -07:00
Jarek Kowalski
98f3473b67 refactor(snapshots): extracted snapshotfs.Verifier component (#1921)
* refactor(snapshots): extracted snapshotfs.Verifier component

* refactor(repository): added tests for snapshotfs.Verifier, misc cleanups

* fixed data race

* fixed atomic alignment

* nit
2022-05-02 04:03:28 +00:00
Jarek Kowalski
5d87d81733 chore(repository): extracted content index building and parsing into repo/content/index (#1881) 2022-04-05 18:04:50 -07:00
Ali Dowair
044792db30 feat(cli): show storage capacity in repo status (#1867)
The connected repository's backing storage capacity and available
space can be now retrieved from `kopia repository status`. In text
format, these fields are printed in a human friendly form (MiB, GiB).
In JSON mode (`--json`), they are output as bytes.

Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
Co-authored-by: Julio
2022-04-01 01:03:51 +00:00
Jarek Kowalski
92683c5a5e feat(snapshots): support for controlling upload parallelism via policies (#1850) 2022-03-26 19:20:49 +00:00
Jarek Kowalski
d7e10dba59 feat(cli): improved kopia benchmark commands (#1849)
* added --parallel flag
* added `kopia benchmark encryption`
* added `kopia benchmark hashing`
2022-03-26 14:28:08 +00:00
Ali Dowair
aafe56cd6f feat(snapshots): support restoring sparse files (#1823)
* feat(snapshots): support restoring sparse files

This commit implements basic support for restoring sparse files from
a snapshot. When specifying "--mode=sparse" in a snapshot restore
command, Kopia will make a best effort to make sure the underlying
filesystem allocates the minimum amount of blocks needed to persist
restored files. In other words, enabling this feature will "force"
all restored files to be sparse: blocks of zero bytes in the source
file should not be allocated.

* Address review comments

- Separate sparse option into its own bool flag
- Implement sparsefile package with copySparse method
- Truncate once before writing sparse file
- Check error from Truncate
- Add unit test for copySparse
- Invoke GetBlockSize once per file copy
- Remove support for Windows and explain why
- Add unit test for stat package

Co-authored-by: Dave Smith-Uchida <dave@kasten.io>
2022-03-22 19:09:50 -07:00
Jarek Kowalski
3ef472248d fix(ci): fixed flaky TestServerControl (#1840) 2022-03-20 20:19:21 -07:00
Jarek Kowalski
daa62de3e4 chore(ci): added checklocks static analyzer (#1838)
From https://github.com/google/gvisor/tree/master/tools/checklocks

This will perform static verification that we're using
`sync.Mutex`, `sync.RWMutex` and `atomic` correctly to guard access
to certain fields.

This was mostly just a matter of adding annotations to indicate which
fields are guarded by which mutex.

In a handful of places the code had to be refactored to allow static
analyzer to do its job better or to not be confused by some
constructs.

In one place this actually uncovered a bug where a function was not
releasing a lock properly in an error case.

The check is part of `make lint` but can also be invoked by
`make check-locks`.
2022-03-19 22:42:59 -07:00
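To illustrate the kind of annotation daa62de3e4 refers to, here is a small sketch. The `account` type is hypothetical, and the `+checklocks:mu` comment follows the field-annotation form described in the checklocks documentation as best understood here; the analyzer itself is not run by this program.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// account is a hypothetical type with one mutex-guarded field.
type account struct {
	mu sync.Mutex

	// +checklocks:mu
	balance int
}

// withdraw uses defer so the lock is released on every return path,
// including the error case. Forgetting the unlock on an early error return
// is the class of bug the commit message mentions.
func (a *account) withdraw(amount int) error {
	a.mu.Lock()
	defer a.mu.Unlock()

	if amount > a.balance {
		return errors.New("insufficient funds")
	}
	a.balance -= amount

	return nil
}

// current returns the balance while holding the guarding mutex.
func (a *account) current() int {
	a.mu.Lock()
	defer a.mu.Unlock()

	return a.balance
}

func main() {
	a := &account{balance: 100}
	fmt.Println(a.withdraw(200)) // fails, but must not leave mu locked
	fmt.Println(a.withdraw(30), a.current())
}
```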
Jarek Kowalski
9054fd0dc2 chore(ci): upgraded linter to 1.45 (#1836)
* chore(deps): upgraded linter to 1.45

* fixed linter warning
2022-03-18 22:24:42 -07:00
Jarek Kowalski
991c08e4b4 chore(repository): switched from opencensus to directly exporting prometheus metrics (#1831) 2022-03-17 23:39:36 -07:00
Julio Lopez
3f9e27c97b feat(cli): add --json output to 'repo status' command (#1834)
* cleanup(repository) add kopia:sensitive tag to content.FormattingOptions field

* feat(cli): add --json output to `repo status` command
2022-03-17 22:22:24 -07:00
Jarek Kowalski
2b8f7453db fix(cli): fixed and unified help text for policy commands (#1829)
Fixes #1822
2022-03-16 00:46:57 -07:00
Jarek Kowalski
c4ab086d5f chore(repository): streamlined internal content cache API (#1828)
moved from ./repo/content to ./internal/cache
2022-03-15 23:57:04 -07:00
Jarek Kowalski
69dc7ba969 feat(repository): added 'hint' to Prefetch methods. (#1825) 2022-03-12 23:16:39 -08:00
Jarek Kowalski
369d304084 refactor(repository): better context cancelation handling (#1802)
Instead of ignoring context cancelation in Open(), ensure we don't
spawn goroutines that might be canceled.
2022-03-06 16:56:30 -08:00
Jarek Kowalski
926e14aacb feat(repository): added PrefetchObjects() API (#1779)
* feat(repository): added precaching of data blobs

* feat(repository): added utilities for converting ID slices to strings

* feat(repository): added object.PrefetchBackingContents

* feat(repository): implemented Repository.PrefetchObjects

* feat(cli): added 'cache prefetch' subcommand

* feat(repository): prefetch in parallel

* added tests
2022-03-06 14:30:58 -08:00
Jarek Kowalski
9d63e56bb9 feat(cli): improved formatting of 'policy show' outputs (#1767) 2022-02-22 22:21:48 -08:00
Jarek Kowalski
6d29b2a082 feat(cli): added snapshot list flags: --storage-stats and --reverse (#1766)
For #1722
Related #1755
2022-02-21 20:24:13 -08:00
Jarek Kowalski
f1d3130351 refactor(repository): expose ContentInfo() on repo.Repository (#1765) 2022-02-21 14:38:59 -08:00
Jarek Kowalski
f5cee5ae42 refactor(snapshots): refactored TreeWalker to use workshare (#1762)
Also cleaned up the API and added unit tests.
2022-02-20 16:58:53 -08:00
Jarek Kowalski
a48e24e693 feat(providers): add Google Drive support (#1731)
* feat(provider): Add Google Drive support.

Co-authored-by: xkxx <xkxx@users.noreply.github.com>
Co-authored-by: xkxx <xkxiang@gmail.com>
2022-02-16 22:34:48 -08:00
Jarek Kowalski
90df511609 fix(snapshots): treat empty retention policy as retaining ALL, not NONE (#1733)
This is a safety measure which addresses P0 improvement for #1732.

Given that retention policies that retain nothing make no sense, this
is not considered a breaking change.
2022-02-07 11:40:27 -08:00
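The safety rule from 90df511609 can be sketched in a few lines. The `retentionPolicy` type below is a hypothetical, heavily simplified policy with a single rule, not Kopia's policy structure; it only shows the decision that an unset policy keeps everything.

```go
package main

import "fmt"

// retentionPolicy is a hypothetical policy with one rule.
// A nil KeepLatest means the rule is not set.
type retentionPolicy struct {
	KeepLatest *int
}

// isEmpty reports whether no retention rule is set at all.
func (p retentionPolicy) isEmpty() bool {
	return p.KeepLatest == nil
}

// snapshotsToKeep expects snapshots ordered newest first. An empty policy
// retains ALL snapshots instead of none.
func snapshotsToKeep(p retentionPolicy, snapshots []string) []string {
	if p.isEmpty() {
		return snapshots
	}

	n := *p.KeepLatest
	if n < 0 {
		n = 0
	}
	if n > len(snapshots) {
		n = len(snapshots)
	}

	return snapshots[:n]
}

func main() {
	snaps := []string{"s3", "s2", "s1"}
	one := 1

	fmt.Println(snapshotsToKeep(retentionPolicy{}, snaps))
	fmt.Println(snapshotsToKeep(retentionPolicy{KeepLatest: &one}, snaps))
}
```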
Shikhar Mall
63bedd3446 feat(cli): allow changing retention parameters from CLI (#1680)
Co-authored-by: Shikhar Mall <small@kopia.io>
2022-02-02 19:04:22 -08:00
Jarek Kowalski
e15852af41 test(cli): fixed test flake in TestServerControl (#1717) 2022-02-02 18:44:48 -08:00
Shikhar Mall
aa5e4cfb33 refactor(cli): An in-memory storage mock setup for CLI tests (#1697)
* refactor cli tests to allow the use of in-memory mock

* use in-memory repo for set-parameters cli tests

* move inmemory storage provider into test package

Co-authored-by: Shikhar Mall <shikhar@kasten.io>
2022-02-01 10:29:13 -08:00
Jarek Kowalski
fd163cfc20 feat(kopiaui): connect to repository asynchronously on startup (#1691)
This allows KopiaUI server to start when the repository directory
is not mounted or otherwise unavailable. Connection attempts will
be retried indefinitely and user will see new `Initializing` page.

This also exposes `Open` and `Connect` as tasks allowing the user to see
logs directly in the UI and cancel the operation.
2022-01-29 18:28:52 -08:00
Jarek Kowalski
f67274e229 fix(providers): fixed DoNotRecreate and tests for gcs (#1688)
Also simplified validation test suite, which will simply test whether
the provider supports DoNotRecreate or properly rejects it without
external configuration.
2022-01-29 09:12:07 -08:00
Jarek Kowalski
9cad0edb53 test(ui): added end-to-end HTML UI test (#1686)
* test(general): refactored parsing of server output

* test(ui): added experimental end-to-end test using chromedp
2022-01-29 01:34:45 -08:00
Ali Dowair
7ca8b85a57 feat(providers): expand PutBlob API to allow for idempotent puts (#1654)
* Add a new PutBlob option and blob error type

When `DoNotRecreate` is set to true, the blob put operation should
only succeed if no blob with the given blob ID already exists.
Otherwise, `ErrBlobAlreadyExists` is returned.

* Validate default storage providers' support

By default, storage providers should not support idempotent creates.
This commit adds error handling to exit early if `DoNotRecreate` is
set to true. The commit also verifies this behavior in the provider
validation test.

* Implement support for new option in GCS storage

* Push PutBlob option handling down to Impl

When PutBlob options were introduced, error handling logic for them
was implemented for the Sharded storage interface. However, the
behavior of different providers that implement Sharded can be
different, so it's better to push the options down to be processed in
the provider implementations.

* Introduce new error type for unsupported put opts

To unify error handling code and make it more maintainable, introduce
a new error type `blob.ErrUnsupportedPutBlobOption`, which is to be
returned whenever a storage provider implementation is given put
options it does not support.
2022-01-27 08:49:06 -08:00