Commit Graph

542 Commits

Author SHA1 Message Date
Jarek Kowalski
0554e2f7ce refactor(general): introduced generics to reduce boilerplate code (#2527)
This removes tons of boilerplate code around:

- retry loop
- connection management
- storage registration

* used generics in runInParallel
* introduced generics in freepool
* introduced strong typing for workshare.Pool and workshare.AsyncGroup
* fixed linter error on openbsd
2022-10-29 01:56:51 +00:00
Jarek Kowalski
f69424961f chore(ci): upgrade golang to 1.19.2 and linter to 1.50.1 (#2526)
Lack of generics support is blocking various dependency upgrades,
so this unblocks that.

Temporarily disabled `checklocks` linter until it is fixed upstream.
2022-10-28 11:02:47 -07:00
atom
c5efed01f4 feat(cli): Support displaying storage values in base-2 [#2492] (#2502)
* Update display on repository summary

* Apply throughout app

* Situate units_test

* Update Command Line documentation

* Envar cleanup

* Rename to BytesString

* Restore envar string available for test

* Remove extraneous empty check and restore UIPreferences field for frontend

* PR: config bool cleanup and missed `BaseEnv`s

* Fix lint and test
2022-10-24 19:00:36 -07:00
Jarek Kowalski
c06d50d218 fix(ci): increased timeout in flaky TestSourceRefreshesAfterPolicy on Windows (#2476) 2022-10-02 03:36:23 +00:00
Jarek Kowalski
dcf6d1c2e1 feat(general): always try O_TMPFILE on Linux when creating temporary files (#2407) 2022-09-14 09:48:02 -07:00
Jarek Kowalski
33fa5cbd50 feat(general): added tempfile package (#2402)
* feat(general): added tempfile package

* use tempfile.Create() in bigmap
2022-09-14 05:37:47 +00:00
Jarek Kowalski
b9e9ef3b38 feat(repository): improve performance when snapshotting to a repository server (#2394)
Benchmarked from macOS client to a Linux server over Wifi connection:
(2-5ms latency)

linux 5.14.8 (1.1 GB) to a clean repository:

Before: 240s After: 27s (90% faster)

Fixes #2372
2022-09-11 07:43:34 -07:00
Jarek Kowalski
b38ace2513 feat(server): added tracing spans for gRPC server (#2393) 2022-09-11 07:13:59 +00:00
Jarek Kowalski
645e680a8f feat(general): reduce memory usage in maintenance, snapshot fix and verify (#2365) 2022-09-10 09:36:17 -07:00
Jarek Kowalski
28ce29eab4 feat(repository): added Set and Map backed by custom on-disk hashtable (#2364)
* feat(repository): added bigmap Set and Map backed by custom on-disk hashtable

* additional test coverage for corner cases

* profile flags

* exclude bigmapbench from code coverage

* refactored based on conversation with Julio

The concern was that values stored on disk are not encrypted.

* bigmap.Map - implements encryption of values
* bigmap.Set - stores keys only, so no encryption necessary
* bigmap.internalMap - used as backing structure for Map and Set

* implemented benchmark with values
2022-09-10 15:47:53 +00:00
Jarek Kowalski
7bda16ab33 feat(repository): introduced fs.UTCTimestamp (#2343)
Fixes #2342
2022-09-02 10:35:59 -07:00
ashmrtn
5c88bcf1a6 feat(snapshots): Callback for when uploader finishes processing a file (#2331)
* Make callback for upload file completion

Callback does not indicate that a file will be reachable immediately in
the resulting snapshot, but does indicate that the uploader is done
processing the file in some way (either via uploading data or finding a
previous version in the repo) and whether there was an error processing
the file.

* Tests for new FinishedFile callback

Ensure hadErr is properly populated and FinishedFile is called even if
the file was considered cached.

* Refine comment on interface function slightly

* Give callback error instead of bool about error

* Add locks around concurrent accesses in test
2022-08-22 20:42:27 +01:00
Ricardo Pescuma Domenecci
04f54000a6 feat(server): Added ECC to server api (#2314)
* feat(server): Added ECC to server api

* feat(server): Add format version to repo status api

* Change requested in PR

* Updated htmlui
2022-08-19 08:24:04 -07:00
Jarek Kowalski
4d55f91070 chore(ci): disable long file names in GetInterestingTempDirectoryName() when ENABLE_LONG_FILENAMES=false (#2295) 2022-08-09 06:55:30 +00:00
Jarek Kowalski
51dcaa985d chore(ci): upgraded linter to 1.48.0 (#2294)
Mechanically fixed all issues, added `lint-fix` make target.
2022-08-09 06:07:54 +00:00
Jarek Kowalski
419c7acb11 fix(repository): fixed V1 key derivation bug from previous refactoring (#2286)
See 23299c3451
2022-08-08 21:45:08 -07:00
Jarek Kowalski
23299c3451 refactor(repository): ensure MutableParameters are never cached (#2284) 2022-08-06 18:11:32 -07:00
Jarek Kowalski
6160ee5668 refactor(repository): moved format blob management to separate package (#2245)
* refactor(repository): moved format blob management to separate package

This is completely mechanical, no behavior changes, only:

- moved types and functions to a new package
- adjusted visibility where needed
- added missing godoc
- renamed some identifiers to align with current usage
- mechanically converted some top-level functions into member functions
- fixed some mis-named variables

* refactor(repository): moved content.FormatingOptions to format.ContentFormat
2022-07-30 14:13:52 -07:00
Jarek Kowalski
b9be9632a2 feat(repository): added required features to the repository (#2220)
* feat(repository): added `required features` to the repository

This is intended for future compatibility to be able to reliably
stop old kopia client from being able to open a repository when
the old code does not understand new `required feature`.

Required features are checked on startup and periodically using the
same method as upgrade lock, where they will return errors during blob
operations.

* pr feedback
2022-07-29 09:31:17 -07:00
Jarek Kowalski
56f3046d8a refactor(repository): introduce interface for reading FormattingOptions (#2235)
Instead of passing static content.FormattingOptions (and caching it)
we now introduce an interface to provide its values.

This will allow the values to dynamically change at runtime in the
future to support cases like live migration.
2022-07-28 17:27:04 -07:00
Shikhar Mall
26e6f59b2b feat(cli): New Upgrade CLI / Switch to Format Version 3 (upgrade coordination) (#1818)
* kopia format upgrade lock

* Update cli/command_repository_set_parameters_test.go

Co-authored-by: Ali Dowair <adowair@umich.edu>

* Update cli/command_repository_upgrade.go

Co-authored-by: Ali Dowair <adowair@umich.edu>

* Update cli/command_repository_upgrade.go

Co-authored-by: Ali Dowair <adowair@umich.edu>

* pr feedback

* pr feedback

* add a min drain time check

* env var for io-drain-timeout

* fix: add more doctext around upgrade phases

* build: wrap with EnvName

* add experimental warning

* protect upgrade cli behind env varible

* fix conflicts after relocating the upgrade lock

* generalize the command args

* drop certain features as per feedback

* sub-divide the upgrade command into begin and rollback

* Update cli/command_repository_upgrade.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* Update cli/command_repository_upgrade.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* missing return

* rename force flag to allow-unsafe-upgrade

Co-authored-by: Shikhar Mall <shikhar@kasten.io>
Co-authored-by: Ali Dowair <adowair@umich.edu>
Co-authored-by: Shikhar Mall <small@kopia.io>
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
2022-07-27 16:23:45 -07:00
Jarek Kowalski
1a82061e49 chore(ci): upgraded linter to 1.47.0, added 15s ReadHeaderTimeout in web server (#2206) 2022-07-18 22:47:36 -07:00
Jarek Kowalski
10a693ade3 fix(cli): removed unnecessary console log timestamps (#2175)
Improved test logic which missed this.
2022-07-10 23:06:46 -07:00
Jarek Kowalski
27e5d119df fix(repository): fixed panic when content cache has been disabled (rare) (#2176) 2022-07-11 05:42:49 +00:00
Jarek Kowalski
ea257b1597 feat(cli): removed unnecessary logs from cli-logs (#2174)
- removed memory tracking since it's redundant with profiling
  and prometheus support.
- various cleanups to make sure default log is clean
2022-07-10 16:25:25 -07:00
Jarek Kowalski
8515d050e5 test(infra): improved support for in-process testing (#2169)
* feat(infra): improved support for in-process testing

* support for killing of a running server using simulated Ctrl-C
* support for overriding os.Stdin
* migrated many tests from the exe runner to in-process runner

* added required indirection when defining Envar() so we can later override it in tests

* refactored CLI runners by moving environment overrides to CLITestEnv
2022-07-09 18:22:50 -07:00
Jarek Kowalski
a621cd3fb6 fix(ui): fixed filesysystem restores triggered from UI (#2163)
Added comprehensive test for restore API which was previously completely
uncovered.

Fixes #2162
2022-07-08 23:50:17 -07:00
Jarek Kowalski
aec08c84c2 fix(snapshots): panic: unaligned 64-bit atomic operation (#2151)
Reverted to regular sync.Pool

Fixes #2150
2022-07-06 18:34:25 -07:00
Jarek Kowalski
a8717bbcd2 fix(ui): fixed scheduling of snapshots with Ignoring Identical Snapshots (#2141)
Related #2036
Fixes #2138
2022-07-04 22:58:20 -07:00
Jarek Kowalski
0985b80488 feat(ui): support for deprecation of certain algorithms (#2122)
Some compression algorithms are not recommended because they
allocate disproportionate amounts of memory. They are still
possible to use, just marked as NOT RECOMMENDED in the UI.
2022-07-03 19:06:14 +00:00
Jarek Kowalski
34da3a2415 feat(general): implemented custom log encoder for ZAP (#2116)
Turns out standard ConsoleEncoder is not optimized at all and we're
emitting up to 8 log entries per file. This change avoids massive
amount of allocations and brings in some latency reduction.

Backing up 100k files in a flat directory:

duration: current:7.8 baseline:7.9 change:-1.0 %
repo_size: current:1019954521.4 baseline:1019976318.9 change:-0.0 %
num_files: current:58.0 baseline:58.0 change:0%
avg_heap_objects: current:6682141.0 baseline:7508631.3 change:-11.0 %
avg_heap_bytes: current:845404672.0 baseline:867325413.9 change:-2.5 %
avg_ram: current:156.5 baseline:159.1 change:-1.7 %
max_ram: current:278.3 baseline:287.2 change:-3.1 %
avg_cpu: current:153.8 baseline:156.4 change:-1.7 %
max_cpu: current:298.4 baseline:297.1 change:+0.4 %

Backing up Linux 5.18.4:

duration: current:3.6 baseline:4.2 change:-14.2 %
repo_size: current:1081624213.7 baseline:1081635886.8 change:-0.0 %
num_files: current:60.0 baseline:60.0 change:0%
avg_heap_objects: current:5180192.3 baseline:5831270.7 change:-11.2 %
avg_heap_bytes: current:783468754.2 baseline:804350042.1 change:-2.6 %
avg_ram: current:239.0 baseline:240.6 change:-0.6 %
max_ram: current:384.8 baseline:368.0 change:+4.6 %
avg_cpu: current:259.8 baseline:230.8 change:+12.6 %
max_cpu: current:321.6 baseline:303.9 change:+5.9 %
2022-07-02 13:55:01 +00:00
Jarek Kowalski
0b6d76712f feat(snapshots): added free pool of localfs entries (#2115)
This avoids allocations of entries as large directories are traversed.

Backing up 100k small files in a single directory:

duration: current:8.1 baseline:8.6 change:-5.7 %
avg_heap_objects: current:7330688.9 baseline:7463750.6 change:-1.8 %
avg_heap_bytes: current:829248496.6 baseline:864880473.2 change:-4.1 %
avg_ram: current:159.3 baseline:159.1 change:+0.1 %
max_ram: current:283.2 baseline:285.4 change:-0.8 %
avg_cpu: current:153.1 baseline:143.6 change:+6.7 %
max_cpu: current:295.6 baseline:292.4 change:+1.1 %

Backing up Linux 5.14.8 using --parallel=4:

duration: current:6.0 baseline:6.0 change:-0.0 %
avg_heap_objects: current:5654431.1 baseline:5845078.3 change:-3.3 %
avg_heap_bytes: current:764526988.2 baseline:806833320.4 change:-5.2 %
avg_ram: current:218.2 baseline:220.1 change:-0.9 %
max_ram: current:323.5 baseline:319.5 change:+1.3 %
avg_cpu: current:144.3 baseline:145.1 change:-0.6 %
max_cpu: current:224.6 baseline:220.6 change:+1.8 %
2022-07-01 21:12:55 -07:00
Jarek Kowalski
3462c269c1 feat(snapshots): added fs.Entry.Close which can be used to release any resources (#2098)
This is not used yet, but will be used to avoid allocation in
performance-critical portions of the upload.
2022-06-29 07:09:33 +00:00
Jarek Kowalski
70e24106ee refactor(general): unified logging.Logger with *zap.SugaredLogger (#2090)
- removed a bunch of hacks and should improve the logging
performance by avoiding interfaces and data translation. This will
allow using of de-sugared loggers in performance-critical
logging situations.

- this will also allow using features of ZAP more directly without
having to reimplement them.

- moved logging.Printf() to testlogging

- refactored `uitask` to store logs in a structural format and
present them as JSON only in the UI

- renamed printf_logger.go to printf.go so that fewer columns are used
in the logs
2022-06-26 05:11:52 +00:00
Philipp Matthaeus
fdbe2f836b feat(ui): Save page size (#2080)
* Added PageSize to UIPreferences type

* Update serverapi.go

* fixed gofmt check

* fixed gofmt check for real

Co-authored-by: Jarek Kowalski <jaak@jkowalski.net>
2022-06-24 06:34:56 -07:00
ashmrtn
61e651d30c feat(snapshots): Allow users to dynamically create entries in a directory during an upload (#1996)
* Allow dynamic directory entries with virtualfs

* Tests for new virtualfs implementation

* Add escape hatch for estimator during upload

Some virtualfs.StreamingDirectory-s may not be able to (efficiently)
support iterating through entries multiple times. Make a way for the
estimator to ask if they support multiple iterations and skip the
directory if they do not.

* Exapand Directory interface

Expand the Directory interface instead of making a new interface as it's
error-prone to ensure all wrapper types properly handle types that use
the new interface.

* Post-rebase fixes

* Make StreamingDirectory single iteration only

Simplify code and test slightly by not allowing users to declare a
StreamingDirectory that can be iterated through multiple times.

* Add better test for estimator ignoring stream dir

Previous test in uploader had a race condition, meaning it may not catch
all cases.

* Ignore atomic access in checklocks

Comparisons known to be done after all additions to the variables in
question.

* Implement reviewer feedback

* Remove unused function parameter
2022-06-14 19:08:49 -07:00
Ali Dowair
eddde91f2d chore(snapshots): unify sparse and normal FS output paths (#1981)
* Unify sparse and normal IO output

This commit refactors the code paths that excercise normal and sparse
writing of restored content. The goal is to expose sparsefile.Copy()
and iocopy.Copy() to be interchangeable, thereby allowing us to wrap
or transform their behavior more easily in the future.

* Introduce getStreamCopier()

* Pull ioCopy() into getStreamCopier()

* Fix small nit in E2E test

We should be getting the block size of the destination file, not
the source file.

* Call stat.GetBlockSize() once per FilesystemOutput

A tiny refactor to pull this call out of the generated stream copier,
as the block size should not change from one file to the next within
a restore entry.

NOTE: as a side effect, if block size could not be found (an error
is returned), we will return the default stream copier instead of
letting the sparse copier fail. A warning will be logged, but this
error will not cause the restore to fail; it will proceed silently.
2022-06-14 18:09:45 +00:00
basldfalksjdf
a78dacc0d1 feat(snapshots): add option to ignore empty snapshots being saved (#2036)
* Add policy to ignore empty snapshots being saved

* Update command_snapshot_create.go

* Update source_manager.go

* Update command_snapshot_create.go

* Update source_manager.go

* fixes

* Update source_manager.go

* Update command_snapshot_create.go

* fix

* fix

Co-authored-by: Mehak  Satija <mehaksatija@Mehaks-MacBook-Pro.local>
Co-authored-by: Mehak  Satija <mehaksatija@Mehaks-MBP.hitronhub.home>
2022-06-13 03:41:08 +00:00
Jarek Kowalski
544dd41e7f fix(snapshots): fixed random deadlock when Uploader results in a failure (#2020)
* fix(snapshots): fixed random deadlock when Uploader results in a failure

The deadlock was caused by not properly waiting for all asynchronous
work to complete before closing the worker pool.

Introduced `workshare.AsyncGroup.Close()` and some assertions.

* fixed select race

* linter fix

* pr feedback
2022-06-07 18:37:08 -07:00
ashmrtn
ef8828a072 refactor(snapshots): Remove remaining internal uses of Readdir (#1986)
* Remove remaining internal uses of Readdir

* Remove old helpers and interface functions.

* Update tests for updated fs.Directory interface

* Fix index out of range error in snapshot walker

Record one error if an error occurred and it's not limiting errors

* Use helper functions more; exit loops early

Follow up on reviewer comments and reduce code duplication, use more
targetted functions like Directory.Child, and exit directory iteration
early if possible.

* Remove fs.Entries type and unused functions

Leave some functions dealing with sorting and finding entries in fs
package. This retains tests for those functions while still allowing
mockfs to access them.

* Simplify function return
2022-06-04 06:36:25 -07:00
Jarek Kowalski
f5c64c8480 feat(snapshots): streaming upload support (#1963)
* feat(snapshots): switched repofs directory iteration to streaming

* feat(snapshots): switched ignorefs directory iteration to streaming

* feat(snapshots): switched mockfs iteration to streaming

* feat(snapshots): switched uploader to streaming mode

* fixed data race

* inlined foreachEntryUnlessCanceled
2022-05-28 13:20:40 -07:00
Jarek Kowalski
9bf9cac7fb refactor(repository): ensure we always parse content.ID and object.ID (#1960)
* refactor(repository): ensure we always parse content.ID and object.ID

This changes the types to be incompatible with string to prevent direct
conversion to and from string.

This has the additional benefit of reducing number of memory allocations
and bytes for all IDs.

content.ID went from 2 allocations to 1:
   typical case 32 characters + 16 bytes per-string overhead
   worst-case 65 characters + 16 bytes per-string overhead
   now: 34 bytes

object.ID went from 2 allocations to 1:
   typical case 32 characters + 16 bytes per-string overhead
   worst-case 65 characters + 16 bytes per-string overhead
   now: 36 bytes

* move index.{ID,IDRange} methods to separate files

* replaced index.IDFromHash with content.IDFromHash externally

* minor tweaks and additional tests

* Update repo/content/index/id_test.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* Update repo/content/index/id_test.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* pr feedback

* post-merge fixes

* pr feedback

* pr feedback

* fixed subtle regression in sortedContents()

This was actually not producing invalid results because of how base36
works, just not sorting as efficiently as it could.

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
2022-05-25 14:15:56 +00:00
Jarek Kowalski
f49bcdd883 feat(cli): implementation for 'kopia snapshot fix' (#1930)
* feat(cli): implementation for 'kopia snapshot fix'

This allows modifications and fixes to the snapshots after they have
been taken.

Supported are:

* `kopia snapshot fix remove-invalid-files [--verify-files-percent=X]`

Removes all directory entries where the underlying files cannot be
read based on index analysis (this does not read the files, only index
structures so is reasonably quick).

`--verify-files-percent=100` can be used to trigger full read for
all files.

* `kopia snapshot fix remove-files --object-id=<object-id>`

Removes the object with a given ID from the entire snapshot tree.
Useful when you accidentally snapshot a sensitive file.

* `kopia snapshot fix remove-files --filename=<wildcard>`

Removes the files with a given name from the entire snapshot tree.
Useful when you accidentally snapshot a sensitive file.

By default all snapshots are analyzed and rewritten. To limit the scope
use:

--source=user@host:/path
--manifest-id=manifestID

By default the rewrite operation writes new directory entries but
does not replace the manifests. To do that pass `--commit`.

Related #1906
Fixes #799

reorganized CLI per PR suggestion

* additional logging for diff command

* added Clone() method to snapshot manifst and directory entry

* added a comprehensive test, moved DirRewriter  to separate file

* pr feedback

* more pr feedback

* improved logging output

* disable test in -race configuration since it's way to slow

* pr feedback
2022-05-25 01:17:55 +00:00
Julio Lopez
3d1de6f27a chore(general): minor cleanups (#1959)
- expand command flag description for clarification
- include blob id in blob get error in the cache
- nit: remove unused BOTO_PATH
- nit: fix comment
- cleanup: remove unnecessary function declaration in interface
- leverage 'testify' to simplify test
2022-05-23 15:16:25 -07:00
ashmrtn
9f85864da5 feat(snapshots): Add callback-based iteration function to Directory interface (#1957)
* New interface method to iterate over dir entries

* Fix build and test failures from interface

* Fix entry iteration for StaticDirectory

* Make utility function for directory iteration

* Fix lint errors

* No wrapcheck on fs.ReaddirToIterate

* Be consistent for IterateEntry implementations
2022-05-20 18:04:35 -07:00
Jarek Kowalski
99eeb3c063 feat(cli): added CLI for controlling throttler (#1956)
Supported are:

```
$ kopia throttle set \
    --download-bytes-per-second=N | unlimited
    --upload-bytes-per-second=N | unlimited
    --read-requests-per-second=N | unlimited
    --write-requests-per-second=N | unlimited
    --list-requests-per-second=N | unlimited
    --concurrent-reads=N | unlimited
    --concurrent-writes=N | unlimited
```

To change parameters of a running server use:

```
$ kopia server throttle set \
    --address=<server-url> \
    --server-control-password=<password> \
    --download-bytes-per-second=N | unlimited
    --upload-bytes-per-second=N | unlimited
    --read-requests-per-second=N | unlimited
    --write-requests-per-second=N | unlimited
    --list-requests-per-second=N | unlimited
    --concurrent-reads=N | unlimited
    --concurrent-writes=N | unlimited
```
2022-05-18 01:27:06 -07:00
Jarek Kowalski
a61927e089 chore(infra): added more leak checks to tests (#1953) 2022-05-16 06:37:57 +00:00
Jarek Kowalski
c971d7c4f8 feat(providers): ensure Ctrl-C is not passed to rclone (#1951)
Based on https://stackoverflow.com/questions/33165530/prevent-ctrlc-from-interrupting-exec-command-in-golang

Fixes #1934
2022-05-16 05:28:57 +00:00
Jarek Kowalski
1ae6c6df03 fix(repository): fixed slow goroutine leak from indexBlobCache, added tests (#1950) 2022-05-16 01:21:30 +00:00
Jarek Kowalski
98f3473b67 refactor(snapshots): extracted snapshotfs.Verifier component (#1921)
* refactor(snapshots): extracted snapshotfs.Verifier component

* refactor(repository): added tests for snapshotfs.Verifier, misc cleanups

* fixed data race

* fixed atomic alignment

* nit
2022-05-02 04:03:28 +00:00