39 Commits

Author SHA1 Message Date
Prasad Ghangal
3bf947d746 feat(repository): Metadata compression config support for directory and indirect content (#4080)
* Configure compressor for k and x prefixed content

Adds metadata compression setting to policy
Add support to configure compressor for k and x prefixed content
Set zstd-fastest as the default compressor for metadata in the policy
Adds support to set and show metadata compression to kopia policy commands
Adds metadata compression config to dir writer

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Pass concatenate options with ConcatenateOptions struct

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Move content compression handling to caller

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Move handling manifests to manifest pkg

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Correct const in server_test

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Remove unnecessary whitespace

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Disable metadata compression for < V2 format

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

---------

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
2024-10-23 23:28:23 -07:00
Jarek Kowalski
e36fa78385 feat(snapshots): added support for per-directory splitter overrides (#3887)
This is useful when backing up directories that have giant files aligned
at MiB boundary, such as VM disk backups, etc.
2024-06-07 13:42:15 -07:00
Jarek Kowalski
d0fc1e03c4 fix(server): do not make blocking calls inside server status API (#3666)
also reduce global server lock scope
2024-02-21 12:34:16 -08:00
Jarek Kowalski
524ffaf4b8 refactor(repository): added context to potentially blocking repository methods (#3654)
Primarily for wiring a context.Context to a call to content.Manager.refresh,
which was using a detached context.
2024-02-20 14:48:23 -08:00
Jarek Kowalski
51dcaa985d chore(ci): upgraded linter to 1.48.0 (#2294)
Mechanically fixed all issues, added `lint-fix` make target.
2022-08-09 06:07:54 +00:00
Jarek Kowalski
23299c3451 refactor(repository): ensure MutableParameters are never cached (#2284) 2022-08-06 18:11:32 -07:00
Jarek Kowalski
68b8afd43f feat(snapshots): improved performance when uploading huge files (#2064)
* feat(snapshots): improved performance when uploading huge files

This is controlled by an upload policy which specifies the size
threshold above which individual files are uploaded in parts
and concatenated.

This allows multiple threads to run splitting, hashing, compression
and encryption in parallel, which was previously only possible across
multiple files, but not when a single file was being uploaded.

The default is 2GiB for now, so this feature only kicks in for very
large files. In the future we may lower this.

Benchmark involved uploading a single 42.1 GB file which was a VM disk
snapshot of fresh Ubuntu installation (fresh EXT4 partition with lots
of zero bytes) to a brand-new filesystem repository on local SSD of
M1 Pro Macbook Pro 2021.

* before: 59-63s (~700 MB/s)
* after: 15-17s  (~2.6 GB/s)

* additional test to ensure files are really e2e readable
2022-06-24 07:38:07 +00:00
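
A minimal sketch of the idea described in the commit above: cut one large input into parts, process the parts in parallel, and keep the results in part order so they can be concatenated afterwards. The names `splitIntoParts` and `hashPartsInParallel` are hypothetical and are not taken from kopia; SHA-256 stands in for the real split/hash/compress/encrypt pipeline.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// splitIntoParts cuts data into consecutive parts of at most partSize bytes.
// (Hypothetical helper; a real uploader would work on file offsets instead.)
func splitIntoParts(data []byte, partSize int) [][]byte {
	var parts [][]byte
	for len(data) > partSize {
		parts = append(parts, data[:partSize])
		data = data[partSize:]
	}
	return append(parts, data)
}

// hashPartsInParallel processes each part in its own goroutine and
// returns the results in the original part order.
func hashPartsInParallel(parts [][]byte) [][32]byte {
	results := make([][32]byte, len(parts))

	var wg sync.WaitGroup
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p []byte) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i] = sha256.Sum256(p)
		}(i, p)
	}
	wg.Wait()

	return results
}

func main() {
	data := make([]byte, 10)
	parts := splitIntoParts(data, 4)
	hashes := hashPartsInParallel(parts)
	fmt.Println(len(parts), len(hashes))
}
```

Because every part owns a fixed slot in `results`, the order of completion does not matter and the per-part results can be joined in sequence once all goroutines finish.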
Jarek Kowalski
9bf9cac7fb refactor(repository): ensure we always parse content.ID and object.ID (#1960)
* refactor(repository): ensure we always parse content.ID and object.ID

This changes the types to be incompatible with string to prevent direct
conversion to and from string.

This has the additional benefit of reducing number of memory allocations
and bytes for all IDs.

content.ID went from 2 allocations to 1:
   typical case 32 characters + 16 bytes per-string overhead
   worst-case 65 characters + 16 bytes per-string overhead
   now: 34 bytes

object.ID went from 2 allocations to 1:
   typical case 32 characters + 16 bytes per-string overhead
   worst-case 65 characters + 16 bytes per-string overhead
   now: 36 bytes

* move index.{ID,IDRange} methods to separate files

* replaced index.IDFromHash with content.IDFromHash externally

* minor tweaks and additional tests

* Update repo/content/index/id_test.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* Update repo/content/index/id_test.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* pr feedback

* post-merge fixes

* pr feedback

* pr feedback

* fixed subtle regression in sortedContents()

This was actually not producing invalid results because of how base36
works, just not sorting as efficiently as it could.

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
2022-05-25 14:15:56 +00:00
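
A minimal sketch of what "incompatible with string" can look like in Go: an ID held in a fixed-size struct that can only be produced by a parsing function. The field layout, the `g`..`z` prefix range, and the names `ID` and `ParseID` here are assumptions made for illustration and may not match kopia's actual `content.ID`.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

const maxHashLen = 32

// ID is a hypothetical struct-based identifier. Because it is a struct,
// string(id) and ID("...") do not compile; callers must go through
// ParseID and String.
type ID struct {
	prefix  byte // 0 when there is no prefix
	hashLen uint8
	hash    [maxHashLen]byte
}

// ParseID validates s and converts it to an ID. An optional one-letter
// prefix in the range g..z (assumed here) is followed by a hex hash.
func ParseID(s string) (ID, error) {
	var id ID

	if s != "" && s[0] >= 'g' && s[0] <= 'z' {
		id.prefix = s[0]
		s = s[1:]
	}

	if s == "" {
		return ID{}, errors.New("missing hash")
	}

	if len(s) > 2*maxHashLen {
		return ID{}, errors.New("hash too long")
	}

	n, err := hex.Decode(id.hash[:], []byte(s))
	if err != nil {
		return ID{}, fmt.Errorf("invalid hash: %w", err)
	}

	id.hashLen = uint8(n)

	return id, nil
}

// String converts the ID back to its textual form.
func (id ID) String() string {
	h := hex.EncodeToString(id.hash[:id.hashLen])
	if id.prefix == 0 {
		return h
	}

	return string(id.prefix) + h
}

func main() {
	id, err := ParseID("k0123abcd")
	fmt.Println(id.String(), err)

	_, err = ParseID("xyz")
	fmt.Println(err != nil)
}
```

Keeping the hash in a fixed-size array inside the struct means an ID value carries no separately allocated string, which is the kind of saving the commit message describes.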
Jarek Kowalski
daa62de3e4 chore(ci): added checklocks static analyzer (#1838)
From https://github.com/google/gvisor/tree/master/tools/checklocks

This will perform static verification that we're using
`sync.Mutex`, `sync.RWMutex` and `atomic` correctly to guard access
to certain fields.

This was mostly just a matter of adding annotations to indicate which
fields are guarded by which mutex.

In a handful of places the code had to be refactored to allow static
analyzer to do its job better or to not be confused by some
constructs.

In one place this actually uncovered a bug where a function was not
releasing a lock properly in an error case.

The check is part of `make lint` but can also be invoked by
`make check-locks`.
2022-03-19 22:42:59 -07:00
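
A short sketch of the kind of annotation the commit above refers to. The `// +checklocks:mu` comment is the field annotation used by the gvisor checklocks analyzer; the `registry` type and its `Add` method are hypothetical and only illustrate a guarded field plus a lock that is released on the error path as well.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// registry is a hypothetical type with one mutex-guarded field.
type registry struct {
	mu sync.Mutex
	// +checklocks:mu
	items map[string]int
}

// Add stores v under name, or fails if name is already present.
func (r *registry) Add(name string, v int) error {
	r.mu.Lock()
	// The deferred unlock runs on every return path, including the
	// error case below.
	defer r.mu.Unlock()

	if _, exists := r.items[name]; exists {
		return errors.New("duplicate entry: " + name)
	}

	if r.items == nil {
		r.items = map[string]int{}
	}

	r.items[name] = v

	return nil
}

func main() {
	r := &registry{}
	fmt.Println(r.Add("a", 1))
	fmt.Println(r.Add("a", 2) != nil)
	fmt.Println(r.Add("b", 3))
}
```

If `Add` returned early on the duplicate check without unlocking, the third call would block forever; that is the class of mistake the static check is meant to surface.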
Jarek Kowalski
e67f84e0ba chore(general): updated linter to 1.44.0 (#1681) 2022-01-25 21:21:13 -08:00
Jarek Kowalski
bbbef44d8a More coverage improvements (#1577)
* increased direct coverage for internal/cache

* object: code coverage improvements for object writer
2021-12-11 23:27:42 -08:00
Jarek Kowalski
8b760b66a8 logging: added memoization of Logger instances per context (#1369) 2021-10-09 05:02:18 -07:00
Jarek Kowalski
792cc874dc repo: allow reusing of object writer buffers (#1315)
This reduces memory consumption and speeds up backups.

1. Backing up kopia repository (3.5 GB files:133102 dirs:20074):

before: 25s, 490 MB
after: 21s, 445 MB

2. Large files (14.8 GB, 76 files)

before: 30s, 597 MB
after: 28s, 495 MB

All tests repeated 5 times for clean local filesystem repo.
2021-09-25 14:54:31 -07:00
Jarek Kowalski
35d0f31c0d huge: replaced the use of allocated byte slices with populating gather.WriteBuffer in the repository (#1244)
This helps recycle buffers more efficiently during snapshots.
Also, improved memory tracking, enabled profiling flags and added pprof
by default.
2021-08-20 08:45:10 -07:00
Jarek Kowalski
40510c043d Support for content-level compression (#1076)
* cli: added a flag to create repository with v2 index features

* content: plumb through compression.ID parameter to content.Manager.WriteContent()

* content: expose content.Manager.SupportsContentCompression

This allows object manager to decide whether to create compressed object
or let the content manager do it.

* object: if compression is requested and the repo supports it, pass compression ID to the content manager

* cli: show compression status in 'repository status'

* cli: output compression information in 'content list' and 'content stats'

* content: compression and decompression support

* content: unit tests for compression

* object: compression tests

* testing: added integration tests against v2 index

* testing: run all e2e tests with and without content-level compression

* htmlui: added UI for specifying index format on creation

* cli: additional tests for 'content ls' and 'content stats'

* applied pr suggestions
2021-05-22 05:35:27 -07:00
Jarek Kowalski
30ca3e2e6c Upgraded linter to 1.40.1 (#1072)
* tools: upgraded linter to 1.40.1

* lint: fixed nolintlint violations

* lint: disabled tagliatelle linter

* lint: fixed remaining warnings
2021-05-15 12:12:34 -07:00
Jarek Kowalski
f4347886b8 logging: simplified log levels (#954)
Removed Warning, Notify and Fatal:

* `Warning` => `Error` or `Info`
* `Notify` => `Info`
* `Fatal` was never used.

Note that --log-level=warning is still supported for backwards
compatibility, but it is the same as --log-level=error.

Co-authored-by: Julio López <julio+gh@kasten.io>
2021-04-09 07:27:35 -07:00
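
One way the backwards-compatible handling of `--log-level=warning` could look, as a sketch only. The helper `normalizeLogLevel` is hypothetical, and the reduced set of levels (`debug`, `info`, `error`) is an assumption; the commit message only states which levels were removed.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLogLevel maps a user-supplied level name onto the reduced set.
// The set {debug, info, error} is assumed for this sketch.
func normalizeLogLevel(name string) (string, error) {
	switch lower := strings.ToLower(name); lower {
	case "debug", "info", "error":
		return lower, nil
	case "warning":
		// Still accepted for backwards compatibility, treated as "error".
		return "error", nil
	default:
		return "", fmt.Errorf("unknown log level %q", name)
	}
}

func main() {
	for _, name := range []string{"info", "warning", "fatal"} {
		level, err := normalizeLogLevel(name)
		fmt.Println(name, level, err)
	}
}
```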
Jarek Kowalski
66cebb79cb Fixed empty object IDs in checkpoints (#649)
* object: fixed race condition between Result() and Checkpoint()

This would sometimes result in indirect objects having empty object IDs.

Fixes #648

* upload: ensure checkpoints never contain empty object IDs.

* testing: reduce armhf test weight
2020-09-29 07:14:47 -07:00
Jarek Kowalski
f0b97b960b Fixed checkpointing to not restart the entire upload process (#594)
* object: added Checkpoint() method to object writer

* upload: refactored code structure to allow better checkpointing

* upload: removed Checkpoint() method from UploadProgress

* Update fs/entry.go

Co-authored-by: Julio López <julio+gh@kasten.io>
2020-09-12 22:36:22 -07:00
Jarek Kowalski
e22d22dba2 object: implemented fast concatenation of objects by merging their index entries (#607) 2020-09-11 20:12:01 -07:00
Jarek Kowalski
faf280616a Splitter throughput improvements (#606)
* object: refactored writer to detect split points before writing

This introduces new primitive that will be moved into splitters
themselves in subsequent commits. I'm doing this in small steps to
ensure we don't regress at any time.

* splitter: refactored TestSplitters test

This uses both the slow (byte-by-byte) and the fast (nextSplitPoint)
methods of determining split points.

Note nextSplitPoint is not implemented by splitters yet, but this
verifies that the test is expecting the right thing.

* object: splitter refactoring - replaced ShouldSplit() with NextSplitPoint() everywhere, still not optimized

* splitter: added additional dimension to splitter_test

We split either in large chunks or one byte at a time to catch
the corner cases in the splitter implementation.

* splitter: optimized splitters using NextSplitPoint primitive

This improves splitter performance by about 40% (buzhash) and makes
it virtually free for FIXED splitter.
2020-09-11 19:45:48 -07:00
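
A sketch of the two styles of split-point detection mentioned in the commit above, using a fixed-size splitter. The type `fixedSplitter` and the exact method contracts (in particular `NextSplitPoint` returning -1 when the buffer contains no split point) are assumptions for illustration, not kopia's actual interface.

```go
package main

import "fmt"

// fixedSplitter is a hypothetical splitter that cuts every chunkSize bytes.
type fixedSplitter struct {
	chunkSize int
	cur       int // bytes seen since the last split
}

// ShouldSplit is the slow primitive: it is fed one byte at a time.
func (s *fixedSplitter) ShouldSplit(b byte) bool {
	s.cur++
	if s.cur >= s.chunkSize {
		s.cur = 0
		return true
	}

	return false
}

// NextSplitPoint is the fast primitive: it looks at a whole buffer and
// returns how many bytes belong to the current chunk, or -1 if the
// chunk does not end inside b.
func (s *fixedSplitter) NextSplitPoint(b []byte) int {
	remaining := s.chunkSize - s.cur
	if len(b) < remaining {
		s.cur += len(b)
		return -1
	}

	s.cur = 0

	return remaining
}

// splitSlow returns chunk lengths found by feeding bytes one by one.
func splitSlow(data []byte, chunkSize int) []int {
	s := &fixedSplitter{chunkSize: chunkSize}

	var sizes []int

	n := 0
	for _, b := range data {
		n++
		if s.ShouldSplit(b) {
			sizes = append(sizes, n)
			n = 0
		}
	}

	if n > 0 {
		sizes = append(sizes, n)
	}

	return sizes
}

// splitFast returns chunk lengths found with NextSplitPoint.
func splitFast(data []byte, chunkSize int) []int {
	s := &fixedSplitter{chunkSize: chunkSize}

	var sizes []int

	for len(data) > 0 {
		n := s.NextSplitPoint(data)
		if n < 0 {
			// No split point left: the rest is the final, shorter chunk.
			sizes = append(sizes, len(data))
			break
		}

		sizes = append(sizes, n)
		data = data[n:]
	}

	return sizes
}

func main() {
	data := make([]byte, 10)
	fmt.Println(splitSlow(data, 4), splitFast(data, 4))
}
```

Checking that both paths yield the same chunk lengths mirrors the test strategy described in the commit: the byte-by-byte path serves as the reference for the faster buffer-based one.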
Jarek Kowalski
9a6dea898b Linter upgrade to v1.30.0 (#526)
* fixed godot linter errors
* reformatted source with gofumpt
* disabled some linters
* fixed nolintlint warnings
* fixed gci warnings
* lint: fixed 'nestif' warnings
* lint: fixed 'exhaustive' warnings
* lint: fixed 'gocritic' warnings
* lint: fixed 'noctx' warnings
* lint: fixed 'wsl' warnings
* lint: fixed 'goerr113' warnings
* lint: fixed 'gosec' warnings
* lint: upgraded linter to 1.30.0
* lint: more 'exhaustive' warnings

Co-authored-by: Nick <nick@kasten.io>
2020-08-12 19:28:53 -07:00
Jarek Kowalski
573d10422a object: ensure that all I objects have a content prefix
When prefix is not specified on ObjectWriter, we force
'x' content prefix on intermediate contents, so object IDs
will look like:

Ix{hash}

This ensures the index contents will be stored in `q` blobs,
making `snapshot gc` easier.
2020-04-12 23:55:09 -07:00
Jarek Kowalski
8687f1c008 object: added AsyncWrites to ObjectWriter, which improves performance… (#369)
* object: added AsyncWrites to ObjectWriter, which improves performance of uploading of a single file

Fixes #351

Co-Authored-By: Julio López <julio+gh@kasten.io>
2020-03-22 09:02:33 -07:00
Jarek Kowalski
239d809075 performance: introduced buf.Pool which helps reuse memory buffers (#345)
* performance: added buf.Pool which can be used to manage ephemeral buffers for encryption and compression
* repo: switched object writer to buf.Pool
* content: switched encryption to use buf.Pool
* object: switched compression to use buf.Pool
* testing: added missing content manager Close()
2020-03-18 20:42:16 -07:00
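
The general technique of reusing ephemeral buffers can be sketched with the standard library's `sync.Pool`. This is not kopia's `buf.Pool`, whose API is not shown in this log; `scratchPool`, `copyWithPooledBuffer` and `copyString` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// scratchPool hands out reusable 4 KiB scratch buffers.
var scratchPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 4096)
		return &b
	},
}

// copyWithPooledBuffer copies src to dst using a scratch buffer borrowed
// from the pool instead of allocating a new one per call.
func copyWithPooledBuffer(dst io.Writer, src io.Reader) (int64, error) {
	bufPtr := scratchPool.Get().(*[]byte)
	defer scratchPool.Put(bufPtr) // return the buffer for reuse

	buf := *bufPtr

	var total int64

	for {
		n, err := src.Read(buf)
		if n > 0 {
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return total, werr
			}

			total += int64(n)
		}

		if err == io.EOF {
			return total, nil
		}

		if err != nil {
			return total, err
		}
	}
}

// copyString is a small convenience wrapper used for demonstration.
func copyString(s string) (string, error) {
	var out bytes.Buffer
	if _, err := copyWithPooledBuffer(&out, strings.NewReader(s)); err != nil {
		return "", err
	}

	return out.String(), nil
}

func main() {
	s, err := copyString("hello")
	fmt.Println(s, err)
}
```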
Jarek Kowalski
8d452a8285 performance: improvements to object manager (#336)
- added pooled splitters and ability to reset them without having to recreate
- added support for caller-provided compressor output to be able to pool it
- added pooling of compressor instances, since those are costly
2020-03-13 08:56:18 -07:00
Jarek Kowalski
d181403284 crypto: refactored encryption, hashing and splitter into separate packages (#274)
Added some tests, deleted XSALSA20 which never worked E2E
2020-02-27 12:36:49 -08:00
Jarek Kowalski
0b8c4d0ef9 object: fixed compression bug where we were not clearing the buffer
this effectively defeated the purpose of compression, caused high
memory usage and other kinds of bad behavior.

refactored the code to prevent this issue by resetting the buffer
at the caller not callee.

fixed previous e2e test to catch the issue mentioned in #166,
verified it fails against master and passes with this change.
2020-01-09 16:36:57 -08:00
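
A sketch of the "reset at the caller, not the callee" pattern the commit above describes, using gzip from the standard library as a stand-in compressor. `compressInto` and `compressedSizes` are hypothetical names; the original code is not shown in this log.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// compressInto appends the compressed form of data to out.
// It deliberately does not reset out; that is the caller's job.
func compressInto(out *bytes.Buffer, data []byte) error {
	zw := gzip.NewWriter(out)
	if _, err := zw.Write(data); err != nil {
		return err
	}

	return zw.Close()
}

// compressedSizes compresses each chunk using one reused buffer and
// reports the compressed size of each chunk.
func compressedSizes(chunks [][]byte) ([]int, error) {
	var out bytes.Buffer

	sizes := make([]int, 0, len(chunks))

	for _, c := range chunks {
		// Without this reset, output from earlier chunks would stay in
		// the buffer and every later result would keep growing.
		out.Reset()

		if err := compressInto(&out, c); err != nil {
			return nil, err
		}

		sizes = append(sizes, out.Len())
	}

	return sizes, nil
}

func main() {
	sizes, err := compressedSizes([][]byte{
		[]byte("aaaaaaaaaaaaaaaa"),
		[]byte("aaaaaaaaaaaaaaaa"),
	})
	fmt.Println(sizes, err)
}
```

With the reset in place, identical chunks produce identical compressed sizes; if the reset were omitted, the second size would include the first chunk's output as well.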
Jarek Kowalski
ac70a38101 lint: upgraded to 1.22.2 and make lint issues a build failure
fixed or silenced linter warnings, mostly due to magic numeric constants
2020-01-03 16:39:30 -08:00
Jarek Kowalski
2ba4e83cef moved all compression to separate package and sanitized identifiers 2019-12-10 23:25:28 -08:00
Jarek Kowalski
aec3cdcb2f object: added support for compressed objects 2019-12-10 23:25:28 -08:00
Jarek Kowalski
6217df1a87 lint: switched to 1.21 and fixed a ton of whitespace issues discovered
by new wsl linter
2019-11-26 06:49:49 -08:00
Jarek Kowalski
54edb97b3a refactoring: renamed repo/block to repo/content
Also introduced strongly typed content.ID and manifest.ID (instead of string)

This aligns identifiers across all layers of repository:

blob.ID
content.ID
object.ID
manifest.ID
2019-06-01 22:24:19 -07:00
Jarek Kowalski
63303904e1 switched remaining fmt.Errorf to errors.Wrap() 2019-06-01 10:57:05 -07:00
Jarek Kowalski
03339c18af [breaking change] deprecated DYNAMIC splitter due to license issue
The splitter in question was depending on
github.com/silvasur/buzhash which is not licensed according to FOSSA bot

Switched to new faster implementation of buzhash, which is
unfortunately incompatible and will split the objects in different
places.

This change is semi-breaking - old repositories can be read, but
when uploading large objects they will be re-uploaded where previously
they would be de-duped.

Also added 'benchmark splitters' subcommand and moved 'block cryptobenchmark'
subcommand to 'benchmark crypto'.
2019-05-30 22:20:45 -07:00
Jarek Kowalski
0c41d41276 Fixed up paths after merge 2019-05-27 15:48:39 -07:00
Jarek Kowalski
327d8317d8 refactored repo/ into separate github.com/kopia/repo/ git repository 2018-10-26 20:40:57 -07:00
Jarek Kowalski
1b014c875a simplified repository API password handling.
completely rewrote password storage:

- by default passwords are kept in OS-specific keyring (Keychain on macOS,
Windows Credentials Manager on Windows), which can be optionally disabled
to store password in a local file.

- on Linux keychain is disabled by default (does not work reliably
in terminal sessions), but can be enabled using command-line flag.
2018-09-07 21:34:31 -07:00
Jarek Kowalski
91066f2469 reorganized low-level repository packages by moving them all under kopia/kopia/repo/ 2018-08-30 22:01:05 -07:00