* Configure compressor for k and x prefixed content
Adds metadata compression setting to policy
Add support to configure compressor for k and x prefixed content
Set zstd-fastest as the default compressor for metadata in the policy
Adds support to set and show metadata compression to kopia policy commands
Adds metadata compression config to dir writer
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Pass concatenate options with ConcatenateOptions struct
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Move content compression handling to caller
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Move handling manifests to manifest pkg
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Correct const in server_test
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Remove unnecessary whitespace
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Disable metadata compression for < V2 format
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
---------
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* chore(ci): upgraded linter to 1.53.3
This flagged a bunch of unused parameters, so the PR is larger than
usual, but 99% mechanical.
* separate lint CI task
* run Lint in separate CI
This may be a breaking change for users who rely on particular kopia metrics (unlikely):
- introduced blob-level metrics:
* `kopia_blob_download_full_blob_bytes_total`
* `kopia_blob_download_partial_blob_bytes_total`
* `kopia_blob_upload_bytes_total`
* `kopia_blob_storage_latency_ms` - per-method latency distribution
* `kopia_blob_errors_total` - per-method error counter
- updated cache metrics to indicate particular cache
* `kopia_cache_hit_bytes_total{cache="CACHE_TYPE"}`
* `kopia_cache_hit_total{cache="CACHE_TYPE"}`
* `kopia_cache_malformed_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_errors_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_bytes_total{cache="CACHE_TYPE"}`
* `kopia_cache_store_errors_total{cache="CACHE_TYPE"}`
where `CACHE_TYPE` is one of `contents`, `metadata` or `index-blobs`
- reorganized and unified content-level metrics:
* `kopia_content_write_bytes_total`
* `kopia_content_write_duration_nanos_total`
* `kopia_content_compression_attempted_bytes_total`
* `kopia_content_compression_attempted_duration_nanos_total`
* `kopia_content_compression_savings_bytes_total`
* `kopia_content_compressible_bytes_total`
* `kopia_content_non_compressible_bytes_total`
* `kopia_content_after_compression_bytes_total`
* `kopia_content_decompressed_bytes_total`
* `kopia_content_decompressed_duration_nanos_total`
* `kopia_content_encrypted_bytes_total`
* `kopia_content_encrypted_duration_nanos_total`
* `kopia_content_hashed_bytes_total`
* `kopia_content_hashed_duration_nanos_total`
* `kopia_content_deduplicated_bytes_total`
* `kopia_content_read_bytes_total`
* `kopia_content_read_duration_nanos_total`
* `kopia_content_decrypted_bytes_total`
* `kopia_content_decrypted_duration_nanos_total`
* `kopia_content_uploaded_bytes_total`
Also introduced `internal/metrics` framework which constructs Prometheus metrics in a uniform way and will allow us to include some of these metrics in telemetry report in future PRs.
* refactor(repository): moved format blob management to separate package
This is completely mechanical, no behavior changes, only:
- moved types and functions to a new package
- adjusted visibility where needed
- added missing godoc
- renamed some identifiers to align with current usage
- mechanically converted some top-level functions into member functions
- fixed some mis-named variables
* refactor(repository): moved content.FormatingOptions to format.ContentFormat
* feat(snapshots): improved performance when uploading huge files
This is controlled by an upload policy which specifies the size
threshold above which indvidual files are uploaded in parts
and concatenated.
This allows multiple threads to run splitting, hashing, compression
and encryption in parallel, which was previously only possible across
multiple files, but not when a single file was being uploaded.
The default is 2GiB for now, so this feature only kicks in for very
larger files. In the future we may lower this.
Benchmark involved uploading a single 42.1 GB file which was a VM disk
snapshot of fresh Ubuntu installation (fresh EXT4 partition with lots
of zero bytes) to a brand-new filesystem repository on local SSD of
M1 Pro Macbook Pro 2021.
* before: 59-63s (~700 MB/s)
* after: 15-17s (~2.6 GB/s)
* additional test to ensure files are really e2e readable
* refactor(repository): ensure we always parse content.ID and object.ID
This changes the types to be incompatible with string to prevent direct
conversion to and from string.
This has the additional benefit of reducing number of memory allocations
and bytes for all IDs.
content.ID went from 2 allocations to 1:
typical case 32 characters + 16 bytes per-string overhead
worst-case 65 characters + 16 bytes per-string overhead
now: 34 bytes
object.ID went from 2 allocations to 1:
typical case 32 characters + 16 bytes per-string overhead
worst-case 65 characters + 16 bytes per-string overhead
now: 36 bytes
* move index.{ID,IDRange} methods to separate files
* replaced index.IDFromHash with content.IDFromHash externally
* minor tweaks and additional tests
* Update repo/content/index/id_test.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* Update repo/content/index/id_test.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* pr feedback
* post-merge fixes
* pr feedback
* pr feedback
* fixed subtle regression in sortedContents()
This was actually not producing invalid results because of how base36
works, just not sorting as efficiently as it could.
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* cli: added a flag to create repository with v2 index features
* content: plumb through compression.ID parameter to content.Manager.WriteContent()
* content: expose content.Manager.SupportsContentCompression
This allows object manager to decide whether to create compressed object
or let the content manager do it.
* object: if compression is requested and the repo supports it, pass compression ID to the content manager
* cli: show compression status in 'repository status'
* cli: output compression information in 'content list' and 'content stats'
* content: compression and decompression support
* content: unit tests for compression
* object: compression tests
* testing: added integration tests against v2 index
* testing: run all e2e tests with and without content-level compression
* htmlui: added UI for specifying index format on creation
* cli: additional tests for 'content ls' and 'content stats'
* applied pr suggestions
* object: refactored Open() and VerifyObject() to be stateless
(no code movement yet to facilitate review)
* mechanical: moved function more appropriate files
* object: remove object manager tracing which was unused
* linter: upgraded to 1.33, disabled some linters
* lint: fixed 'errorlint' errors
This ensures that all error comparisons use errors.Is() or errors.As().
We will be wrapping more errors going forward so it's important that
error checks are not strict everywhere.
Verified that there are no exceptions for errorlint linter which
guarantees that.
* lint: fixed or suppressed wrapcheck errors
* lint: nolintlint and misc cleanups
Co-authored-by: Julio López <julio+gh@kasten.io>
* object: added AsyncWrites to ObjectWriter, which improves performance of uploading of a single file
Fixes#351
Co-Authored-By: Julio López <julio+gh@kasten.io>
* performance: added buf.Pool which can be used to manage ephemeral buffers for encryption and compression
* repo: switched object writer to buf.Pool
* content: switched encryption to use buf.Pool
* object: switched compression to use buf.Pool
* testing: added missing content manager Close()
- added pooled splitters and ability to reset them without having to recreate
- added support for caller-provided compressor output to be able to pool it
- added pooling of compressor instances, since those are costly
Also introduced strongly typed content.ID and manifest.ID (instead of string)
This aligns identifiers across all layers of repository:
blob.ID
content.ID
object.ID
manifest.ID
This updates the terminology everywhere - blocks become blobs and
`storage.Storage` becomes `blob.Storage`.
Also introduced blob.ID which is a specialized string type, that's
different from CABS block ID.
Also renamed CLI subcommands from `kopia storage` to `kopia blob`.
While at it introduced `block.ErrBlockNotFound` and
`object.ErrObjectNotFound` that do not leak from lower layers.
The splitter in question was depending on
github.com/silvasur/buzhash which is not licensed according to FOSSA bot
Switched to new faster implementation of buzhash, which is
unfortunately incompatible and will split the objects in different
places.
This change is be semi-breaking - old repositories can be read, but
when uploading large objects they will be re-uploaded where previously
they would be de-duped.
Also added 'benchmark splitters' subcommand and moved 'block cryptobenchmark'
subcommand to 'benchmark crypto'.