* Configure compressor for k and x prefixed content
Adds metadata compression setting to policy
Add support to configure compressor for k and x prefixed content
Set zstd-fastest as the default compressor for metadata in the policy
Adds support to set and show metadata compression to kopia policy commands
Adds metadata compression config to dir writer
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Pass concatenate options with ConcatenateOptions struct
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Move content compression handling to caller
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Move handling manifests to manifest pkg
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Correct const in server_test
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Remove unnecessary whitespace
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* Disable metadata compression for < V2 format
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
---------
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
* fix(repository): fixed handling of content.Info
Previously content.Info was an interface which was implemented by:
* index.InfoStruct
* index.indexEntryInfoV1
* index.indexEntryInfoV2
The last 2 implementations were relying on memory-mapped files
which in rare cases could be closed while Kopia was still processing
them leading to #2599.
This changes fixes the bug and strictly separates content.Info (which
is now always a struct) from the other two (which were renamed as
index.InfoReader and only used inside repo/content/...).
In addition to being safer, this _should_ reduce memory allocations.
* reduce the size of content.Info with proper alignment.
* pr feedback
* renamed index.InfoStruct to index.Info
Lint fixes in preparation for moving to Go 1.20
Remove deprecated calls to `rand.Seed`
In Go 1.20 the default generator is seeded randomly at program startup,
which is the desired behavior for these tests.
Remove uses of deprecated rand.Read: replace with calls to rand.Uint64()
Remove deprecated uses of rand.Read in content manager tests and
S3 versioned tests.
Adds a concurrency-safe helpers to provide functionality similar to that
provided by `rand.Read(b []byte) (int, error)`
This may be a breaking change for users who rely on particular kopia metrics (unlikely):
- introduced blob-level metrics:
* `kopia_blob_download_full_blob_bytes_total`
* `kopia_blob_download_partial_blob_bytes_total`
* `kopia_blob_upload_bytes_total`
* `kopia_blob_storage_latency_ms` - per-method latency distribution
* `kopia_blob_errors_total` - per-method error counter
- updated cache metrics to indicate particular cache
* `kopia_cache_hit_bytes_total{cache="CACHE_TYPE"}`
* `kopia_cache_hit_total{cache="CACHE_TYPE"}`
* `kopia_cache_malformed_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_errors_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_bytes_total{cache="CACHE_TYPE"}`
* `kopia_cache_store_errors_total{cache="CACHE_TYPE"}`
where `CACHE_TYPE` is one of `contents`, `metadata` or `index-blobs`
- reorganized and unified content-level metrics:
* `kopia_content_write_bytes_total`
* `kopia_content_write_duration_nanos_total`
* `kopia_content_compression_attempted_bytes_total`
* `kopia_content_compression_attempted_duration_nanos_total`
* `kopia_content_compression_savings_bytes_total`
* `kopia_content_compressible_bytes_total`
* `kopia_content_non_compressible_bytes_total`
* `kopia_content_after_compression_bytes_total`
* `kopia_content_decompressed_bytes_total`
* `kopia_content_decompressed_duration_nanos_total`
* `kopia_content_encrypted_bytes_total`
* `kopia_content_encrypted_duration_nanos_total`
* `kopia_content_hashed_bytes_total`
* `kopia_content_hashed_duration_nanos_total`
* `kopia_content_deduplicated_bytes_total`
* `kopia_content_read_bytes_total`
* `kopia_content_read_duration_nanos_total`
* `kopia_content_decrypted_bytes_total`
* `kopia_content_decrypted_duration_nanos_total`
* `kopia_content_uploaded_bytes_total`
Also introduced `internal/metrics` framework which constructs Prometheus metrics in a uniform way and will allow us to include some of these metrics in telemetry report in future PRs.
* refactor(repository): moved format blob management to separate package
This is completely mechanical, no behavior changes, only:
- moved types and functions to a new package
- adjusted visibility where needed
- added missing godoc
- renamed some identifiers to align with current usage
- mechanically converted some top-level functions into member functions
- fixed some mis-named variables
* refactor(repository): moved content.FormatingOptions to format.ContentFormat
* refactor(repository): ensure we always parse content.ID and object.ID
This changes the types to be incompatible with string to prevent direct
conversion to and from string.
This has the additional benefit of reducing number of memory allocations
and bytes for all IDs.
content.ID went from 2 allocations to 1:
typical case 32 characters + 16 bytes per-string overhead
worst-case 65 characters + 16 bytes per-string overhead
now: 34 bytes
object.ID went from 2 allocations to 1:
typical case 32 characters + 16 bytes per-string overhead
worst-case 65 characters + 16 bytes per-string overhead
now: 36 bytes
* move index.{ID,IDRange} methods to separate files
* replaced index.IDFromHash with content.IDFromHash externally
* minor tweaks and additional tests
* Update repo/content/index/id_test.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* Update repo/content/index/id_test.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* pr feedback
* post-merge fixes
* pr feedback
* pr feedback
* fixed subtle regression in sortedContents()
This was actually not producing invalid results because of how base36
works, just not sorting as efficiently as it could.
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
From https://github.com/google/gvisor/tree/master/tools/checklocks
This will perform static verification that we're using
`sync.Mutex`, `sync.RWMutex` and `atomic` correctly to guard access
to certain fields.
This was mostly just a matter of adding annotations to indicate which
fields are guarded by which mutex.
In a handful of places the code had to be refactored to allow static
analyzer to do its job better or to not be confused by some
constructs.
In one place this actually uncovered a bug where a function was not
releasing a lock properly in an error case.
The check is part of `make lint` but can also be invoked by
`make check-locks`.
* refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* chore: remove //nolint:gosec for os.ReadFile
At the time of this commit, the G304 rule of gosec does not include the
`os.ReadFile` function. We remove `//nolint:gosec` temporarily until
https://github.com/securego/gosec/pull/706 is merged.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* cli: added a flag to create repository with v2 index features
* content: plumb through compression.ID parameter to content.Manager.WriteContent()
* content: expose content.Manager.SupportsContentCompression
This allows object manager to decide whether to create compressed object
or let the content manager do it.
* object: if compression is requested and the repo supports it, pass compression ID to the content manager
* cli: show compression status in 'repository status'
* cli: output compression information in 'content list' and 'content stats'
* content: compression and decompression support
* content: unit tests for compression
* object: compression tests
* testing: added integration tests against v2 index
* testing: run all e2e tests with and without content-level compression
* htmlui: added UI for specifying index format on creation
* cli: additional tests for 'content ls' and 'content stats'
* applied pr suggestions
* testing: ensure tests are releasing all buffer pools to reduce memory usage, we had huge leaks
* object: reduced complexity and memory usage of TestEndToEndReadAndSeekWithCompression
* manifest: more test fixes
* trivial: update comment
Co-authored-by: Julio López <julio+gh@kasten.io>
* ci: refactored CI/CD logic & Makefile
- removed all travis CI emulation environment variables and replaced with:
CI_TAG=<empty>|tag
IS_PULL_REQUEST=false|true
- refactored all OS and architecture-specific decisions to use around standard GOOS/GOARCH values instead of uname/OS
- re-added self-hosted runner for ARMHF (3 replicas)
- added brand new self-hosted runner for ARM64 (3 replicas)
- disabled attempts to publish and sign on forks
- improved integration test log output to better see timings and sub-tests
- print longest tests (unit tests and integration) after each run
- verified that all configurations build successfully on a clone (jkowalski/kopia)
- run make setup in parallel
* testing: fixed tests on ARM and ARM64
- fixed ARM-specific alignment issue
- cleaned up test logging
- fixed huge params warning threshold because it was tripping on ARM.
- reduced test complexity to make them fit in 15 minutes
* object: refactored Open() and VerifyObject() to be stateless
(no code movement yet to facilitate review)
* mechanical: moved function more appropriate files
* object: remove object manager tracing which was unused
* linter: upgraded to 1.33, disabled some linters
* lint: fixed 'errorlint' errors
This ensures that all error comparisons use errors.Is() or errors.As().
We will be wrapping more errors going forward so it's important that
error checks are not strict everywhere.
Verified that there are no exceptions for errorlint linter which
guarantees that.
* lint: fixed or suppressed wrapcheck errors
* lint: nolintlint and misc cleanups
Co-authored-by: Julio López <julio+gh@kasten.io>
* object: fixed race condition between Result() and Checkpoint()
This would sometimes result in indirect objects having empty object IDs.
Fixes#648
* upload: ensure checkpoints never containt empty object IDs.
* testing: reduce armhf test weight
* fixed a number of cases where misaligned data was causing panics on armv7 (but not armv8)
* travis: enable arm64
* test: reduce compressed data sizes when running on arm
* arm: wait longer for snapshots
When prefix is not specified on ObjectWriter, we force
'x' content prefix on intermediate contents, so object IDs
will look like:
Ix{hash}
This ensures the index contents will be stored in `q` blobs,
making `snapshot gc` easier.
* object: added AsyncWrites to ObjectWriter, which improves performance of uploading of a single file
Fixes#351
Co-Authored-By: Julio López <julio+gh@kasten.io>
This is mostly mechanical and changes how loggers are instantiated.
Logger is now associated with a context, passed around all methods,
(most methods had ctx, but had to add it in a few missing places).
By default Kopia does not produce any logs, but it can be overridden,
either locally for a nested context, by calling
ctx = logging.WithLogger(ctx, newLoggerFunc)
To override logs globally, call logging.SetDefaultLogger(newLoggerFunc)
This refactoring allowed removing dependency from Kopia repo
and go-logging library (the CLI still uses it, though).
It is now also possible to have all test methods emit logs using
t.Logf() so that they show up in failure reports, which should make
debugging of test failures suck less.
this effectively defeated the purpose of compression, caused high
memory usage and other kinds of bad behavior.
refactored the code to prevent this issue by resetting the buffer
at the caller not callee.
fixed previous e2e test to catch the issue mentioned in #166,
verified it fails against master and passes with this change.
Also introduced strongly typed content.ID and manifest.ID (instead of string)
This aligns identifiers across all layers of repository:
blob.ID
content.ID
object.ID
manifest.ID