Commit Graph

81 Commits

Author SHA1 Message Date
Julio Lopez
8098f49c90 chore(ci): remove exclusion for unused ctx parameters (#4530)
Remove unused-parameter exclusion for `ctx` in revive linter.

---------

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Co-authored-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-04-26 23:11:36 -07:00
Matthieu MOREL
8a176255c0 fix(general): enable wsl for all go files (#4524)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-04-26 13:01:20 -07:00
Jarek Kowalski
eb1cf64c27 chore(ci): upgraded linter to 1.62.0 (#4250) 2024-11-16 07:16:50 -08:00
Prasad Ghangal
3bf947d746 feat(repository): Metadata compression config support for directory and indirect content (#4080)
* Configure compressor for k and x prefixed content

Adds metadata compression setting to policy
Add support to configure compressor for k and x prefixed content
Set zstd-fastest as the default compressor for metadata in the policy
Adds support to set and show metadata compression to kopia policy commands
Adds metadata compression config to dir writer

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Pass concatenate options with ConcatenateOptions struct

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Move content compression handling to caller

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Move handling manifests to manifest pkg

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Correct const in server_test

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Remove unnecessary whitespace

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Disable metadata compression for < V2 format

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

---------

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
2024-10-23 23:28:23 -07:00
Julio López
961a39039b refactor(general): use errors.New where appropriate (#4160)
Replaces 'errors.Errorf\("([^"]+)"\)' => 'errors.New("\1")'
2024-10-05 19:05:00 -07:00
Jarek Kowalski
e36fa78385 feat(snapshots): added support for per-directory splitter overrides (#3887)
This is useful when backing up directories that have giant files aligned
at MiB boundary, such as VM disk backups, etc.
2024-06-07 13:42:15 -07:00
Jarek Kowalski
fcb8197f3f chore(ci): upgraded linter to 1.59.0 (#3883) 2024-05-29 20:31:57 -07:00
Jarek Kowalski
09415e0c7d chore(ci): upgraded to go 1.22 (#3746)
Upgrades go to 1.22 and switches to new-style for loops

---------

Co-authored-by: Julio López <1953782+julio-lopez@users.noreply.github.com>
2024-04-08 09:52:47 -07:00
Jarek Kowalski
29cd545c33 chore(ci): upgrade linter to 1.56.2 (#3714) 2024-03-09 10:39:11 -08:00
Jarek Kowalski
d0fc1e03c4 fix(server): do not make blocking calls inside server status API (#3666)
also reduce global server lock scope
2024-02-21 12:34:16 -08:00
Jarek Kowalski
524ffaf4b8 refactor(repository): added context to potentially blocking repository methods (#3654)
Primarily for wiring a context.Context to a call to content.Manager.refresh,
which was using a detached context.
2024-02-20 14:48:23 -08:00
Jarek Kowalski
a8e4d50600 build(deps): upgraded linter to v1.55.2, fixed warnings (#3611)
* build(deps): upgraded linter to v1.55.2, fixed warnings

* removed unsafe hacks with better equivalents

* test fixes
2024-02-02 23:34:34 -08:00
Jarek Kowalski
7ee30b76bb fix(repository): fixed handling of content.Info (#3356)
* fix(repository): fixed handling of content.Info

Previously content.Info was an interface which was implemented by:

* index.InfoStruct
* index.indexEntryInfoV1
* index.indexEntryInfoV2

The last 2 implementations were relying on memory-mapped files
which in rare cases could be closed while Kopia was still processing
them leading to #2599.

This changes fixes the bug and strictly separates content.Info (which
is now always a struct) from the other two (which were renamed as
index.InfoReader and only used inside repo/content/...).

In addition to being safer, this _should_ reduce memory allocations.

* reduce the size of content.Info with proper alignment.

* pr feedback

* renamed index.InfoStruct to index.Info
2023-10-14 10:34:15 -07:00
Jarek Kowalski
cbc66f936d chore(ci): upgraded linter to 1.53.3 (#3079)
* chore(ci): upgraded linter to 1.53.3

This flagged a bunch of unused parameters, so the PR is larger than
usual, but 99% mechanical.

* separate lint CI task

* run Lint in separate CI
2023-06-18 13:26:01 -07:00
Julio Lopez
a99e38c247 fix(lint): remove uses of deprecated rand.Read (#2858)
Lint fixes in preparation for moving to Go 1.20

Remove deprecated calls to `rand.Seed`
In Go 1.20 the default generator is seeded randomly at program startup,
which is the desired behavior for these tests.

Remove uses of deprecated rand.Read: replace with calls to rand.Uint64()

Remove deprecated uses of rand.Read in content manager tests and
S3 versioned tests.
Adds a concurrency-safe helpers to provide functionality similar to that
provided by `rand.Read(b []byte) (int, error)`
2023-03-28 01:44:09 +00:00
Edward Betts
1e97574391 fix(general): correct spelling mistakes (#2684) 2023-01-21 07:37:15 -08:00
Jarek Kowalski
e57020fb70 test(repository): server testability refactoring (#2612)
- removed repo.OpenAPIServer() which was only needed for testability
- introduced servertesting package to replace it
2022-12-01 06:27:52 +00:00
Jarek Kowalski
78edd92692 refactor(repository): refactored Prometheus metrics (#2532)
This may be a breaking change for users who rely on particular kopia metrics (unlikely):

- introduced blob-level metrics:

* `kopia_blob_download_full_blob_bytes_total`
* `kopia_blob_download_partial_blob_bytes_total`
* `kopia_blob_upload_bytes_total`
* `kopia_blob_storage_latency_ms` - per-method latency distribution
* `kopia_blob_errors_total` - per-method error counter

- updated cache metrics to indicate particular cache

* `kopia_cache_hit_bytes_total{cache="CACHE_TYPE"}`
* `kopia_cache_hit_total{cache="CACHE_TYPE"}`
* `kopia_cache_malformed_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_errors_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_bytes_total{cache="CACHE_TYPE"}`
* `kopia_cache_store_errors_total{cache="CACHE_TYPE"}`

where `CACHE_TYPE` is one of `contents`, `metadata` or `index-blobs`

- reorganized and unified content-level metrics:

* `kopia_content_write_bytes_total`
* `kopia_content_write_duration_nanos_total`

* `kopia_content_compression_attempted_bytes_total`
* `kopia_content_compression_attempted_duration_nanos_total`
* `kopia_content_compression_savings_bytes_total`
* `kopia_content_compressible_bytes_total`
* `kopia_content_non_compressible_bytes_total`
* `kopia_content_after_compression_bytes_total`

* `kopia_content_decompressed_bytes_total`
* `kopia_content_decompressed_duration_nanos_total`

* `kopia_content_encrypted_bytes_total`
* `kopia_content_encrypted_duration_nanos_total`

* `kopia_content_hashed_bytes_total`
* `kopia_content_hashed_duration_nanos_total`

* `kopia_content_deduplicated_bytes_total`

* `kopia_content_read_bytes_total`
* `kopia_content_read_duration_nanos_total`

* `kopia_content_decrypted_bytes_total`
* `kopia_content_decrypted_duration_nanos_total`

* `kopia_content_uploaded_bytes_total`

Also introduced `internal/metrics` framework which constructs Prometheus metrics in a uniform way and will allow us to include some of these metrics in telemetry report in future PRs.
2022-11-10 05:30:06 +00:00
Jarek Kowalski
645e680a8f feat(general): reduce memory usage in maintenance, snapshot fix and verify (#2365) 2022-09-10 09:36:17 -07:00
Jarek Kowalski
51dcaa985d chore(ci): upgraded linter to 1.48.0 (#2294)
Mechanically fixed all issues, added `lint-fix` make target.
2022-08-09 06:07:54 +00:00
Jarek Kowalski
23299c3451 refactor(repository): ensure MutableParameters are never cached (#2284) 2022-08-06 18:11:32 -07:00
Jarek Kowalski
6160ee5668 refactor(repository): moved format blob management to separate package (#2245)
* refactor(repository): moved format blob management to separate package

This is completely mechanical, no behavior changes, only:

- moved types and functions to a new package
- adjusted visibility where needed
- added missing godoc
- renamed some identifiers to align with current usage
- mechanically converted some top-level functions into member functions
- fixed some mis-named variables

* refactor(repository): moved content.FormatingOptions to format.ContentFormat
2022-07-30 14:13:52 -07:00
Jarek Kowalski
68b8afd43f feat(snapshots): improved performance when uploading huge files (#2064)
* feat(snapshots): improved performance when uploading huge files

This is controlled by an upload policy which specifies the size
threshold above which indvidual files are uploaded in parts
and concatenated.

This allows multiple threads to run splitting, hashing, compression
and encryption in parallel, which was previously only possible across
multiple files, but not when a single file was being uploaded.

The default is 2GiB for now, so this feature only kicks in for very
larger files. In the future we may lower this.

Benchmark involved uploading a single 42.1 GB file which was a VM disk
snapshot of fresh Ubuntu installation (fresh EXT4 partition with lots
of zero bytes) to a brand-new filesystem repository on local SSD of
M1 Pro Macbook Pro 2021.

* before: 59-63s (~700 MB/s)
* after: 15-17s  (~2.6 GB/s)

* additional test to ensure files are really e2e readable
2022-06-24 07:38:07 +00:00
Jarek Kowalski
9bf9cac7fb refactor(repository): ensure we always parse content.ID and object.ID (#1960)
* refactor(repository): ensure we always parse content.ID and object.ID

This changes the types to be incompatible with string to prevent direct
conversion to and from string.

This has the additional benefit of reducing number of memory allocations
and bytes for all IDs.

content.ID went from 2 allocations to 1:
   typical case 32 characters + 16 bytes per-string overhead
   worst-case 65 characters + 16 bytes per-string overhead
   now: 34 bytes

object.ID went from 2 allocations to 1:
   typical case 32 characters + 16 bytes per-string overhead
   worst-case 65 characters + 16 bytes per-string overhead
   now: 36 bytes

* move index.{ID,IDRange} methods to separate files

* replaced index.IDFromHash with content.IDFromHash externally

* minor tweaks and additional tests

* Update repo/content/index/id_test.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* Update repo/content/index/id_test.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* pr feedback

* post-merge fixes

* pr feedback

* pr feedback

* fixed subtle regression in sortedContents()

This was actually not producing invalid results because of how base36
works, just not sorting as efficiently as it could.

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
2022-05-25 14:15:56 +00:00
Jarek Kowalski
daa62de3e4 chore(ci): added checklocks static analyzer (#1838)
From https://github.com/google/gvisor/tree/master/tools/checklocks

This will perform static verification that we're using
`sync.Mutex`, `sync.RWMutex` and `atomic` correctly to guard access
to certain fields.

This was mostly just a matter of adding annotations to indicate which
fields are guarded by which mutex.

In a handful of places the code had to be refactored to allow static
analyzer to do its job better or to not be confused by some
constructs.

In one place this actually uncovered a bug where a function was not
releasing a lock properly in an error case.

The check is part of `make lint` but can also be invoked by
`make check-locks`.
2022-03-19 22:42:59 -07:00
Jarek Kowalski
69dc7ba969 feat(repository): added 'hint' to Prefetch methods. (#1825) 2022-03-12 23:16:39 -08:00
Jarek Kowalski
db7dcac33c feat(repo): exposed PrefetchContents on the repo.Repository() (#1824) 2022-03-12 14:36:54 -08:00
Jarek Kowalski
926e14aacb feat(repository): added PrefetchObjects() API (#1779)
* feat(repository): added precaching of data blobs

* feat(repository): added utilities for converting ID slices to strings

* feat(repository): added object.PrefetchBackingContents

* feat(repository): implemented Repository.PrefetchObjects

* feat(cli): added 'cache prefetch' subcommand

* feat(repository): prefetch in parallel

* added tests
2022-03-06 14:30:58 -08:00
Jarek Kowalski
e67f84e0ba chore(general): updated linter to 1.44.0 (#1681) 2022-01-25 21:21:13 -08:00
Jarek Kowalski
bbbef44d8a More coverage improvements (#1577)
* increased direct coverage for internal/cache

* object: code coverage improvements for object writer
2021-12-11 23:27:42 -08:00
Jarek Kowalski
8b760b66a8 logging: added memoization of Logger instances per context (#1369) 2021-10-09 05:02:18 -07:00
Eng Zer Jun
73e492c9db refactor: move from io/ioutil to io and os package (#1360)
* refactor: move from io/ioutil to io and os package

The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* chore: remove //nolint:gosec for os.ReadFile

At the time of this commit, the G304 rule of gosec does not include the
`os.ReadFile` function. We remove `//nolint:gosec` temporarily until
https://github.com/securego/gosec/pull/706 is merged.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-10-06 08:39:10 -07:00
Jarek Kowalski
792cc874dc repo: allow reusing of object writer buffers (#1315)
This reduces memory consumption and speeds up backups.

1. Backing up kopia repository (3.5 GB files:133102 dirs:20074):

before: 25s, 490 MB
after: 21s, 445 MB

2. Large files (14.8 GB, 76 files)

before: 30s, 597 MB
after: 28s, 495 MB

All tests repeated 5 times for clean local filesystem repo.
2021-09-25 14:54:31 -07:00
Jarek Kowalski
35d0f31c0d huge: replaced the use of allocated byte slices with populating gather.WriteBuffer in the repository (#1244)
This helps recycle buffers more efficiently during snapshots.
Also, improved memory tracking, enabled profiling flags and added pprof
by default.
2021-08-20 08:45:10 -07:00
Jarek Kowalski
40510c043d Support for content-level compression (#1076)
* cli: added a flag to create repository with v2 index features

* content: plumb through compression.ID parameter to content.Manager.WriteContent()

* content: expose content.Manager.SupportsContentCompression

This allows object manager to decide whether to create compressed object
or let the content manager do it.

* object: if compression is requested and the repo supports it, pass compression ID to the content manager

* cli: show compression status in 'repository status'

* cli: output compression information in 'content list' and 'content stats'

* content: compression and decompression support

* content: unit tests for compression

* object: compression tests

* testing: added integration tests against v2 index

* testing: run all e2e tests with and without content-level compression

* htmlui: added UI for specifying index format on creation

* cli: additional tests for 'content ls' and 'content stats'

* applied pr suggestions
2021-05-22 05:35:27 -07:00
Jarek Kowalski
30ca3e2e6c Upgraded linter to 1.40.1 (#1072)
* tools: upgraded linter to 1.40.1

* lint: fixed nolintlint vionlations

* lint: disabled tagliatele linter

* lint: fixed remaining warnings
2021-05-15 12:12:34 -07:00
Jarek Kowalski
df430371b9 Refactored content.Info to be an interface and switched index parsing to be lazy (#1008) 2021-04-27 05:53:52 -07:00
Jarek Kowalski
2062c07259 mechanical field renames (#988)
* content: mechanical rename content.Info.Length -> content.Info.PackedLength
* server: renamed grpc API ContentInfo.length->packed_length (non-breaking)
2021-04-16 22:42:32 -07:00
Jarek Kowalski
f4347886b8 logging: simplified log levels (#954)
Removed Warning, Notify and Fatal:

* `Warning` => `Error` or `Info`
* `Notify` => `Info`
* `Fatal` was never used.

Note that --log-level=warning is still supported for backwards
compatibility, but it is the same as --log-level=error.

Co-authored-by: Julio López <julio+gh@kasten.io>
2021-04-09 07:27:35 -07:00
Jarek Kowalski
7c108930ef testing: ensure tests are releasing all buffer pools to reduce memory usage, we had huge leaks (#895)
* testing: ensure tests are releasing all buffer pools to reduce memory usage, we had huge leaks

* object: reduced complexity and memory usage of TestEndToEndReadAndSeekWithCompression

* manifest: more test fixes

* trivial: update comment

Co-authored-by: Julio López <julio+gh@kasten.io>
2021-03-18 06:40:33 -07:00
Jarek Kowalski
e2b9a81ac3 Major CI/CD refactoring and re-added support for ARM/ARM64 runners (#849)
* ci: refactored CI/CD logic & Makefile

- removed all travis CI emulation environment variables and replaced with:

CI_TAG=<empty>|tag
IS_PULL_REQUEST=false|true

- refactored all OS and architecture-specific decisions to use around standard GOOS/GOARCH values instead of uname/OS
- re-added self-hosted runner for ARMHF (3 replicas)
- added brand new self-hosted runner for ARM64 (3 replicas)
- disabled attempts to publish and sign on forks
- improved integration test log output to better see timings and sub-tests
- print longest tests (unit tests and integration) after each run
- verified that all configurations build successfully on a clone (jkowalski/kopia)
- run make setup in parallel

* testing: fixed tests on ARM and ARM64

- fixed ARM-specific alignment issue
- cleaned up test logging
- fixed huge params warning threshold because it was tripping on ARM.
- reduced test complexity to make them fit in 15 minutes
2021-02-23 00:52:54 -08:00
Jarek Kowalski
1f3b8d4da4 upgrade linter to 1.35 (#786)
* lint: added test that enforces Makefile and GH action linter versions are in sync
* workaround for linter gomnd problem - https://github.com/golangci/golangci-lint/issues/1653
2021-01-16 18:21:16 -08:00
Jarek Kowalski
ed9db56b87 Cleaned up and refactored object manager (#782)
* object: refactored Open() and VerifyObject() to be stateless

(no code movement yet to facilitate review)

* mechanical: moved function more appropriate files

* object: remove object manager tracing which was unused
2021-01-13 00:40:23 -08:00
Jarek Kowalski
d52af7cebb fixed cases where nil was passed to errors.Wrap() causing nil to be returned (#747) 2020-12-24 20:35:29 -08:00
Jarek Kowalski
e03971fc59 Upgraded linter to v1.33.0 (#734)
* linter: upgraded to 1.33, disabled some linters

* lint: fixed 'errorlint' errors

This ensures that all error comparisons use errors.Is() or errors.As().
We will be wrapping more errors going forward so it's important that
error checks are not strict everywhere.

Verified that there are no exceptions for errorlint linter which
guarantees that.

* lint: fixed or suppressed wrapcheck errors

* lint: nolintlint and misc cleanups

Co-authored-by: Julio López <julio+gh@kasten.io>
2020-12-21 22:39:22 -08:00
Jarek Kowalski
66cebb79cb Fixed empty object IDs in checkpoints (#649)
* object: fixed race condition between Result() and Checkpoint()

This would sometimes result in indirect objects having empty object IDs.

Fixes #648

* upload: ensure checkpoints never containt empty object IDs.

* testing: reduce armhf test weight
2020-09-29 07:14:47 -07:00
Jarek Kowalski
f0b97b960b Fixed checkpointing to not restart the entire upload process (#594)
* object: added Checkpoint() method to object writer

* upload: refactored code structure to allow better checkpointing

* upload: removed Checkpoint() method from UploadProgress

* Update fs/entry.go

Co-authored-by: Julio López <julio+gh@kasten.io>
2020-09-12 22:36:22 -07:00
Jarek Kowalski
e22d22dba2 object: implemented fast concatenation of objects by merging their index entries (#607) 2020-09-11 20:12:01 -07:00
Jarek Kowalski
faf280616a Splitter throughput improvements (#606)
* object: refactored writer to detect split points before writing

This introduces new primitive that will be moved into splitters
themselves in subsequent commits. I'm doing this in small steps to
ensure we don't regress at any time.

* splitter: refactored TestSplitters test

This is use slow (byte-by-byte) and fast (nextSplitPoint) methods of
determining split points.

Note nextSplitPoint is not implemented by splitters yet, but this
verifies that the test is expecting the right thing.

* object: splitter refactoring - replaced ShouldSplit() with NextSplitPoint() everywhere, still not optimized

* splitter: added additional dimension to splitter_test

We split either in large chunks or one byte at a time to catch
the corner cases in the splitter implementation.

* splitter: optimized splitters using NextSplitPoint primitive

This improves splitter performance by about 40% (buzhash) and makes
it virtually free for FIXED splitter.
2020-09-11 19:45:48 -07:00
Julio López
aebf8616a2 object: loadSeekTable helper (#608) 2020-09-11 19:12:13 -07:00