Refactor `--profile-*` flags:
- Multiple profile types can be enabled at once, before only
a single type profiling could be done during a process execution.
- The new `--profiles-store-on-exit` enables all available profile
types, except for CPU profiling which needs to be explicitly enabled.
- Profiling parameters can now be set via new flags. This allows setting
the profile parameters for the pprof endpoint, as well as when saving
profiles to files on exit.
- Group profiling flags with other observability flags
- Adds a `--diagnostics-output-directory` flag that unifies and
supersedes the `--profile-dir` and `--metrics-directory` flags
Enhancements and behavior changes:
- Profile flags now have effect for all kopia commands, including
`server start`. Before these flags did not have any effect
in a few commands.
- Multiple profile types can be enabled at once, before only
a single type profiling could be done during a process execution.
- The new `--profiles-store-on-exit` enables all available profile
types, except for CPU profiling which needs to be explicitly enabled.
- Profiling parameters can now be set via new flags. This allows setting
the profile parameters for the pprof endpoint, as well as when saving
profiles to files on exit.
The following flags have been removed:
- `--profile-dir`: superseded by the `--diagnostics-output-directory` flag
- `--profile-blocking`: the `--profile-store-on-exit` flag enables blocking
profiling. Use `--profile-blocking-rate=0` to explicitly disable it.
- `--profile-memory`: the `--profile-store-on-exit` flag enables memory
profiling. Use `--profile-memory-rate=0` to explicitly disable it.
- `--profile-mutex`: the `--profile-store-on-exit` flag enables mutex
profiling. Use `--profile-mutex-fraction=0` to explicitly disable it.
Add CLI test for profile flags.
- nit: rename function to repositoryAction.
It always calls the action with a repository
- move allocator stats functionality to observability
- rename observability functions to start/stop. They
start and stop more than just the metrics services.
- rename field to c.enablePProfEndpoint for clarity.
- add observability run function to make it explicit
where start and stop are called.
Ensure auto-maintenance errors are propagated.
This enables sending notifications for failed "auto-maintenances".
Preserve action callback error when closing the repository fails.
This is a breaking change to users who might be using Kopia as a library.
### Log Format
```json
{"t":"<timestamp-rfc-3389-microseconds>", "span:T1":"V1", "span:T2":"V2", "n":"<source>", "m":"<message>", /*parameters*/}
```
Where each record is associated with one or more spans that describe its scope:
* `"span:client": "<hash-of-username@hostname>"`
* `"span:repo": "<random>"` - random identifier of a repository connection (from `repo.Open`)
* `"span:maintenance": "<random>"` - random identifier of a maintenance session
* `"span:upload": "<hash-of-username@host:/path>"` - uniquely identifies upload session of a given directory
* `"span:checkpoint": "<random>"` - encapsulates each checkpoint operation during Upload
* `"span:server-session": "<random>"` -single client connection to the server
* `"span:flush": "<random>"` - encapsulates each Flush session
* `"span:maintenance": "<random>"` - encapsulates each maintenance operation
* `"span:loadIndex" : "<random>"` - encapsulates index loading operation
* `"span:emr" : "<random>"` - encapsulates epoch manager refresh
* `"span:writePack": "<pack-blob-ID>"` - encapsulates pack blob preparation and writing
(plus additional minor spans for various phases of the maintenance).
Notable points:
- Used internal zero allocation JSON writer for reduced memory usage.
- renamed `--disable-internal-log` to `--disable-repository-log` (controls saving blobs to repository)
- added `--disable-content-log` (controls writing of `content-log` files)
- all storage operations are also logged in a structural way and associated with the corresponding spans.
- all content IDs are logged in a truncated format (since first N bytes that are usually enough to be unique) to improve compressibility of logs (blob IDs are frequently repeated but content IDs usually appear just once).
This format should make it possible to recreate the journey of any single content throughout pack blobs, indexes and compaction events.
Removes the following flags:
--caching: no-op
--list-caching: no-op
--enable-jaeger-collector: errors out when specified.
Also removes no-longer-used `deprecatedFlag` function.
Remove `repositoryAccessMode.mustBeConnected`
It is always true.
Rename `repositoryAccessMode.disableMaintenance`
to `allowMaintenance`.
This explicitly conveys when maintenance is allowed
to run. It is related to accessing a repository in
'read-only' mode.
* feat(cli): send error notifications and snapshot reports
Notifications will be sent to all configured notification profiles
according to their severity levels.
The following events will trigger notifications:
- Snapshot is created (CLI only, severity >= report)
- Server Maintenance error occurs (CLI, server and UI, severity >= error)
- Any other CLI error occurs (CLI only, severity >= error).
A flag `--no-error-notifications` can be used to disable error notifications.
* added template tests
* improved time formatting in templates
* plumb through notifytemplate.Options
* more testing for formatting options
* fixed default date format to RFC1123
Followups to #3655
* wrap fs.Reader
* nit: remove unnecessary intermediate variable
* nit: rename local variable
* cleanup: move restore.Progress interface to cli pkg
* move cliRestoreProgress to a separate file
* refactor(general): replace switch with if/else for clarity
Removes a tautology for `err == nil`, which was guaranteed
to be true in the second case statement for the switch.
Replacing the switch statement with and if/else block is clearer.
* initialize restoreProgress in restore command
* fix: use error.Wrapf with format string and args
Simplify SetCounters signature:
Pass arguments in a `restore.Stats` struct.
`SetCounters(s restore.Stats)`
Simplifies call sites and implementation.
In this case it makes sense to pass all the values
using the restore.Stats struct as it simplifies
the calls.
However, this pattern should be avoided in general
as it essentially makes all the arguments "optional".
This makes it easy to miss setting a value and simply
passing 0 (the default value), thus it becomes error
prone.
In this particular case, the struct is being passed
through verbatim, thus eliminating the risk of
missing a value, at least in the current state of
the code.
* refactor(test): allow signaling sub-process from testenv.CLIExeRunner
* test(cli): add test for handling SIGTERM
* feat(general): catch and process SIGTERM for termination
* refactor(cli): rename function cli.App.onTerminate
Renames function from onCtrlC to a more generic onTerminate
* chore(ci): upgraded linter to 1.53.3
This flagged a bunch of unused parameters, so the PR is larger than
usual, but 99% mechanical.
* separate lint CI task
* run Lint in separate CI
Lack of generics support is blocking various dependency upgrades,
so this unblocks that.
Temporarily disabled `checklocks` linter until it is fixed upstream.
* feat(repository): added `required features` to the repository
This is intended for future compatibility to be able to reliably
stop old kopia client from being able to open a repository when
the old code does not understand new `required feature`.
Required features are checked on startup and periodically using the
same method as upgrade lock, where they will return errors during blob
operations.
* pr feedback
* kopia format upgrade lock
* Update cli/command_repository_set_parameters_test.go
Co-authored-by: Ali Dowair <adowair@umich.edu>
* Update cli/command_repository_upgrade.go
Co-authored-by: Ali Dowair <adowair@umich.edu>
* Update cli/command_repository_upgrade.go
Co-authored-by: Ali Dowair <adowair@umich.edu>
* pr feedback
* pr feedback
* add a min drain time check
* env var for io-drain-timeout
* fix: add more doctext around upgrade phases
* build: wrap with EnvName
* add experimental warning
* protect upgrade cli behind env varible
* fix conflicts after relocating the upgrade lock
* generalize the command args
* drop certain features as per feedback
* sub-divide the upgrade command into begin and rollback
* Update cli/command_repository_upgrade.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* Update cli/command_repository_upgrade.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* missing return
* rename force flag to allow-unsafe-upgrade
Co-authored-by: Shikhar Mall <shikhar@kasten.io>
Co-authored-by: Ali Dowair <adowair@umich.edu>
Co-authored-by: Shikhar Mall <small@kopia.io>
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* feat(infra): improved support for in-process testing
* support for killing of a running server using simulated Ctrl-C
* support for overriding os.Stdin
* migrated many tests from the exe runner to in-process runner
* added required indirection when defining Envar() so we can later override it in tests
* refactored CLI runners by moving environment overrides to CLITestEnv
New flag `--enable-jaeger-collector` and the corresponding
`KOPIA_ENABLE_JAEGER_COLLECTOR` environment variable enables Jaeger
exporter, which by default sends OTEL traces to Jaeger collector on
http://localhost:14268/api/traces
To change this, use environment variables:
* `OTEL_EXPORTER_JAEGER_ENDPOINT`
* `OTEL_EXPORTER_JAEGER_USER`
* `OTEL_EXPORTER_JAEGER_PASSWORD`
When tracing is disabled, the impact on performance is negligible.
To see this in action:
1. Download latest Jaeger all-in-one from https://www.jaegertracing.io/download/
2. Run `jaeger-all-in-one` binary without any parameters.
3. Run `kopia --enable-jaeger-collector snapshot create ...`
4. Go to http://localhost:16686/search and search for traces
When enabled, metrics are pushed to the provided Prometheus Push
Gateway at the start and end of each command and periodically every
few seconds.
```
--metrics-push-addr=http://address:port
--metrics-push-interval=5s
--metrics-push-job=kopia
--metrics-push-grouping=a:b --metrics-push-grouping=c:d
--metrics-push-username=user
--metrics-push-password=pass
```
* refactor cli tests to allow the use of in-memory mock
* use in-memory repo for set-parameters cli tests
* move inmemory storage provider into test package
Co-authored-by: Shikhar Mall <shikhar@kasten.io>
This allows KopiaUI server to start when the repository directory
is not mounted or otherwise unavailable. Connection attempts will
be retried indefinitely and user will see new `Initializing` page.
This also exposes `Open` and `Connect` as tasks allowing the user to see
logs directly in the UI and cancel the operation.
* fix(security): prevent cross-site request forgery in the UI website
This fixes a [cross-site request forgery (CSRF)](https://en.wikipedia.org/wiki/Cross-site_request_forgery)
vulnerability in self-hosted UI for Kopia server.
The vulnerability allows potential attacker to make unauthorized API
calls against a running Kopia server. It requires an attacker to trick
the user into visiting a malicious website while also logged into a
Kopia website.
The vulnerability only affected self-hosted Kopia servers with UI. The
following configurations were not vulnerable:
* Kopia Repository Server without UI
* KopiaUI (desktop app)
* command-line usage of `kopia`
All users are strongly recommended to upgrade at the earliest
convenience.
* pr feedback
* logging: added log rotation and improved predictability of log sweep
With this change logs will be rotated every 50 MB, which prevents
accumulation of giant files while server is running.
This change will also guarantee that log sweep completes at least once
before each invocation of Kopia finishes. Previously it was a goroutine
that was not monitored for completion.
Flags can be used to override default behaviors:
* `--max-log-file-segment-size`
* `--no-wait-for-log-sweep` - disables waiting for full log sweep
Fixes#1561
* logging: added --log-dir-max-total-size-mb flag
This limits the total size of all logs in a directory to 1 GB.
* blob: changed default shards from {3,3} to {1,3}
Turns out for very large repository around 100TB (5M blobs),
we end up creating max ~16M directories which is way too much
and slows down listing. Currently each leaf directory only has a handful
of files.
Simple sharding of {3} should work much better and will end up creating
directories with meaningful shard sizes - 12 K files per directory
should not be too slow and will reduce the overhead of listing by
4096 times.
The change is done in a backwards-compatible way and will respect
custom sharding (.shards) file written by previous 0.9 builds
as well as older repositories that don't have the .shards file (which
we assume to be {3,3}).
* fixed compat tests
* fixed new gocritic violations
* fixed new 'contextcheck' violations
* fixed 'gosec' warnings
* suppressed ireturn and varnamelen linters
* fixed tenv violations, enabled building robustness tests on arm64
* fixed remaining linux failures
* makefile: fixed 'lint-all' target when running on arm64
* linter: increase deadline
* disable nilnil linter - to be enabled in separate PR
* Support setting AWS S3 storage class for all types of blobs
* Read .storageconfig file
* Improve loading logic
* Hide .storageconfig from ListBlobs()