Refactor `--profile-*` flags:
- Multiple profile types can be enabled at once, before only
a single type profiling could be done during a process execution.
- The new `--profiles-store-on-exit` enables all available profile
types, except for CPU profiling which needs to be explicitly enabled.
- Profiling parameters can now be set via new flags. This allows setting
the profile parameters for the pprof endpoint, as well as when saving
profiles to files on exit.
- Group profiling flags with other observability flags
- Adds a `--diagnostics-output-directory` flag that unifies and
supersedes the `--profile-dir` and `--metrics-directory` flags
Enhancements and behavior changes:
- Profile flags now have effect for all kopia commands, including
`server start`. Before these flags did not have any effect
in a few commands.
- Multiple profile types can be enabled at once, before only
a single type profiling could be done during a process execution.
- The new `--profiles-store-on-exit` enables all available profile
types, except for CPU profiling which needs to be explicitly enabled.
- Profiling parameters can now be set via new flags. This allows setting
the profile parameters for the pprof endpoint, as well as when saving
profiles to files on exit.
The following flags have been removed:
- `--profile-dir`: superseded by the `--diagnostics-output-directory` flag
- `--profile-blocking`: the `--profile-store-on-exit` flag enables blocking
profiling. Use `--profile-blocking-rate=0` to explicitly disable it.
- `--profile-memory`: the `--profile-store-on-exit` flag enables memory
profiling. Use `--profile-memory-rate=0` to explicitly disable it.
- `--profile-mutex`: the `--profile-store-on-exit` flag enables mutex
profiling. Use `--profile-mutex-fraction=0` to explicitly disable it.
Add CLI test for profile flags.
Objective: to facilitate capturing metrics for relatively short-lived kopia
executions.
Motivation: Currently, kopia allows a. exposing metrics via a Prometheus
exporter; and b. pushing metrics to a Prometheus gateway. In certain
scenarios, such as short lived executions, it is not possible to reliably
scrape the metrics from the exporter endpoint. And, while the pusher
approach would work, it requires starting a separate push gateway.
This change allows saving the metrics to a local file before kopia exists
by specifying the metrics output directory. For example,
`kopia --metrics-directory /tmp/kopia-metrics/ snapshot create ....`
In this case, after kopia exists, the `/tmp/kopia-metrics/` directory will
contain a file with a named of the form:
`<date>-<time>-<command_subcommand>.prom`
This may be a breaking change for users who rely on particular kopia metrics (unlikely):
- introduced blob-level metrics:
* `kopia_blob_download_full_blob_bytes_total`
* `kopia_blob_download_partial_blob_bytes_total`
* `kopia_blob_upload_bytes_total`
* `kopia_blob_storage_latency_ms` - per-method latency distribution
* `kopia_blob_errors_total` - per-method error counter
- updated cache metrics to indicate particular cache
* `kopia_cache_hit_bytes_total{cache="CACHE_TYPE"}`
* `kopia_cache_hit_total{cache="CACHE_TYPE"}`
* `kopia_cache_malformed_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_errors_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_bytes_total{cache="CACHE_TYPE"}`
* `kopia_cache_store_errors_total{cache="CACHE_TYPE"}`
where `CACHE_TYPE` is one of `contents`, `metadata` or `index-blobs`
- reorganized and unified content-level metrics:
* `kopia_content_write_bytes_total`
* `kopia_content_write_duration_nanos_total`
* `kopia_content_compression_attempted_bytes_total`
* `kopia_content_compression_attempted_duration_nanos_total`
* `kopia_content_compression_savings_bytes_total`
* `kopia_content_compressible_bytes_total`
* `kopia_content_non_compressible_bytes_total`
* `kopia_content_after_compression_bytes_total`
* `kopia_content_decompressed_bytes_total`
* `kopia_content_decompressed_duration_nanos_total`
* `kopia_content_encrypted_bytes_total`
* `kopia_content_encrypted_duration_nanos_total`
* `kopia_content_hashed_bytes_total`
* `kopia_content_hashed_duration_nanos_total`
* `kopia_content_deduplicated_bytes_total`
* `kopia_content_read_bytes_total`
* `kopia_content_read_duration_nanos_total`
* `kopia_content_decrypted_bytes_total`
* `kopia_content_decrypted_duration_nanos_total`
* `kopia_content_uploaded_bytes_total`
Also introduced `internal/metrics` framework which constructs Prometheus metrics in a uniform way and will allow us to include some of these metrics in telemetry report in future PRs.
New flag `--enable-jaeger-collector` and the corresponding
`KOPIA_ENABLE_JAEGER_COLLECTOR` environment variable enables Jaeger
exporter, which by default sends OTEL traces to Jaeger collector on
http://localhost:14268/api/traces
To change this, use environment variables:
* `OTEL_EXPORTER_JAEGER_ENDPOINT`
* `OTEL_EXPORTER_JAEGER_USER`
* `OTEL_EXPORTER_JAEGER_PASSWORD`
When tracing is disabled, the impact on performance is negligible.
To see this in action:
1. Download latest Jaeger all-in-one from https://www.jaegertracing.io/download/
2. Run `jaeger-all-in-one` binary without any parameters.
3. Run `kopia --enable-jaeger-collector snapshot create ...`
4. Go to http://localhost:16686/search and search for traces
When enabled, metrics are pushed to the provided Prometheus Push
Gateway at the start and end of each command and periodically every
few seconds.
```
--metrics-push-addr=http://address:port
--metrics-push-interval=5s
--metrics-push-job=kopia
--metrics-push-grouping=a:b --metrics-push-grouping=c:d
--metrics-push-username=user
--metrics-push-password=pass
```