My backup had a fatal error. The end of the log looked like this:
```
Created snapshot with root k5ab05dd5a8aaf9da8a6a822abd0afabb and ID 04caa6e10f4e2866a74492a4162ea943 in 2m44s
WARN Ignored 943 error(s) while snapshotting root@tower:/.
ERROR Found 1 fatal error(s) while snapshotting root@tower:/.
```
Note that "WARN" is yellow and "ERROR" is red.
Since I got a fatal error, I wanted to check what the fatal error was.
Because "ERROR" in the lines above is red, I expected the fatal error in
the kopia log to also be red, but it was yellow like the non-fatal
errors. This was unexpected to me.
Also note that I have lots of "! Ignored error when processing" in the
kopia log because I also backup Docker containers, so right now it is
not easy to find the fatal error among the non-fatal errors.
* fix(repository): fixed handling of content.Info
Previously content.Info was an interface which was implemented by:
* index.InfoStruct
* index.indexEntryInfoV1
* index.indexEntryInfoV2
The last 2 implementations were relying on memory-mapped files
which in rare cases could be closed while Kopia was still processing
them leading to #2599.
This changes fixes the bug and strictly separates content.Info (which
is now always a struct) from the other two (which were renamed as
index.InfoReader and only used inside repo/content/...).
In addition to being safer, this _should_ reduce memory allocations.
* reduce the size of content.Info with proper alignment.
* pr feedback
* renamed index.InfoStruct to index.Info
* Improve RunMissed algorithm to work better with Cron and to give more predictable results for time-of-day rules
* Add a RunMissed test for multiple times-of-day
* add variable to improve code-readability
* Fix test after rebase
* feat(server): reduce server refreshes of the repository
Previously each source would refresh itself from the repository
very frequently to determine the upcoming snapshot time. This change
refactors source manager so it does not own the repository connection
on its own but instead delegates all policy reads through the server.
Also introduces a new server scheduler that is responsible for
centrally managing the snapshot schedule and triggering snapshots
when they are due.
* Update cli/command_server_start.go
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
* Update internal/server/server.go
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
* Update internal/server/server_maintenance.go
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
* pr feedback
---------
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
Added improved providervalidation logic which tests for read-after-write
property between connections. The new test was failing before the change
and is now passing for Google Drive, OneDrive and DropBox.
* feat(repository): apply retention policies server-side
This allows append-only snapshots where the client can never delete
arbitrary manifests and policies are maintained on the server.
The client only needs permissions to create snapshots in a given, which
automatically gives them permission to invoke the server-side method
for their own snapshots only.
* Update cli/command_acl_add.go
Co-authored-by: Guillaume <Gui13@users.noreply.github.com>
* Update internal/server/api_manifest.go
Co-authored-by: Guillaume <Gui13@users.noreply.github.com>
* Update internal/server/api_manifest.go
Co-authored-by: Guillaume <Gui13@users.noreply.github.com>
* Update internal/server/grpc_session.go
Co-authored-by: Guillaume <Gui13@users.noreply.github.com>
---------
Co-authored-by: Guillaume <Gui13@users.noreply.github.com>
Previously some logs from a running server were only kept in memory
(including storage activity logs) which was confusing to many folks.
This changes the behavior so that logs are sent to their regular
(console/file) file locations in addition to the UI tasks.
Old behavior can be restored by adding `--no-persistent-logs` to
server.
Objective: to facilitate capturing metrics for relatively short-lived kopia
executions.
Motivation: Currently, kopia allows a. exposing metrics via a Prometheus
exporter; and b. pushing metrics to a Prometheus gateway. In certain
scenarios, such as short lived executions, it is not possible to reliably
scrape the metrics from the exporter endpoint. And, while the pusher
approach would work, it requires starting a separate push gateway.
This change allows saving the metrics to a local file before kopia exists
by specifying the metrics output directory. For example,
`kopia --metrics-directory /tmp/kopia-metrics/ snapshot create ....`
In this case, after kopia exists, the `/tmp/kopia-metrics/` directory will
contain a file with a named of the form:
`<date>-<time>-<command_subcommand>.prom`
Motivation: reduce the function complexity for the linter, so additional
functionality can be added to `startMetrics()`
No functional changes.
Changes and transformations:
- Break down observability.startMetrics() into: maybeStartListener().
maybeStartMetricsPusher() and maybeStartTraceExporter().
- Inline observabilityFlags.getExporter in maybeStartTraceExporter.
- Only create resource when exporter is non-nil.
- Simplify maybeStartListener: reduce indentation by returning early.
- Simplify maybeStartMetricsPusher: reduce indentation by returning early.
Adds a check for snapshot create when --all and a source path are
given simultaneously. Returns an error in this case since it would
otherwise create a duplicate snapshot for the specified source path.
---
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
Changes kopia's behavior to match the exit code that would
have been returned when the `--json` flag was not specified.
`kopia snapshot create my/path --json` terminates with a 0
status code in cases where
`kopia snapshot create my/path` terminates with a
non-zero exit code.
One such case is when there are permissions errors reading
files or directories to snapshot.
Adds end-to-end tests for snapshot create with '--json' flag
* feat(snapshots): added ability to use cron expressions to schedule snapshots
We use `github.com/hashicorp/cronexpr` to parse and evaluate expressions,
as documented in https://github.com/hashicorp/cronexpr#implementation
* upgrade ui
* pr feedback
* Implement ability to extend retention time on S3 buckets using Object Locks
* Move object-lock extension to maintenance.Params.
* Use a default function for unsupported extensions instead of duplicating code
* Fix potential lockup during object-lock extension
* Fix race condition. Add more code coverage
* rebase to V3
* Add checks to prevent user from setting Retention Period < Full Maintenance Interval
---------
Co-authored-by: Ashlie Martinez <ashmrtnz@alcion.ai>
* chore(ci): upgraded linter to 1.53.3
This flagged a bunch of unused parameters, so the PR is larger than
usual, but 99% mechanical.
* separate lint CI task
* run Lint in separate CI
* remove deprecated `snapshot gc` command
* run `maintenance` instead of `snapshot gc` in robustness
* use `maintenance` command instead of `gc` alias for clarity
* use `maintenance run` in `TestSnapshotDeleteRestore`
This commit changes the behavior of the command
`kopia repo upgrade begin...` to not fail (exit code 1) when the repository is already using the latest format version. Instead, a helpful message is output and the program exits with zero code. In effect the command becomes idempotent-successive upgrades would return the same exit code. Such an idempotent api is desirable, especially in cases where we build automation around format upgrades.
Before this change, an error code 1 is returned when upgrading a repository that is already up to date:
```
$ kopia repo status | grep "Format Version"
Format version: 3
$ kopia repo upgrade begin --upgrade-owner-id admin
[1] ERROR error setting the upgrade lock intent: repository is using version 3, and version 3 is the maximum
```
and after this change, a 0 code is returned:
```
$ kopia repo upgrade begin --upgrade-owner-id admin
[0] Repository format is already upto date.
```
* feat(server): improved server shutdown and integration tests
Added `--shutdown-grace-period` flag to `kopia server start` command
which can be used to specify how long the server will wait for active
connections to finish before forcibly shutting down.
This allowed removal of final out-of-process execution of
during integration tests and the need for `integration-tests` target
which was running the same tests as `tests` but in out-of-process mode.
We thus now have all the test coverage in-process without having to
build and launch `kopia` binary.
* fixed logging
* increase test timeout
* speed up and/or parallelize longest-running tests
* feat(cli): print upgrade owner in repository status
To help users understand the state of their repository better, this one
line change also prints out the upgrade owner's ID in the output of
`kopia repository status`.
* Upgrade `create --format-version` help message
To show that there is now a format version 3 that can be set.
* Return ReadCloser from StreamingFile
Allow better resource management by returning something that can be closed
when dealing with StreamingFiles.
* Close StreamingFile Reader during upload
* Use NopCloser on inputs that don't implement Close
Fixup callers of the StreamingFile API by wrapping regular Readers with
NopCloser calls where necessary.
Almost all were easy to replace, except ones exposed via JSON which
have been left as-is.
The linter has a cool behavior where it flags attempts to pass
`atomic.Int32` for example by value , which is always a mistake,
say as an argument to `fmt.Sprintf()`
This may be a breaking change for users who rely on particular kopia metrics (unlikely):
- introduced blob-level metrics:
* `kopia_blob_download_full_blob_bytes_total`
* `kopia_blob_download_partial_blob_bytes_total`
* `kopia_blob_upload_bytes_total`
* `kopia_blob_storage_latency_ms` - per-method latency distribution
* `kopia_blob_errors_total` - per-method error counter
- updated cache metrics to indicate particular cache
* `kopia_cache_hit_bytes_total{cache="CACHE_TYPE"}`
* `kopia_cache_hit_total{cache="CACHE_TYPE"}`
* `kopia_cache_malformed_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_errors_total{cache="CACHE_TYPE"}`
* `kopia_cache_miss_bytes_total{cache="CACHE_TYPE"}`
* `kopia_cache_store_errors_total{cache="CACHE_TYPE"}`
where `CACHE_TYPE` is one of `contents`, `metadata` or `index-blobs`
- reorganized and unified content-level metrics:
* `kopia_content_write_bytes_total`
* `kopia_content_write_duration_nanos_total`
* `kopia_content_compression_attempted_bytes_total`
* `kopia_content_compression_attempted_duration_nanos_total`
* `kopia_content_compression_savings_bytes_total`
* `kopia_content_compressible_bytes_total`
* `kopia_content_non_compressible_bytes_total`
* `kopia_content_after_compression_bytes_total`
* `kopia_content_decompressed_bytes_total`
* `kopia_content_decompressed_duration_nanos_total`
* `kopia_content_encrypted_bytes_total`
* `kopia_content_encrypted_duration_nanos_total`
* `kopia_content_hashed_bytes_total`
* `kopia_content_hashed_duration_nanos_total`
* `kopia_content_deduplicated_bytes_total`
* `kopia_content_read_bytes_total`
* `kopia_content_read_duration_nanos_total`
* `kopia_content_decrypted_bytes_total`
* `kopia_content_decrypted_duration_nanos_total`
* `kopia_content_uploaded_bytes_total`
Also introduced `internal/metrics` framework which constructs Prometheus metrics in a uniform way and will allow us to include some of these metrics in telemetry report in future PRs.