* remove omitempty from object id fields
* remove omitempty from storage stats fields
* use omitzero for other structs
* remove `omitempty` from `UpgradeLockIntent` fields
The `omitempty` JSON tag is ineffective for fields of
type `Time` and `Duration`.
Also, this struct is only used during the "upgrade"
protocol, and it is OK to explicitly serialize 0 values,
as it was done before.
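A minimal sketch of the `time.Time` case, assuming a hypothetical `lockIntent` struct (not the real `UpgradeLockIntent` definition): `encoding/json` never treats a struct value as "empty", so `omitempty` on a `time.Time` field changes nothing and the zero value is serialized either way. By contrast, the `omitzero` option (Go 1.24+) consults `IsZero`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// lockIntent is a hypothetical struct used only for illustration;
// it is not the actual UpgradeLockIntent definition.
type lockIntent struct {
	// omitempty never drops a struct value such as time.Time.
	WithOmitEmpty time.Time `json:"withOmitEmpty,omitempty"`
	// No option: the zero value is always serialized explicitly.
	Plain time.Time `json:"plain"`
}

// marshalZero serializes the zero value of lockIntent.
func marshalZero() string {
	b, err := json.Marshal(lockIntent{})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Both fields appear in the output, with identical zero timestamps.
	fmt.Println(marshalZero())
}
```

Dropping the tag option therefore keeps the wire format identical for these fields while removing a misleading annotation.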
Close file after mmap on Unix to reduce open file descriptors.
On Unix-like platforms, close the file descriptor immediately after a successful
mmap.Map of .sndx index cache files. This keeps the mapping valid until Unmap
(per POSIX semantics) and significantly reduces steady-state FD usage when many
indexes are open, helping avoid EMFILE ("too many open files").
- Split mmapOpenWithRetry into platform-specific implementations:
  - committed_content_index_disk_cache_unix.go (!windows):
    - Map RDONLY, close FD immediately.
    - Return closer that only unmaps.
  - committed_content_index_disk_cache_windows.go (windows):
    - Keep FD open until Unmap.
    - Return closer that unmaps and closes FD.
- Remove old mmapOpenWithRetry and mmap import from
  repo/content/committed_content_index_disk_cache.go.
- Add Linux-only unit test verifying FD count does not grow proportionally:
  repo/content/committed_content_index_fd_linux_test.go
  - Creates N small indexes, opens them all, checks /proc/self/fd delta stays low.
Notes:
- Behavior unchanged on Windows due to OS semantics.
- Mapping failures close the FD to avoid leaks.
- Unlink semantics remain correct; mappings stay valid until Unmap.
* remove windows only retry logic under unix
Maintenance is critical for the health of the repository.
On the other hand, maintenance is complex because
it runs multiple subtasks, each of which may produce different
results depending on the maintenance policy.
The results may include deleting, combining, or adding
data and metadata in the repository.
It is worth adding more observability for these
tasks, for the following reasons:
- It helps with troubleshooting. Any data change
  to the repository is critical, and the observability info
  helps to understand what happened during
  maintenance and why.
- It helps users understand and predict the
  repo's behavior. The repo data may be stored
  in a public cloud, where costs are sensitive
  to the amount and duration of data stored. On the other
  hand, the repository has its own policy for managing
  the data, so data is not deleted until it is
  safe to do so according to the policy.
  The observability info helps users
  understand how much data is in use,
  how much data is no longer in use, and
  when it is deleted.
* fix(cli): improve progress output control in repository sync and documentation
- Update progress flag description from "progress bar" to "progress output" for clarity
- Document progress control features in Logging, Synchronization, and Command-Line reference
- Support --no-progress flag for cleaner automation and scripting usage
Prevent running "auto-maintenance" on snapshot create.
Ref:
- #4851
Context: the test fails because there are concurrent "endurance"
runners and each of these advances the clock, some of them
significantly (action{Small,Medium,Large}ClockJump), which
causes the clock skew check to fail when
(auto-)maintenance runs.
The test maintenance action does not experience this issue
because it runs exclusively on its own, that is, other
actions have to wait until maintenance completes.
- nit: rename function to repositoryAction.
  It always calls the action with a repository.
- move allocator stats functionality to observability
- rename observability functions to start/stop. They
start and stop more than just the metrics services.
- rename field to c.enablePProfEndpoint for clarity.
- add observability run function to make it explicit
where start and stop are called.
Ensure auto-maintenance errors are propagated.
This enables sending notifications for failed "auto-maintenances".
Preserve action callback error when closing the repository fails.
This is a breaking change to users who might be using Kopia as a library.
### Log Format
```json
{"t":"<timestamp-rfc-3339-microseconds>", "span:T1":"V1", "span:T2":"V2", "n":"<source>", "m":"<message>", /*parameters*/}
```
Where each record is associated with one or more spans that describe its scope:
* `"span:client": "<hash-of-username@hostname>"`
* `"span:repo": "<random>"` - random identifier of a repository connection (from `repo.Open`)
* `"span:maintenance": "<random>"` - random identifier of a maintenance session
* `"span:upload": "<hash-of-username@host:/path>"` - uniquely identifies upload session of a given directory
* `"span:checkpoint": "<random>"` - encapsulates each checkpoint operation during Upload
* `"span:server-session": "<random>"` - single client connection to the server
* `"span:flush": "<random>"` - encapsulates each Flush session
* `"span:maintenance": "<random>"` - encapsulates each maintenance operation
* `"span:loadIndex": "<random>"` - encapsulates index loading operation
* `"span:emr": "<random>"` - encapsulates epoch manager refresh
* `"span:writePack": "<pack-blob-ID>"` - encapsulates pack blob preparation and writing
(plus additional minor spans for various phases of the maintenance).
Notable points:
- Used an internal zero-allocation JSON writer to reduce memory usage.
- renamed `--disable-internal-log` to `--disable-repository-log` (controls saving blobs to repository)
- added `--disable-content-log` (controls writing of `content-log` files)
- all storage operations are also logged in a structural way and associated with the corresponding spans.
- all content IDs are logged in a truncated format (since the first N bytes are usually enough to be unique) to improve compressibility of logs (blob IDs are frequently repeated but content IDs usually appear just once).
This format should make it possible to recreate the journey of any single content throughout pack blobs, indexes and compaction events.
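To make the span layout concrete, here is a minimal sketch of reading one record and collecting its `span:` keys, which is the first step in grouping log lines by scope. The `spansOf` helper and the sample record are hypothetical; real records are produced by Kopia's internal writer, not by this code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// spansOf (hypothetical) extracts the "span:<type>" entries from a single
// JSON log record and returns them keyed by span type.
func spansOf(line string) (map[string]string, error) {
	var rec map[string]any
	if err := json.Unmarshal([]byte(line), &rec); err != nil {
		return nil, err
	}

	spans := map[string]string{}

	for k, v := range rec {
		if !strings.HasPrefix(k, "span:") {
			continue
		}
		// Span values are expected to be strings; anything else is skipped.
		if s, ok := v.(string); ok {
			spans[strings.TrimPrefix(k, "span:")] = s
		}
	}

	return spans, nil
}

func main() {
	// Hypothetical record following the documented field layout.
	line := `{"t":"2025-01-01T00:00:00.000000Z","span:repo":"abc","span:flush":"def","n":"example","m":"hello"}`

	spans, err := spansOf(line)
	if err != nil {
		panic(err)
	}

	fmt.Println(spans["repo"], spans["flush"])
}
```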
* Remove unused return value from ListIndexBlobInfos
* Unexport index.Builder.buildStable
* Remove unnecessary OneUseBuilder.BuildStable
* Remove unnecessary `BuilderCreator` interface,
use a function type instead.
* Cleanup comment
Track and report errors separately according to the type of error:
- missing pack
- truncated pack
- unreadable content.
Add a counter stat for the contents that are
read and fully verified (via `GetContent`).
Count errors grouped by pack ID using a `CountersMap`.
This allows determining the number of referenced contents
that were missing in a particular pack.
Report the counter stats via structured logging.
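The per-pack grouping can be pictured as a concurrency-safe map of counters keyed by pack ID. The sketch below is an assumption-laden illustration: `packCounters` and its methods are hypothetical and do not reproduce the API of the `CountersMap` type used in the change.

```go
package main

import (
	"fmt"
	"sync"
)

// packCounters (hypothetical) counts errors per pack ID and is safe for
// use from multiple verification goroutines.
type packCounters struct {
	mu     sync.Mutex
	counts map[string]int
}

func newPackCounters() *packCounters {
	return &packCounters{counts: map[string]int{}}
}

// Increment adds one to the counter for packID and returns the new value.
func (c *packCounters) Increment(packID string) int {
	c.mu.Lock()
	defer c.mu.Unlock()

	c.counts[packID]++

	return c.counts[packID]
}

// Get returns the current count for packID, or zero if it was never seen.
func (c *packCounters) Get(packID string) int {
	c.mu.Lock()
	defer c.mu.Unlock()

	return c.counts[packID]
}

func main() {
	missing := newPackCounters()

	// Two contents reference the same missing pack (IDs are made up).
	missing.Increment("p0001")
	missing.Increment("p0001")
	missing.Increment("p0002")

	fmt.Println(missing.Get("p0001"), missing.Get("p0002"))
}
```

Keeping one such map per error type makes it straightforward to report how many referenced contents were affected by each individual pack.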
---
Sample output:
$ kopia content verify --progress-interval=0.5s --download-percent=100
Listing blobs...
Listed 102 blobs.
Verifying contents...
Verified 1 contents, 0 errors, estimating...
Verified 279 contents, 0 errors, estimating...
Verified 512 of 624 contents (82.1%), 0 errors, remaining 0s, ETA 2025-09-17 23:03:38 PDT
Finished verifying contents
verifyCounters: {"verifiedContents":624,"totalErrorCount":0,"contentsInMissingPacks":0,\
"contentsInTruncatedPacks":0,"unreadableContents":0,"readContents":624,\
"missingPacks":0,"truncatedPacks":0,"corruptedPacks":0}
Move general functionality from the `content verify` CLI
command implementation to helpers in the content package.
The primary motivation is to allow reusing the content
verification functionality during maintenance.
A separate followup change also extends content
verification to include additional stats useful for
debugging repository corruptions.
Overview of the changes:
- Relocation of the content verification functionality
to the content package. The entry point is
content.WriteManager.VerifyContents.
This is primarily code movement with no functional changes.
- Addition of unit tests for the content verification functionality
by exercising content.WriteManager.VerifyContents.
- Minor functional change: changing the logging level from
Error to Warn for the "inner loop" error messages. This allows
filtering out these messages if needed, while still observing the
error message that is logged for the overall operation.
Use testify in content_formatter_test.go and content_manager_test.go
Refactor TestParallelWrites to run the flusher in the test driver goroutine
instead of a separate goroutine. This addresses a linter error.