Cleanup nits:
- get error handling policy upfront and improve readability in uploader
- update error message
- update field documentation and update flag description
- remove unused function
- const `isWindows` and remove redundant condition check
- add `getEnvVarBool` helper
- refactor common helper for mockfs.AddError* functions, and
add mockfs.AddErrorEntry<Type> wrappers for clarity.
- removed list of skipped tests from gotestsum summary
Revert "feat(snapshots): localfs support for passing options (#5044)"
commit c8c4615595.
Fix: return `ErrorEntry` for permission denied instead of aborting
When iterating a directory, if `lstat` fails with permission denied on
an entry, return an `ErrorEntry` instead of an error that stops the
entire directory iteration.
Previously, a single inaccessible entry, such as, a FUSE/sshfs mount
owned by another user, would cause the entire containing directory to
fail and be omitted from the snapshot, resulting in data loss.
Now, the inaccessible entry is returned as an ErrorEntry which is
handled according to the configured error handling policy, allowing
iteration to continue and the rest of the directory to be backed up.
- Fixes: #5045
Differentiate entry type when ignoring failed entries
Fix tests for new behavior, including handling timing-dependent
behavior when snapshots --fail-fast
---------
Co-authored-by: Geoffrey D. Bennett <g@netcraft.com.au>
- Move MaybePrefixLongFilenameOnWindows to ospath package and rename
it to SafeLongFilename, along with corresponding test.
- Elide the function implementation at build time on non-Windows
platforms.
- Update documentation and comments for clarity.
- Rename package-local helper function.
The `omitempty` JSON tag is ineffective on fields of struct type, among others.
The fields with tags changed from `omitempty` to `omitzero` can be classified into 2 categories:
- fields for which the intent was to avoid serializing the value when it was zero or empty (such as structs with the zero value for the struct);
- policy-definition fields that have the `omitempty` to match the corresponding policy fields.
The fields are changed such that
- the fields are not serialized when they have the zero (or empty) value; and
- consistency is maintained between policy fields and the corresponding policy definition fields.
---
In the context of policy definitions:
**Fields of type `struct`**:
In some cases, the fields in the policy definition and values are of type struct, such as is the case for the `policy.Policy` and `policy.Definition` structs.
In these cases, the `omitzero` JSON tag avoids marshaling empty fields, making the serialized representation more compact and less noisy, while preserving the same behavior and thus semantic when unmarshaling omitted fields.
**Fields of pointer types**:
The `omitempty` and `omitzero` have _practically_ the same effect on fields of pointer types:
- the field is omitted when it is null
- the field is included when it is not null, even if the value that it points to is "the zero value" for the non-pointer type.
Note: when the pointer type defines an `IsZero()` member function, then that field would also be omitted during marshaling. There are no defined `IsZero()` function for these pointers, so
the semantics are preserved in this case.
The `omitzero` JSON tag in the fields definition structs, such as the `ActionPolicyDefinition struct`, does not change the semantics, it simply makes the marshaled representation more compact.
**Fields of type slice**:
The behavior for `omitempty` and `omitzero` differs for slices, and maps as well.
The struct fields of slice type, such as `[]string`, are left with the `omitempty` tag to be able to tell the difference between a nil slice and a non-nil, zero-length slice. Even though, currently most code paths do not explicitly differentiate between a nil slice and an empty slice, the `omitempty` tag is left unmodified out of abundance of caution.
---
Ref:
- #4973
- #4907
- upgrade to golangci-lint 2.6.1
- updates for gosec
- updates for govet
- updates for perfsprint
- updates modernize
Leaves out modernize:omitempty due to conflicts with tests
Add return values in long helper function. This Improves clarity
and avoids 'naked' return and removes `//nolint:nakedret` annotation.
Explicit return values for GetShardedPathAndFilePath
* remove omitempty from object id fields
* remove omitempty from storage stats fields
* use omitzero for other structs
* remove `omitempty` from `UpgradeLockIntent` fields
The `omitempty` JSON tag is ineffective for fields of
type `Time` and `Duration`.
Also, this struct is only used during the "upgrade"
protocol, and it is OK to explicitly serialize 0 values,
as it was done before.
Maintenance is critical for healthy of the repository.
On the other hand, Maintenance is complex, because
it runs multiple sub tasks each may generate different
results according to the maintenance policy.
The results may include deleting/combining/adding
data/metadata to the repository.
It is worthy to add more observability for these
tasks for below reasons:
It is helpful for troubleshooting. Any data change
to the repository is critical, the observability info
helps to understand what happened during the
maintenance and why that happened.
It is helpful for users to understand/predict the
repo's behavior. The repo data may be stored
in a public cloud for which costs are sensitive
to scale/duration of data stored. On the other
hand, repository has its own policy to manage
the data, so the data is not deleted until it is
safe enough according to the policy.
The observability info helps users to
understand how much data is in-use,
how much data is out of use and
when it is deleted
This is a breaking change to users who might be using Kopia as a library.
### Log Format
```json
{"t":"<timestamp-rfc-3389-microseconds>", "span:T1":"V1", "span:T2":"V2", "n":"<source>", "m":"<message>", /*parameters*/}
```
Where each record is associated with one or more spans that describe its scope:
* `"span:client": "<hash-of-username@hostname>"`
* `"span:repo": "<random>"` - random identifier of a repository connection (from `repo.Open`)
* `"span:maintenance": "<random>"` - random identifier of a maintenance session
* `"span:upload": "<hash-of-username@host:/path>"` - uniquely identifies upload session of a given directory
* `"span:checkpoint": "<random>"` - encapsulates each checkpoint operation during Upload
* `"span:server-session": "<random>"` -single client connection to the server
* `"span:flush": "<random>"` - encapsulates each Flush session
* `"span:maintenance": "<random>"` - encapsulates each maintenance operation
* `"span:loadIndex" : "<random>"` - encapsulates index loading operation
* `"span:emr" : "<random>"` - encapsulates epoch manager refresh
* `"span:writePack": "<pack-blob-ID>"` - encapsulates pack blob preparation and writing
(plus additional minor spans for various phases of the maintenance).
Notable points:
- Used internal zero allocation JSON writer for reduced memory usage.
- renamed `--disable-internal-log` to `--disable-repository-log` (controls saving blobs to repository)
- added `--disable-content-log` (controls writing of `content-log` files)
- all storage operations are also logged in a structural way and associated with the corresponding spans.
- all content IDs are logged in a truncated format (since first N bytes that are usually enough to be unique) to improve compressibility of logs (blob IDs are frequently repeated but content IDs usually appear just once).
This format should make it possible to recreate the journey of any single content throughout pack blobs, indexes and compaction events.
Nits and cleanups:
- clarify log message to indicate the effect of advancing the deletion watermark;
- add omitzero JSON tag to appropriate fields in snapshot.Manifest struct;
- use maps.Clone instead of explicit loop;
- rename function to IterateUnreferencedPacks for clarity;
- use atomic.Int32 type;
- move a continue check to the beginning of the loop, no actual
work / side effects were performed before the check;
- reduce type requirement in blob.ReadBlobMap
- use snapshot end time as tie breaker for equal start time
- pass prototype snapshot manifest in uploader
- remove -1 ns hack for checkpoints and fix corresponding test
- test snapshot.SortByTime
- use `slices.Clone`
- remove stale `.gometalinter.json`
- unexport `maintenance.dropDeletedContents`
- rename `fetchIndexBlob`
- use `require` in `TestTimeFuncWiring`
- Add read stats to snapshot verifier output
- Add periodic JSON progress output.
- Refactor the use of directory summary.
- Use stats mutex for all stats.
- Add processedBytes to the snapshot verify output
- Output more frequently, when bytes processed changes
* remove unnecessary type argument
* modernize with max
* unexport getPartial and update comment
* unexport getFull
* verifyNotCached helper
* use require helpers
* leverage verify[Not]Cached
* use windowsOSName const
* fix comment wrapping
* require in stat_test
* use 512 as the write size and log allocated size
* rename const to expectedMinAllocSize
* write a single byte to test file
* add TestGetBlockSizeFromCurrentFS
* require Positive
* log before invariant check
- enable `forcetypeassert` linter in non-test files
- add `//nolint` annotations
- add `testutil.EnsureType` helper for type assertions
- enable `forcetypeassert` linter in test files
- use 'require/assert'
- refactor TestUploadMetadataCompression as a table test
- allow tests to run in parallel
- use t.Cleanup and add a missing cleanup
- use maps.Clone
Cleanups in snapshot GC:
- add findUnreferencedAndRepairRereferenced
- moves logging from snapshotgc.Run to findUnreferencedAndRepairRereferenced
- removes returning collected stats, simplifies signature
- simplifies caller (snapshotgc.Run)
- removes now unused snapshotgc.Stats
Changes in behavior:
- logs before flushing to get info even if flush fails
- new: logs undeleted content stats
* fix(snapshots): Remove checkpoints after a complete snapshot
... by setting start time of checkpoints one nanosecond earlier than
that of the snapshot.
* test(snapshots): Test for leftover checkpoints
* fix linter issues
* removed stray curly brace
---------
Co-authored-by: Jarek Kowalski <jaak@jkowalski.net>