The kopia server was not uploading any logs to the repository,
because "repodiag" blob uploads would always fail.
The cause was the following: when the (log) repodiag blob
PUT operation was initiated, the `Context` used for this
operation was already canceled.
The context used for blob uploads is passed to
`repodiag.NewLogManager` when opening the repository.
In the case of the kopia server, the repository is asynchronously
opened in `server.Server.InitRepositoryAsync`. The context
passed to `repo.Open` is canceled after the "open repository"
server task completes.
This issue was introduced in #1691
Change:
Use `context.WithoutCancel()` instead of the context passed
when the repo diagnoser is created (see the sketch below).
New tests are included to reproduce this failure and verify
the fix.
- test: ensure server logs are uploaded to the repo
- test: honor cancellation in map storage
- test: repodiag context cancellation
Ref:
- #1691
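A minimal sketch of the change, assuming Go 1.21+'s `context.WithoutCancel` and a simplified `NewLogManager` signature (the real one differs):

```go
package repodiag

import (
	"context"

	"github.com/kopia/kopia/repo/blob"
)

// Manager is a simplified stand-in for the real log manager; only the parts
// relevant to the fix are shown.
type Manager struct {
	ctx context.Context // context used for background blob uploads
	st  blob.Storage
}

// NewLogManager detaches the upload context from the caller's cancellation:
// context.WithoutCancel (Go 1.21+) keeps the context's values but is never
// canceled, so repodiag blob PUTs no longer fail once the "open repository"
// task finishes and its context is canceled.
func NewLogManager(ctx context.Context, st blob.Storage) *Manager {
	return &Manager{
		ctx: context.WithoutCancel(ctx),
		st:  st,
	}
}
```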
Added improved providervalidation logic which tests for the read-after-write
property between connections. The new test was failing before the change
and is now passing for Google Drive, OneDrive, and Dropbox.
* Rename UnsupportedBlobRetention struct
Rename this struct to DefaultProviderImplementation in preparation for
adding other simple "default" functionality to it.
* Add other functions to default provider
Add other simple function implementations to the default provider so
that other providers can just embed this to get basic behavior.
* Cleanup existing users of default provider
* Add default provider to remaining storage types
Add the default provider to remaining storage providers and remove
functions that are now implemented by the default provider.
* Add new blob.Storage call to see if it's readonly
Return whether the storage is read-only so higher layers in the stack can
selectively disable some functionality if needed, like compaction
(see the sketch below).
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
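A rough sketch of the kind of accessor described above; the method name and its placement on the default provider are assumptions, not the actual API:

```go
package blob

// DefaultProviderImplementation supplies no-op defaults for optional
// blob.Storage behaviors; concrete providers embed it.
type DefaultProviderImplementation struct{}

// IsReadonly reports whether the storage is read-only, letting higher layers
// in the stack selectively disable functionality such as compaction.
// Embedders get this writable-by-default answer unless they override it.
func (DefaultProviderImplementation) IsReadonly() bool {
	return false
}
```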
* Store and return retention info in test storage
Add a new interface and function that allows getting retention
information during testing. This allows for more exact comparisons of
retention duration and mode in tests.
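A hypothetical shape for that testing hook (the interface and field names below are assumptions for illustration, not the actual code):

```go
package blobtesting

import (
	"context"
	"time"

	"github.com/kopia/kopia/repo/blob"
)

// RetentionInfo captures the retention settings a test expects to see on a blob.
type RetentionInfo struct {
	Mode   string        // retention mode, e.g. "GOVERNANCE" or "COMPLIANCE"
	Period time.Duration // how long the blob is locked
}

// RetentionReporter is implemented by test storage backends that can report
// the retention currently applied to a blob, so tests can make exact
// assertions about duration and mode instead of probing behavior indirectly.
type RetentionReporter interface {
	GetRetention(ctx context.Context, id blob.ID) (RetentionInfo, error)
}
```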
* Fixup how blobtesting retention extension works
Use the clock instead of the object's mod time so that extensions are
from the "current time." This aligns with how the S3 blob storage
functions.
* Update retention tests to use more precise checks
Where possible, use the information returned by GetRetention in tests
that deal with retention information. This allows for more precise
comparisons of retention duration and mode instead of indirectly testing
duration by advancing the clock and attempting to modify blobs.
* More robust error comparisons in retention tests
Update tests for retention to use `ErrorIs` checks instead of comparing
error messages.
* Use `require.NoError` in retention tests
Minor cleanup to reduce branches in code by using `require.NoError`
instead of if-blocks and `t.Fatal`.
* Implement ability to extend retention time on S3 buckets using Object Locks
* Move object-lock extension to maintenance.Params.
* Use a default function for unsupported extensions instead of duplicating code
* Fix potential lockup during object-lock extension
* Fix race condition. Add more code coverage
* rebase to V3
* Add checks to prevent user from setting Retention Period < Full Maintenance Interval
---------
Co-authored-by: Ashlie Martinez <ashmrtnz@alcion.ai>
* feat(repository): live cache eviction for persistent lru content cache
* Update internal/cache/persistent_lru_cache.go
Co-authored-by: Ali Dowair <adowair@umich.edu>
* merge the mutex cache into list cache
---------
Co-authored-by: Shikhar Mall <small@kopia.io>
Co-authored-by: Ali Dowair <adowair@umich.edu>
From https://github.com/google/gvisor/tree/master/tools/checklocks
This will perform static verification that we're using
`sync.Mutex`, `sync.RWMutex` and `atomic` correctly to guard access
to certain fields.
This was mostly just a matter of adding annotations to indicate which
fields are guarded by which mutex.
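For example, fields are annotated roughly like this (a minimal illustration of the checklocks annotation style):

```go
package example

import "sync"

// counter shows the annotation style: the analyzer statically verifies that
// every access to `value` happens while `mu` is held.
type counter struct {
	mu sync.Mutex

	// +checklocks:mu
	value int
}

func (c *counter) Increment() {
	c.mu.Lock()
	defer c.mu.Unlock()

	c.value++ // OK: mu is held for this access
}
```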
In a handful of places the code had to be refactored so the static
analyzer could do its job better, or to avoid confusing it with certain
constructs.
In one place this actually uncovered a bug where a function was not
releasing a lock properly in an error case.
The check is part of `make lint` but can also be invoked by
`make check-locks`.
* Introduce Volume sub-interface
The Volume interface defines APIs to access a storage provider's
volume (disk) capacity, usage, etc. It is inherited by the Storage
interface, and is at the same hierarchical level as the Reader
interface.
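A sketch of the shape described above (the `Capacity` field names and the error wording are assumptions, not the exact definitions):

```go
package blob

import (
	"context"
	"errors"
)

// ErrNotAVolume is returned by providers with no notion of a finite volume
// (cloud object stores, WebDAV).
var ErrNotAVolume = errors.New("storage is not backed by a volume")

// Capacity describes the size and usage of the volume backing a provider.
type Capacity struct {
	SizeBytes uint64 // total capacity of the volume, in bytes
	FreeBytes uint64 // space still available on the volume, in bytes
}

// Volume is the sub-interface inherited by Storage: volume-backed providers
// report capacity/usage, everything else returns ErrNotAVolume.
type Volume interface {
	GetCapacity(ctx context.Context) (Capacity, error)
}
```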
* Add validations for new Volume method:
Check that GetCapacity() either returns `ErrNotAVolume`, or that it
returns a Capacity struct with values that make sense.
* Implement default (passthrough) GetCapacity:
Cloud providers do not have finite volumes, and WebDAV volumes have no
notion of volume size and usage. These implementations should just
return an error (ErrNotAVolume) when their GetCapacity() is called.
* Implement GetCapacity for sftp storage: Uses the sftp.Client interface
* Implement GetCapacity for logging, readonly store
* Implement GetCapacity() for blobtesting implementations
* Implement GetCapacity() for Google Drive:
Also modifies GetDriveClient to return the entire service instead of just the Files client.
* Implemented GetCapacity() for filesystem storage:
Implemented the function in a separate file for each OS/architecture (Unix, OpenBSD, Windows).
Also simplified the validation test suite: it now simply tests whether
the provider supports DoNotRecreate or properly rejects it, without
external configuration.
* Add a new PutBlob option and blob error type
When `DoNotRecreate` is set to true, the blob put operation should
only succeed if no blob with the given blob ID already exists.
Otherwise, `ErrBlobAlreadyExists` is returned.
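A minimal sketch of how a provider without a native conditional-create primitive might honor the option (the existence check shown is illustrative and racy; real backends should use whatever atomic create-if-absent operation they offer):

```go
package example

import (
	"context"

	"github.com/kopia/kopia/repo/blob"
)

// checkedStorage wraps a base provider and enforces DoNotRecreate semantics.
type checkedStorage struct {
	blob.Storage
}

// PutBlob refuses to overwrite an existing blob when DoNotRecreate is set.
func (s checkedStorage) PutBlob(ctx context.Context, id blob.ID, data blob.Bytes, opts blob.PutOptions) error {
	if opts.DoNotRecreate {
		if _, err := s.Storage.GetMetadata(ctx, id); err == nil {
			return blob.ErrBlobAlreadyExists
		}
	}

	return s.Storage.PutBlob(ctx, id, data, opts)
}
```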
* Validate default storage providers' support
By default, storage providers should not support idempotent creates.
This commit adds error handling to exit early if `DoNotRecreate` is
set to true. The commit also verifies this behavior in the provider
validation test.
* Implement support for new option in GCS storage
* Push PutBlob option handling down to Impl
When PutBlob options were introduced, error handling logic for them
was implemented for the Sharded storage interface. However, the
behavior of different providers that implement Sharded can be
different, so it's better to push the options down to be processed in
the provider implementations.
* Introduce new error type for unsupported put opts
To unify error handling code and make it more maintainable, introduce
a new error type `blob.ErrUnsupportedPutBlobOption`, which is to be
returned whenever a storage provider implementation is given put
options it does not support.
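Roughly, a provider rejects options it cannot honor like this (a sketch; only the DoNotRecreate option named above is checked):

```go
package example

import "github.com/kopia/kopia/repo/blob"

// rejectUnsupportedPutOptions shows the shared pattern: a provider that
// implements none of the optional put behaviors returns the common
// blob.ErrUnsupportedPutBlobOption so callers see consistent errors
// regardless of backend.
func rejectUnsupportedPutOptions(opts blob.PutOptions) error {
	if opts.DoNotRecreate {
		// This provider has no conditional-create primitive.
		return blob.ErrUnsupportedPutBlobOption
	}

	return nil
}
```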
* feat: persisting retention options in repository blob
- plumb retention parameters through wrapped storage
- generalize aes encryption mechanism
- rewrite the retention blob on password change
- do not write retention blob when empty
* handle retention-blob not-found failures
* cli params to set retention modes on repository create
* enable versioned map mock storage with retention settings
* adding unit tests
* write format and retention blob with retention settings if available
* rename certain functions and constants specific to format blob
* delete retention cache on password-change
* fix: replace SetTime() api call with TouchBlob()
* Update repo/repository_test.go
Co-authored-by: Nick <nick@kasten.io>
* pr feedback and codecov improvements
* fix: rename retention-blob structures to generic blob-cfg
* fix: remove minio dependency on retention constants
Co-authored-by: Shikhar Mall <shikhar@kasten.io>
Co-authored-by: Nick <nick@kasten.io>
* sharded: plumbed through blob.PutOptions
* blob: removed blob.Storage.SetTime() method
This was only used for `kopia repo sync-to` and got replaced with
an equivalent blob.PutOptions.SetTime which, when set to a non-zero time,
will attempt to set the modification time on a file.
Since some providers don't support changing the modification time, we
are able to emulate it using per-blob metadata (on B2, Azure and GCS);
sadly, S3 is still unsupported because it does not support returning
metadata in list results.
Also added PutOptions.GetTime which, when set to a non-nil pointer, will
populate the provided variable with the actual time assigned to the blob
(see the sketch below).
Added tests that verify that each provider supports GetTime
and SetTime according to this spec.
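A usage sketch following the option names above (SetTime/GetTime); the exact field types are assumptions:

```go
package example

import (
	"context"
	"time"

	"github.com/kopia/kopia/repo/blob"
)

// syncBlob illustrates the `repo sync-to` use case: request a specific
// modification time and read back the time the provider actually assigned
// (which may differ on providers that emulate SetTime via metadata).
func syncBlob(ctx context.Context, st blob.Storage, id blob.ID, data blob.Bytes, srcModTime time.Time) (time.Time, error) {
	var assigned time.Time

	err := st.PutBlob(ctx, id, data, blob.PutOptions{
		SetTime: srcModTime, // attempt to preserve the source blob's mod time
		GetTime: &assigned,  // receive the timestamp actually assigned
	})

	return assigned, err
}
```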
* blob: additional test coverage for filesystem storage
* blob: added PutBlobAndGetMetadata() helper and used where appropriate
* fixed test failures
* pr feedback
* Update repo/blob/azure/azure_storage.go
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
* Update repo/blob/filesystem/filesystem_storage.go
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
* Update repo/blob/filesystem/filesystem_storage.go
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
* blobtesting: fixed object_locking_map.go
* blobtesting: removed SetTime from ObjectLockingMap
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
* blob: changed default shards from {3,3} to {1,3}
It turns out that for a very large repository of around 100TB (5M blobs),
we end up creating up to ~16M directories, which is far too many
and slows down listing. Currently each leaf directory only holds a handful
of files.
Simpler sharding of {3} should work much better and will end up creating
directories with meaningful shard sizes: 12K files per directory
should not be too slow, and this reduces the overhead of listing by
a factor of 4096.
The change is done in a backwards-compatible way: it respects a
custom sharding (.shards) file written by previous 0.9 builds,
as well as older repositories that don't have a .shards file (which
we assume to use {3,3}).
* fixed compat tests
The dual time measurement is described in
https://go.googlesource.com/proposal/+/master/design/12914-monotonic.md
The fix is to discard the hidden monotonic time component of time.Time
by converting to Unix time and back.
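Concretely, the conversion is just (a minimal sketch):

```go
package example

import "time"

// stripMonotonic discards the hidden monotonic clock reading carried by
// values from time.Now(), so that later arithmetic measures wall-clock time
// (which jumps across system sleep) rather than monotonic time.
func stripMonotonic(t time.Time) time.Time {
	// Round-tripping through Unix time drops the monotonic component.
	return time.Unix(0, t.UnixNano())
}
```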
Reviewed usage of clock.Now() and replaced with timetrack.StartTimer()
when measuring time.
The problem in #1402 was that the passage of time was measured using
monotonic time and not wall-clock time. When the computer goes
to sleep, monotonic time remains monotonic, while wall-clock time makes
a leap when the computer wakes up. This is the behavior that the
epoch manager (and most other components in Kopia) relies upon.
Fixes #1402
Co-authored-by: Julio Lopez <julio+gh@kasten.io>
* refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* chore: remove //nolint:gosec for os.ReadFile
At the time of this commit, the G304 rule of gosec does not include the
`os.ReadFile` function. We remove `//nolint:gosec` temporarily until
https://github.com/securego/gosec/pull/706 is merged.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* cli: added 'repository validate-provider' which runs a set of tests against a blob storage provider to validate it
This implements a provider test which exercises subtle behaviors that are not always correctly implemented by providers claiming compatibility with S3, for example.
The test checks:
- not found behavior
- prefix scans
- timestamps
- write atomicity
* retry: improved error message on failure
* rclone: fixed stats reporting and awaiting for completion
* webdav: prevent panic when attempting to mkdir with empty name
* testing: run providervalidation.ValidateProvider as part of regular provider tests
* cli: print a recommendation to validate provider after repository creation
* testing: removed testutil.Retry because all providers now have internal retries
* testing: simplified and unified cleanup for all cloud providers using shared buckets
* added a framework for unit testing against real remote rclone remotes,
added a Google Drive backend
* added parallelism to blobtesting which revealed some races during
PutBlob with WebDAV.
* content: fixed time-based auto-flush behavior to behave like Flush()
Previously it was sometimes possible for a content whose write
started before a time-based flush to finish writing afterwards (and it
would be included in the new index).
Refactored the code so that the time-based flush happens before the
WriteContent write and behaves exactly the same way as a real Flush(),
so all writes started before it will be awaited during the flush.
Also, the previous regression test was incorrect since it was mocking the
wrong blob method.
* content: refactored index blob manager crypto to separate file
This will be reused for encrypting session info.
* content: added support for session markers
A session marker (`s` blob) is written BEFORE the first data blob
(`p` or `q`) that belongs to the new index segment (`n`) is written.
The session marker is removed AFTER the index blob (`n`) has been written.
All pack and index blobs belonging to a session have the session
ID as a suffix, so that if a reader can see the `s<sessionID>` blob, it
will ignore any `p` and `q` blobs with the same suffix.
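Illustrative reader-side logic (a sketch only; the exact blob-ID layout and the `-s` separator are assumptions):

```go
package example

import "strings"

// shouldIgnoreBlob reports whether a pack (`p`/`q`) blob should be skipped
// because the session that wrote it is still in progress, i.e. its
// `s<sessionID>` marker blob is still visible. activeSessionIDs is keyed by
// the session ID seen in the `s` marker blobs.
func shouldIgnoreBlob(blobID string, activeSessionIDs map[string]bool) bool {
	// Blobs written within a session carry the session ID as a suffix;
	// split it off and check whether a live `s` marker exists for it.
	idx := strings.LastIndex(blobID, "-s")
	if idx < 0 {
		return false // not written within a session
	}

	return activeSessionIDs[blobID[idx+2:]]
}
```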
* maintenance: ignore blobs belonging to active sessions when running blob garbage collection
* cli: added 'sessions list' for listing active sessions
* content: added retrying writing previously failed blobs before writing new one
* linter: upgraded to 1.33, disabled some linters
* lint: fixed 'errorlint' errors
This ensures that all error comparisons use errors.Is() or errors.As().
We will be wrapping more errors going forward, so it's important that
error checks do not rely on strict equality anywhere.
Verified that there are no exceptions configured for the errorlint
linter, which guarantees that.
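For example (using a sentinel from the blob package):

```go
package example

import (
	"errors"
	"fmt"

	"github.com/kopia/kopia/repo/blob"
)

// handleGetError shows the pattern errorlint enforces: wrapped errors must be
// matched with errors.Is()/errors.As() rather than ==, because the sentinel
// may be buried under one or more layers of wrapping.
func handleGetError(err error) error {
	if errors.Is(err, blob.ErrBlobNotFound) {
		return nil // a missing blob is not fatal in this example
	}

	return fmt.Errorf("unexpected storage error: %w", err)
}
```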
* lint: fixed or suppressed wrapcheck errors
* lint: nolintlint and misc cleanups
Co-authored-by: Julio López <julio+gh@kasten.io>