The kopia server was not uploading any logs to the repository,
because "repodiag" blob uploads would always fail.
The cause was the following: when the (log) repodiag blob
PUT operation was initiated, the `Context` used for this
operation was already canceled.
The context used for blob uploads is passed to
`repodiag.NewLogManager` when opening the repository.
In the case of the kopia server, the repository is asynchronously
opened in `server.Server.InitReposotoryAsync`. The context
passed to `repo.Open` is canceled after the "open repository"
server task completes.
This issue was introduced in #1691
Change:
Use `context.WithoutCancel()` instead of the context passed
when the repo diagnoser is created.
New tests are included to reproduce this failure and verify
the fix.
- test: ensure server logs are uploaded to the repo
- test: honor cancellation in map storage
- test: repodiag context cancellation
Ref:
- #1691
* Rename UnsupportedBlobRetention struct
Rename this struct to DefaultProviderImplementation in preparation for
adding other simple "default" functionality to it.
* Add other functions to default provider
Add other simple function implementations to the default provider so
that other providers can just embed this to get basic behavior.
* Cleanup existing users of default provider
* Add default provider to remaining storage types
Add the default provider to remaining storage providers and remove
functions that are now implemented by the default provider.
* Add new blob.Storage call to see if it's readonly
Return whether the storage is readonly so higher layers in the stack can
selectively disable some functionality if needed, like compaction.
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* Implement ability to extend retention time on S3 buckets using Object Locks
* Move object-lock extension to maintenance.Params.
* Use a default function for unsupported extensions instead of duplicating code
* Fix potential lockup during object-lock extension
* Fix race condition. Add more code coverage
* rebase to V3
* Add checks to prevent user from setting Retention Period < Full Maintenance Interval
---------
Co-authored-by: Ashlie Martinez <ashmrtnz@alcion.ai>
* feat(repository): live cache eviction for persistent lru content cache
* Update internal/cache/persistent_lru_cache.go
Co-authored-by: Ali Dowair <adowair@umich.edu>
* merge the mutex cache into list cache
---------
Co-authored-by: Shikhar Mall <small@kopia.io>
Co-authored-by: Ali Dowair <adowair@umich.edu>
From https://github.com/google/gvisor/tree/master/tools/checklocks
This will perform static verification that we're using
`sync.Mutex`, `sync.RWMutex` and `atomic` correctly to guard access
to certain fields.
This was mostly just a matter of adding annotations to indicate which
fields are guarded by which mutex.
In a handful of places the code had to be refactored to allow static
analyzer to do its job better or to not be confused by some
constructs.
In one place this actually uncovered a bug where a function was not
releasing a lock properly in an error case.
The check is part of `make lint` but can also be invoked by
`make check-locks`.
* Introduce Volume sub-interface
The Volume interface defines APIs to access a storage provider's
volume (disk) capacity, usage, etc.. It is inherited by the Storage
interface, and is at the same hierarchical level as the Reader
interface.
* Add validations for new Volume method:
Check that GetCapacity() either returns `ErrNotAVolume`, or that it
returns a Capacity struct with values that make sense.
* Implement default (passthrough) GetCapacity:
Cloud providers do not have finite volumes, and WebDAV volumes have no
notion of volume size and usage. These implementations should just
return an error (ErrNotAVolume) when their GetCapacity() is called.
* Implement GetCapacity for sftp storage: Uses the sftp.Client interface
* Implement GetCapacity for logging, readonly store
* Implement GetCapacity() for blobtesting implementations
* Implement GetCapacity() for Google Drive:
Also modifies GetDriveClient to return the entire service instead of just the Files client.
* Implemented GetCapacity() for filesystem storage:
Implemented the function in a seperate file for each OS/architecture (Unix, OpenBSD, Windows).
* Add a new PutBlob option and blob error type
When `DoNotRecreate` is set as true, the blob put operation should
only succeed if no blob with the given blob ID already exists.
Othwerwise, `ErrBlobAlreadyExists` is returned.
* Validate default storage providers' support
By default, storage providers should not support idempotent creates.
This commit adds error handling to exit early if `DoNotRecreate` is
set to true. The commit also verifies this behavior in the provider
validation test.
* Implement support for new option in GCS storage
* Push PutBlob option handling down to Impl
When PutBlob options were introduced, error handling logic for them
was implemented for the Sharded storage interface. However, the
behavior of different providers that implement Sharded can be
different, so it's better to push the options down to be processed in
the provider implementations.
* Introduce new error type for unsupported put opts
To unify error handling code and make it more maintainable, introduce
a new error type `blob.ErrUnsupportedPutBlobOption`, which is to be
returned whenever a storage provider implementation is given put
options it does not support.
* sharded: plumbed through blob.PutOptions
* blob: removed blob.Storage.SetTime() method
This was only used for `kopia repo sync-to` and got replaced with
an equivalent blob.PutOptions.SetTime, which wehn set to non-zero time
will attempt to set the modification time on a file.
Since some providers don't support changing modification time, we
are able to emulate it using per-blob metadata (on B2, Azure and GCS),
sadly S3 is still unsupported, because it does not support returning
metadata in list results.
Also added PutOptions.GetTime, which when set to not nil, will
populate the provided variable with actual time that got assigned
to the blob.
Added tests that verify that each provider supports GetTime
and SetTime according to this spec.
* blob: additional test coverage for filesystem storage
* blob: added PutBlobAndGetMetadata() helper and used where appropriate
* fixed test failures
* pr feedback
* Update repo/blob/azure/azure_storage.go
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
* Update repo/blob/filesystem/filesystem_storage.go
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
* Update repo/blob/filesystem/filesystem_storage.go
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
* blobtesting: fixed object_locking_map.go
* blobtesting: removed SetTime from ObjectLockingMap
Co-authored-by: Shikhar Mall <mall.shikhar.in@gmail.com>
Globally replaced all use of time with internal 'clock' package
which provides indirection to time.Now()
Added support for faking clock in Kopia via KOPIA_FAKE_CLOCK_ENDPOINT
logfile: squelch annoying log message
testenv: added faketimeserver which serves time over HTTP
testing: added endurance test which tests kopia over long time scale
This creates kopia repository and simulates usage of Kopia over multiple
months (using accelerated fake time) to trigger effects that are only
visible after long time passage (maintenance, compactions, expirations).
The test is not used part of any test suite yet but will run in
post-submit mode only, preferably 24/7.
testing: refactored internal/clock to only support injection when
'testing' build tag is present
* blob: added DisplayName() method to blob.Storage
* cli: added 'kopia repo sync-to <provider>' which replicates BLOBs
Usage demo: https://asciinema.org/a/352299Fixes#509
* implemented suggestion by Ciantic to fail sync if the destination repository is not compatible with the source
* cli: added 'kopia repo sync --must-exist'
This ensures that target repository is not empty, otherwise syncing to
an accidentally unmounted filesystem directory might copy everything
again.
* content: added support for cache of own writes
Thi keeps track of which blobs (n and m) have been written by the
local repository client, so that even if the storage listing
is eventually consistent (as in S3), we get somewhat sane behavior.
Note that this is still assumming read-after-create semantics, which
S3 also guarantees, otherwise it's very hard to do anything useful.
* compaction: support for compaction logs
Instead of compaction immediately deleting source index blobs, we now
write log entries (with `m` prefix) which are merged on reads
and applied only if the blob list includes all inputs and outputs, in
which case the inputs are discarded since they are known to have been
superseded by the outputs.
This addresses eventual consistency issues in stores such as S3,
which don't guarantee list-after-put or list-after-delete. With such
stores the repository is ultimately eventually consistent and there's
not much that can be done about it, unless we use second strongly
consistent storage (such as GCS) for the index only.
* content: updated list cache to cache both `n` and `m`
* repo: fixed cache clear on windows
Clearing cache requires closing repository first, as Windows is holding
the files locked.
This requires ability to close the repository twice.
* content: refactored index blob management into indexBlobManager
* testing: fixed blobtesting.Map storage to allow overwrites
* blob: added debug output String() to blob.Metadata
* testing: added indexBlobManager stress test
This works by using N parallel "actors", each repeatedly performing
operations on indexBlobManagers all sharing single eventually consistent
storage.
Each actor runs in a loop and randomly selects between:
- *reading* all contents in indexes and verifying that it includes
all contents written by the actor so far and that contents are
correctly marked as deleted
- *creating* new contents
- *deleting* one of previously-created contents (by the same actor)
- *compacting* all index files into one
The test runs on accelerated time (every read of time moves it by 0.1
seconds) and simulates several hours of running.
In case of a failure, the log should provide enough debugging
information to trace the exact sequence of events leading up to the
failure - each log line is prefixed with actorID and all storage
access is logged.
* makefile: increase test timeout
* content: fixed index blob manager race
The race is where if we delete compaction log too early, it may lead to
previously deleted contents becoming temporarily live again to an
outside observer.
Added test case that reproduces the issue, verified that it fails
without the fix and passed with one.
* testing: improvements to TestIndexBlobManagerStress test
- better logging to be able to trace the root cause in case of a failure
- prevented concurrent compaction which is unsafe:
The sequence:
1. A creates contentA1 in INDEX-1
2. B creates contentB1 in INDEX-2
3. A deletes contentA1 in INDEX-3
4. B does compaction, but is not seeing INDEX-3 (due to EC or simply
because B started read before #3 completed), so it writes
INDEX-4==merge(INDEX-1,INDEX-2)
* INDEX-4 has contentA1 as active
5. A does compaction but it's not seeing INDEX-4 yet (due to EC
or because read started before #4), so it drops contentA1, writes
INDEX-5=merge(INDEX-1,INDEX-2,INDEX-3)
* INDEX-5 does not have contentA1
7. C sees INDEX-5 and INDEX-5 and merge(INDEX-4,INDEX-5)
contains contentA1 which is wrong, because A has been deleted
(and there's no record of it anywhere in the system)
* content: when building pack index ensure index bytes are different each time by adding 32 random bytes
, where blob.Storage.PutBlob gets a list of slices and writes them sequentially
* performance: added gather.Bytes and gather.WriteBuffer
They are similar to bytes.Buffer but instead of managing a single
byte slice, they maintain a list of slices that and when they run out of
space they allocate new fixed-size slice from a free list.
This helps keep memory allocations completely under control regardless
of the size of data written.
* switch from byte slices and bytes.Buffer to gather.Bytes.
This is mostly mechanical, the only cases where it's not involve blob
storage providers, where we leverage the fact that we don't need to
ever concatenate the slices into one and instead we can do gather
writes.
* PR feedback
This updates the terminology everywhere - blocks become blobs and
`storage.Storage` becomes `blob.Storage`.
Also introduced blob.ID which is a specialized string type, that's
different from CABS block ID.
Also renamed CLI subcommands from `kopia storage` to `kopia blob`.
While at it introduced `block.ErrBlockNotFound` and
`object.ErrObjectNotFound` that do not leak from lower layers.