* Fixed how blob storage PutBlob errors are handled in content.Manager
In order to guarantee that all index entries have corresponding
pack blobs, we must ensure that `content.Manager.Flush` will
not succeed unless all pending writes have completed.
Added a test that simulates various patterns of PutBlob failures and
ensures that data remains durable despite them, assuming all calls
to `WriteContent()` and `Flush()` are retried.
* addressed review feedback
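
A minimal sketch of the invariant described above, not the actual `content.Manager` code: index entries are committed only after every pending pack has been written, and a failed `Flush()` can simply be retried. The `manager` type, `putBlobFunc`, `flushWithRetry` and `flakyPut` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// putBlobFunc is a hypothetical stand-in for a blob storage write.
type putBlobFunc func(id string, data []byte) error

// manager is a simplified, hypothetical model; it is not content.Manager.
type manager struct {
	put       putBlobFunc
	pending   map[string][]byte // packs not yet written to storage
	unindexed []string          // packs written, but not yet in the index
	indexed   []string          // packs referenced by the committed index
}

// Flush writes every pending pack and only then commits index entries.
// On any write error it returns early, so the index never references a
// pack that is missing from storage; a later call resumes the work.
func (m *manager) Flush() error {
	for id, data := range m.pending {
		if err := m.put(id, data); err != nil {
			return fmt.Errorf("pack %q not written: %w", id, err)
		}
		m.unindexed = append(m.unindexed, id)
		delete(m.pending, id)
	}
	m.indexed = append(m.indexed, m.unindexed...)
	m.unindexed = nil
	return nil
}

// flushWithRetry retries Flush up to the given number of attempts.
func flushWithRetry(m *manager, attempts int) error {
	err := errors.New("no flush attempts made")
	for i := 0; i < attempts; i++ {
		if err = m.Flush(); err == nil {
			return nil
		}
	}
	return err
}

// flakyPut fails the first `failures` calls, then records writes in stored.
func flakyPut(failures int, stored map[string]bool) putBlobFunc {
	return func(id string, _ []byte) error {
		if failures > 0 {
			failures--
			return errors.New("simulated PutBlob failure")
		}
		stored[id] = true
		return nil
	}
}

func main() {
	stored := map[string]bool{}
	m := &manager{
		put:     flakyPut(2, stored),
		pending: map[string][]byte{"p1": nil, "p2": nil},
	}
	fmt.Println("flush error:", flushWithRetry(m, 5))
	fmt.Println("indexed packs:", len(m.indexed))
}
```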
`snapshot gc` marks contents not reachable from the root of any snapshot
as soft-deleted
The algorithm is a mark-and-sweep with parallel iteration of objects.
Currently it stores content IDs and object IDs in a map, so it won't
scale to huge repositories, but this can be fixed in the future.
This fixes #110 at least for reasonable repository sizes.
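
The mark-and-sweep idea can be sketched as follows. This is a sequential illustration under simplified assumptions (the parallel iteration is omitted), and the `object` type plus the `mark` and `sweep` functions are hypothetical, not the repository's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a hypothetical, simplified node that references contents
// and child objects by ID.
type object struct {
	contents []string
	children []string
}

// mark walks all objects reachable from the snapshot roots and returns
// the set of content IDs they use. Both sets are in-memory maps, which
// is why this approach does not scale to huge repositories.
func mark(objects map[string]object, roots []string) map[string]bool {
	seen := map[string]bool{}
	used := map[string]bool{}
	stack := append([]string(nil), roots...)
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		obj := objects[id]
		for _, c := range obj.contents {
			used[c] = true
		}
		stack = append(stack, obj.children...)
	}
	return used
}

// sweep returns the sorted content IDs that were not marked; these are
// the candidates for soft-deletion.
func sweep(all []string, used map[string]bool) []string {
	var unused []string
	for _, c := range all {
		if !used[c] {
			unused = append(unused, c)
		}
	}
	sort.Strings(unused)
	return unused
}

func main() {
	objects := map[string]object{
		"root": {contents: []string{"c1"}, children: []string{"dir"}},
		"dir":  {contents: []string{"c2"}},
	}
	used := mark(objects, []string{"root"})
	fmt.Println(sweep([]string{"c1", "c2", "c3"}, used))
}
```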
Previously, it was possible for Flush() to miss in-flight writes,
but only when using the repository manually, since Uploader guarantees
there are no in-flight writes when it completes.
With this change, Flush() guarantees that any writes that completed
before Flush() started are committed to the repository before
Flush() returns.
This was actually a regression introduced in #105.
Added a regression test to prevent it from reoccurring.
Previously 'packIndexBuilder' contained both contents that have been
written to packs and the ones that have not.
This change makes 'packIndexBuilder' contain only contents from
flushed packs, not pending ones. This will help parallelize
writes later.
- separated portions that don't require locking into a separate struct
to make it easier to reason about state
- moved iteration-related content to separate file
- parallelized os.Lstat() x 16 (dramatically improves speed)
- discarded unused portions of os.FileInfo (uses 60% less RAM on macOS)
Before:
10:47:03.670 [kopia/localfs] listed 200000 entries in 43.871211686s using 79126528 bytes of heap
After:
10:49:12.439 [kopia/localfs] listed 200000 entries in 1.953018184s using 30515200 bytes of heap
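
A rough sketch of the two listing optimizations above, assuming a fixed pool of 16 workers calling `os.Lstat()` and a trimmed-down entry type. The `entry` struct, `listDir` and `lstatWorkers` are hypothetical names for illustration; the real code may be organized differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"time"
)

// entry keeps only the fields this sketch needs, instead of retaining
// the full os.FileInfo value for every file.
type entry struct {
	name    string
	size    int64
	mode    os.FileMode
	modTime time.Time
}

const lstatWorkers = 16

// listDir reads the names in dir, then calls os.Lstat on them from
// lstatWorkers goroutines. Each worker writes to distinct slice
// elements, so no extra locking is needed.
func listDir(dir string) ([]entry, error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	names, err := f.Readdirnames(-1)
	f.Close()
	if err != nil {
		return nil, err
	}

	entries := make([]entry, len(names))
	errs := make([]error, len(names))
	indexes := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < lstatWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indexes {
				fi, err := os.Lstat(filepath.Join(dir, names[i]))
				if err != nil {
					errs[i] = err
					continue
				}
				entries[i] = entry{
					name:    fi.Name(),
					size:    fi.Size(),
					mode:    fi.Mode(),
					modTime: fi.ModTime(),
				}
			}
		}()
	}
	for i := range names {
		indexes <- i
	}
	close(indexes)
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return entries, nil
}

func main() {
	entries, err := listDir(".")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("listed %d entries\n", len(entries))
}
```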
This puts all content blocks with a non-empty prefix into packs starting
with `q` instead of `p`. This neatly separates all data (p) from metadata
(q) at the storage level and allows different storage policies, since
most data is never going to be accessed again, but metadata is going
to be read a lot.
We can more aggressively cache contents from `q`.
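
As an illustration only, the routing rule can be written as a pair of small helpers. The names `packBlobPrefix` and `shouldCacheAggressively` are hypothetical, and the sketch assumes the content prefix is already available as a separate string.

```go
package main

import (
	"fmt"
	"strings"
)

// packBlobPrefix picks the pack prefix for a content: contents with a
// non-empty prefix (metadata) go to "q" packs, everything else to "p".
func packBlobPrefix(contentPrefix string) string {
	if contentPrefix != "" {
		return "q"
	}
	return "p"
}

// shouldCacheAggressively reports whether a pack blob holds metadata,
// based only on its name.
func shouldCacheAggressively(blobID string) bool {
	return strings.HasPrefix(blobID, "q")
}

func main() {
	fmt.Println(packBlobPrefix(""), packBlobPrefix("k"))
	fmt.Println(shouldCacheAggressively("q0123"), shouldCacheAggressively("p0123"))
}
```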
Tests are failing because pkg/sftp's Rename function won't overwrite
an existing file, which is exactly what the test does via
blobtesting.VerifyStorage.
The solution is to use pkg/sftp's PosixRename function:
"PosixRename renames a file using the posix-rename@openssh.com
extension which will replace newname if it already exists."
Additionally, the provider now creates the path on the server, if it
doesn't exist.
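
The difference between the two rename calls can be sketched against a small interface. `*sftp.Client` from pkg/sftp has `Rename` and `PosixRename` methods with these signatures; `fakeServer` and `publish` are hypothetical stand-ins so that the example runs without a server, and the fake only models the overwrite behavior described above.

```go
package main

import (
	"errors"
	"fmt"
)

// renamer lists the two rename methods discussed above.
type renamer interface {
	Rename(oldname, newname string) error
	PosixRename(oldname, newname string) error
}

// fakeServer is a hypothetical in-memory stand-in used only here.
type fakeServer struct{ files map[string]string }

// Rename refuses to replace an existing target.
func (s *fakeServer) Rename(oldname, newname string) error {
	if _, exists := s.files[newname]; exists {
		return errors.New("rename failed: target exists")
	}
	return s.PosixRename(oldname, newname)
}

// PosixRename replaces newname if it already exists.
func (s *fakeServer) PosixRename(oldname, newname string) error {
	data, ok := s.files[oldname]
	if !ok {
		return errors.New("no such file: " + oldname)
	}
	if oldname == newname {
		return nil
	}
	s.files[newname] = data
	delete(s.files, oldname)
	return nil
}

// publish moves a fully written temporary file to its final name,
// replacing any previous version of the blob.
func publish(r renamer, tmpName, finalName string) error {
	return r.PosixRename(tmpName, finalName)
}

func main() {
	s := &fakeServer{files: map[string]string{"blob": "old", "blob.tmp": "new"}}
	fmt.Println("Rename:", s.Rename("blob.tmp", "blob"))
	fmt.Println("publish:", publish(s, "blob.tmp", "blob"))
}
```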
Repository.Token() generates a base64-encoded token that fully
describes repository connection information (blob.ConnectionInfo)
and, optionally, a password. The token can be stored in a password
manager.
Use `kopia repo status -t` to print the token.
Use `kopia repo status -t -s` to print a token that also includes
the repository password.
Use `kopia repo connect from-config --token T` to reconnect using the
token.
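
One way such a token could be built is JSON wrapped in base64. This is a sketch under assumptions, not the actual token format: `connectionInfo` is a hypothetical stand-in for blob.ConnectionInfo, and the field names and the choice of URL-safe base64 are illustrative.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// connectionInfo is a hypothetical stand-in for blob.ConnectionInfo.
type connectionInfo struct {
	Type   string            `json:"type"`
	Config map[string]string `json:"config"`
}

// tokenPayload is the hypothetical structure serialized into the token.
type tokenPayload struct {
	Storage  connectionInfo `json:"storage"`
	Password string         `json:"password,omitempty"`
}

// encodeToken serializes the connection info and optional password.
// Pass an empty password to leave it out of the token.
func encodeToken(ci connectionInfo, password string) (string, error) {
	b, err := json.Marshal(tokenPayload{Storage: ci, Password: password})
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// decodeToken reverses encodeToken.
func decodeToken(token string) (connectionInfo, string, error) {
	b, err := base64.RawURLEncoding.DecodeString(token)
	if err != nil {
		return connectionInfo{}, "", err
	}
	var p tokenPayload
	if err := json.Unmarshal(b, &p); err != nil {
		return connectionInfo{}, "", err
	}
	return p.Storage, p.Password, nil
}

func main() {
	ci := connectionInfo{Type: "filesystem", Config: map[string]string{"path": "/tmp/repo"}}
	token, err := encodeToken(ci, "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(token)
}
```

A token that embeds the password grants full access to the repository, so it should be treated like the password itself.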