Previously, it was possible for Flush() to miss in-flight writes,
but only when using the repository manually, since Uploader guarantees
there are no in-flight writes when it completes.
With this change, any pending writes that completed before Flush()
started are guaranteed to be committed to the repository before
Flush() returns.
This was actually a regression introduced in #105.
Added regression test to prevent it from reoccurring.
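A minimal sketch of the guarantee described above, not the actual
implementation. The `writer` type and its fields are hypothetical; the
only point is that a write which has returned before Flush() starts is
always picked up by that Flush().

```go
package main

import (
	"fmt"
	"sync"
)

// writer is a hypothetical stand-in for the component that buffers writes.
type writer struct {
	mu        sync.Mutex
	pending   []string // written, but not yet committed
	committed []string // committed to the repository
}

// Write records a content as pending. Once Write returns, the content
// is visible to any Flush that starts afterwards.
func (w *writer) Write(id string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.pending = append(w.pending, id)
}

// Flush commits everything that is pending at the time it holds the lock.
func (w *writer) Flush() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.committed = append(w.committed, w.pending...)
	w.pending = nil
}

func main() {
	w := &writer{}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			w.Write(fmt.Sprintf("content-%d", i))
		}(i)
	}
	wg.Wait() // every write has completed before Flush starts

	w.Flush()
	fmt.Println(len(w.committed), len(w.pending)) // prints "8 0"
}
```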
Previously 'packIndexBuilder' contained both contents that have been
written to packs and the ones that have not.
This change makes it so that 'packIndexBuilder' only contains contents
from flushed packs, but not pending ones. It will help parallelize
writes later.
- separated portions that don't require locking into a separate struct
to make it easier to reason about state
- moved iteration-related content to separate file
- parallelized os.Lstat() x 16 (dramatically improves speed)
- discarded unused portions of os.FileInfo (uses 60% less RAM on macOS)
Before:
10:47:03.670 [kopia/localfs] listed 200000 entries in 43.871211686s using 79126528 bytes of heap
After:
10:49:12.439 [kopia/localfs] listed 200000 entries in 1.953018184s using 30515200 bytes of heap
This puts all content blocks with a non-empty prefix into packs whose
names start with `q` instead of `p`. This neatly separates all data (p)
from metadata (q) at the storage level and allows different storage
policies, since most data is never going to be accessed, but metadata
is going to be read a lot.
We can more aggressively cache contents from `q`.
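A small sketch of the routing rule, with hypothetical helper names
(`packPrefixFor`, `isMetadataPack`) that are not taken from the code
base. It assumes the content prefix is available as a string and that
a caching layer decides based on the pack name alone.

```go
package main

import (
	"fmt"
	"strings"
)

// packPrefixFor returns the pack name prefix for a content with the
// given content prefix: "q" for prefixed contents (metadata), "p" for
// unprefixed contents (data).
func packPrefixFor(contentPrefix string) string {
	if contentPrefix != "" {
		return "q"
	}
	return "p"
}

// isMetadataPack reports whether a pack name refers to a metadata
// pack, which a cache could choose to retain more aggressively.
func isMetadataPack(packName string) bool {
	return strings.HasPrefix(packName, "q")
}

func main() {
	fmt.Println(packPrefixFor(""), packPrefixFor("k")) // prints "p q"
	fmt.Println(isMetadataPack("q0123"), isMetadataPack("p0123")) // prints "true false"
}
```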
Tests are failing because pkg/sftp won't overwrite an existing file
(Rename function) and the test is actually doing that with
blobtesting.VerifyStorage.
The solution is to use pkg/sftp's PosixRename function:
"PosixRename renames a file using the posix-rename@openssh.com
extension which will replace newname if it already exists."
Additionally, the provider now creates the path on the server, if it
doesn't exist.
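A rough sketch of the two behaviors, written against a small interface
so it runs without a server. The `remoteFS` interface is an assumption
meant to mirror the MkdirAll and PosixRename methods of pkg/sftp's
client; `fakeFS`, `ensureRepoPath` and `commitBlob` are hypothetical
names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// remoteFS lists the calls this sketch needs. The signatures are
// assumed to match the corresponding methods on pkg/sftp's client.
type remoteFS interface {
	MkdirAll(dir string) error
	PosixRename(oldname, newname string) error
}

// ensureRepoPath creates the repository path if it doesn't exist.
func ensureRepoPath(fs remoteFS, repoPath string) error {
	return fs.MkdirAll(repoPath)
}

// commitBlob moves an uploaded temporary file into place, replacing
// any existing file with the same name.
func commitBlob(fs remoteFS, tmpPath, finalPath string) error {
	return fs.PosixRename(tmpPath, finalPath)
}

// fakeFS is an in-memory stand-in for the server, for illustration only.
type fakeFS struct {
	dirs  map[string]bool
	files map[string]string
}

func (f *fakeFS) MkdirAll(dir string) error {
	f.dirs[dir] = true
	return nil
}

// PosixRename replaces newname if it already exists.
func (f *fakeFS) PosixRename(oldname, newname string) error {
	data, ok := f.files[oldname]
	if !ok {
		return errors.New("no such file: " + oldname)
	}
	delete(f.files, oldname)
	f.files[newname] = data
	return nil
}

func main() {
	fs := &fakeFS{
		dirs: map[string]bool{},
		files: map[string]string{
			"/repo/blob.tmp": "new",
			"/repo/blob":     "old",
		},
	}
	if err := ensureRepoPath(fs, "/repo"); err != nil {
		panic(err)
	}
	if err := commitBlob(fs, "/repo/blob.tmp", "/repo/blob"); err != nil {
		panic(err)
	}
	fmt.Println(fs.files["/repo/blob"]) // prints "new"
}
```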
Repository.Token() generates a base64-encoded token that can be stored
in a password manager and that fully describes the repository connection
information (blob.ConnectionInfo) and, optionally, a password.
Use `kopia repo status -t` to print the token.
Use `kopia repo status -t -s` to print the token that also includes
repository password.
Use `kopia repo connect from-config --token T` to reconnect using the
token.
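To illustrate the idea of a self-describing token, here is a minimal
sketch that round-trips connection details through JSON and base64.
The `tokenInfo` layout, field names, and the choice of JSON and URL-safe
base64 are assumptions for this example; the actual token format may
differ.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// tokenInfo is a hypothetical payload: connection details plus an
// optional password that is omitted when empty.
type tokenInfo struct {
	Storage  map[string]string `json:"storage"`
	Password string            `json:"password,omitempty"`
}

// encodeToken serializes the payload and base64-encodes it.
func encodeToken(ti tokenInfo) (string, error) {
	b, err := json.Marshal(ti)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// decodeToken reverses encodeToken.
func decodeToken(s string) (tokenInfo, error) {
	var ti tokenInfo
	b, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return ti, err
	}
	err = json.Unmarshal(b, &ti)
	return ti, err
}

func main() {
	tok, err := encodeToken(tokenInfo{
		Storage: map[string]string{"type": "filesystem", "path": "/tmp/repo"},
	})
	if err != nil {
		panic(err)
	}
	back, err := decodeToken(tok)
	if err != nil {
		panic(err)
	}
	fmt.Println(back.Storage["type"], back.Password == "") // prints "filesystem true"
}
```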
Uses go/ssh and pkg/sftp as building blocks and implements the common
sharded.Storage interface, shared between the filesystem and webdav
providers.
A couple of notes:
- The provider assumes the user has a working public/private key
connection to the ssh server.
No other authentication method is supported.
- The repository path must exist on the server
- (testing related) The pkg/sftp server doesn't offer a way to set a
server filesystem root, so, during testing, it runs from the local
directory which is repo/blob/sftp. So the tests leave some debris
behind. Additionally, that's the reason why id_rsa and known_hosts
are there at all.
- Encrypted keyfiles are currently not supported (but it could be done)