Commit Graph

16 Commits

Author SHA1 Message Date
Jarek Kowalski
cead806a3f blob: changed default shards from {3,3} to {1,3} (#1513)
* blob: changed default shards from {3,3} to {1,3}

Turns out for very large repository around 100TB (5M blobs),
we end up creating max ~16M directories which is way too much
and slows down listing. Currently each leaf directory only has a handful
of files.

Simple sharding of {3} should work much better and will end up creating
directories with meaningful shard sizes - 12 K files per directory
should not be too slow and will reduce the overhead of listing by
4096 times.

The change is done in a backwards-compatible way and will respect
custom sharding (.shards) file written by previous 0.9 builds
as well as older repositories that don't have the .shards file (which
we assume to be {3,3}).

* fixed compat tests
2021-11-16 06:02:04 -08:00
Shikhar Mall
2857c4831a storage api put-blob retention options (#1511)
* storage api put-blob retention options

Co-authored-by: Shikhar Mall <shikhar@kasten.io>
2021-11-15 19:46:42 -08:00
Jarek Kowalski
e42cc6ccce Added 'kopia repository validate-provider` (#1205)
* cli: added 'repository validate-provider' which runs a set of tests against blob storage provider to validate it

This implements a provider tests which exercises subtle behaviors which are not always correctly implemented by providers claiming compatibility with S3, for example.

The test checks:

- not found behavior
- prefix scans
- timestamps
- write atomicity

* retry: improved error message on failure

* rclone: fixed stats reporting and awaiting for completion

* webdav: prevent panic when attempting to mkdir with empty name

* testing: run providervalidation.ValidateProvider as part of regular provider tests

* cli: print a recommendation to validate provider after repository creation
2021-07-19 21:42:24 -07:00
Jarek Kowalski
4c2f52a2e3 Rclone and testing improvements (#1202)
* sharded: added parallel iteration of blobs to improve performance

* retry: reduce first retry delay 1s->100ms

* testing: additional assertions for blob storage testing

* rclone: testing cleanup improvements, re-enabled OneDrive

* cli: added --list-parallelism parameter to fs,webdav,sftp and rclone

* sharded: added dedicated test
2021-07-17 16:04:51 -07:00
Jarek Kowalski
d0f2ef53d7 blob: improved startup error handling of rclone and webdav PutBlob race (#915)
* added framework for unit testing against remote real rclone remotes,
  added google drive backend
* added parallelism to blobtesting which revealed some races during
  PutBlob with WebDAV.
2021-03-28 08:26:35 -07:00
Jarek Kowalski
b6e68fa28a Fixed few coverage flakes (#872)
* blobtesting: coverage for GetMetadata() returning ErrNotFound
* content: additional direct coverage for diskCommittedContentIndexCache
2021-03-07 00:03:20 -08:00
Jarek Kowalski
fd24227379 b2: fixed handling of 'no_such_file' to indicate NOT_FOUND (#646)
Fixes #645
2020-09-26 21:01:04 -07:00
Jarek Kowalski
c242235a32 blob: added SetTime() method which may be optionally implemented by blob.Storage (#575)
cli: added --times option to 'repository sync'
2020-08-31 19:50:15 -07:00
Jarek Kowalski
9a6dea898b Linter upgrade to v1.30.0 (#526)
* fixed godot linter errors
* reformatted source with gofumpt
* disabled some linters
* fixed nolintlint warnings
* fixed gci warnings
* lint: fixed 'nestif' warnings
* lint: fixed 'exhaustive' warnings
* lint: fixed 'gocritic' warnings
* lint: fixed 'noctx' warnings
* lint: fixed 'wsl' warnings
* lint: fixed 'goerr113' warnings
* lint: fixed 'gosec' warnings
* lint: upgraded linter to 1.30.0
* lint: more 'exhaustive' warnings

Co-authored-by: Nick <nick@kasten.io>
2020-08-12 19:28:53 -07:00
Jarek Kowalski
60977812f0 Support for gather writes (#373)
, where blob.Storage.PutBlob gets a list of slices and writes them sequentially 
* performance: added gather.Bytes and gather.WriteBuffer

They are similar to bytes.Buffer but instead of managing a single
byte slice, they maintain a list of slices that and when they run out of
space they allocate new fixed-size slice from a free list.

This helps keep memory allocations completely under control regardless
of the size of data written.

* switch from byte slices and bytes.Buffer to gather.Bytes.

This is mostly mechanical, the only cases where it's not involve blob
storage providers, where we leverage the fact that we don't need to
ever concatenate the slices into one and instead we can do gather
writes.

* PR feedback
2020-03-24 15:05:52 -07:00
Jarek Kowalski
5f96b0240a testing: added retry helper 2020-03-09 21:34:10 -07:00
Jarek Kowalski
c8fcae93aa logging: refactored logging
This is mostly mechanical and changes how loggers are instantiated.

Logger is now associated with a context, passed around all methods,
(most methods had ctx, but had to add it in a few missing places).

By default Kopia does not produce any logs, but it can be overridden,
either locally for a nested context, by calling

ctx = logging.WithLogger(ctx, newLoggerFunc)

To override logs globally, call logging.SetDefaultLogger(newLoggerFunc)

This refactoring allowed removing dependency from Kopia repo
and go-logging library (the CLI still uses it, though).

It is now also possible to have all test methods emit logs using
t.Logf() so that they show up in failure reports, which should make
debugging of test failures suck less.
2020-02-25 17:24:44 -08:00
Jarek Kowalski
29e5750686 travis: added bare-bones Windows build that does go test
fixed some issues that prevented go test from passing on Windows:

- webdav client used \ instead of /
- need retries around mmap.Open()
- paths are prefixed with C:\ on windows
- time.Now() does not always move forward on Windows
2020-02-09 20:22:14 -08:00
Jarek Kowalski
6217df1a87 lint: switched to 1.21 and fixed a ton of whitespace issues discovered
by new wsl linter
2019-11-26 06:49:49 -08:00
Jarek Kowalski
54edb97b3a refactoring: renamed repo/block to repo/content
Also introduced strongly typed content.ID and manifest.ID (instead of string)

This aligns identifiers across all layers of repository:

blob.ID
content.ID
object.ID
manifest.ID
2019-06-01 22:24:19 -07:00
Jarek Kowalski
9e5d0beccd refactoring: renamed storage.Storage to blob.Storage
This updates the terminology everywhere - blocks become blobs and
`storage.Storage` becomes `blob.Storage`.

Also introduced blob.ID which is a specialized string type, that's
different from CABS block ID.

Also renamed CLI subcommands from `kopia storage` to `kopia blob`.

While at it introduced `block.ErrBlockNotFound` and
`object.ErrObjectNotFound` that do not leak from lower layers.
2019-06-01 14:10:35 -07:00