Commit Graph

2 Commits

Author SHA1 Message Date
Jarek Kowalski
cead806a3f blob: changed default shards from {3,3} to {1,3} (#1513)
* blob: changed default shards from {3,3} to {1,3}

Turns out for very large repository around 100TB (5M blobs),
we end up creating max ~16M directories which is way too much
and slows down listing. Currently each leaf directory only has a handful
of files.

Simple sharding of {3} should work much better and will end up creating
directories with meaningful shard sizes - 12 K files per directory
should not be too slow and will reduce the overhead of listing by
4096 times.

The change is done in a backwards-compatible way and will respect
custom sharding (.shards) file written by previous 0.9 builds
as well as older repositories that don't have the .shards file (which
we assume to be {3,3}).

* fixed compat tests
2021-11-16 06:02:04 -08:00
Jarek Kowalski
1566b74263 blob: support for custom blob store sharding (#1299)
* blob: support for custom blob store sharding

This is experimental.

The .shards file can reside in the root of any blob storage that uses
sharding (filesystem/sftp/webdav) and can specify rules for sharding.

{
  "default": [3,2,1],
  "overrides": [
    { "prefix": "p", "shards": [2,2] },
    { "prefix": "x", "shards": [1,1,1] }
  ],
  "maxNonShardedLength": 2
}

With this in place we'll be later able to do resharding of the
repository to optimize get/put/list performance for both repositories
and caches.

* cli: command line tools to manipulate shards in a directory
2021-09-19 18:50:38 -07:00