Use non-formatting logging functions for message without formatting.
For example, `log.Info("message")` instead of `log.Infof("message")`
Configure linter for printf-like functions
Lint fixes in preparation for moving to Go 1.20
Remove deprecated calls to `rand.Seed`
In Go 1.20 the default generator is seeded randomly at program startup,
which is the desired behavior for these tests.
Remove uses of deprecated rand.Read: replace with calls to rand.Uint64()
Remove deprecated uses of rand.Read in content manager tests and
S3 versioned tests.
Adds a concurrency-safe helpers to provide functionality similar to that
provided by `rand.Read(b []byte) (int, error)`
Almost all were easy to replace, except ones exposed via JSON which
have been left as-is.
The linter has a cool behavior where it flags attempts to pass
`atomic.Int32` for example by value , which is always a mistake,
say as an argument to `fmt.Sprintf()`
- removed a bunch of hacks and should improve the logging
performance by avoiding interfaces and data translation. This will
allow using of de-sugared loggers in performance-critical
logging situations.
- this will also allow using features of ZAP more directly without
having to reimplement them.
- moved logging.Printf() to testlogging
- refactored `uitask` to store logs in a structural format and
present them as JSON only in the UI
- renamed printf_logger.go to printf.go so that fewer columns are used
in the logs
* refactor(repository): ensure we always parse content.ID and object.ID
This changes the types to be incompatible with string to prevent direct
conversion to and from string.
This has the additional benefit of reducing number of memory allocations
and bytes for all IDs.
content.ID went from 2 allocations to 1:
typical case 32 characters + 16 bytes per-string overhead
worst-case 65 characters + 16 bytes per-string overhead
now: 34 bytes
object.ID went from 2 allocations to 1:
typical case 32 characters + 16 bytes per-string overhead
worst-case 65 characters + 16 bytes per-string overhead
now: 36 bytes
* move index.{ID,IDRange} methods to separate files
* replaced index.IDFromHash with content.IDFromHash externally
* minor tweaks and additional tests
* Update repo/content/index/id_test.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* Update repo/content/index/id_test.go
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* pr feedback
* post-merge fixes
* pr feedback
* pr feedback
* fixed subtle regression in sortedContents()
This was actually not producing invalid results because of how base36
works, just not sorting as efficiently as it could.
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
* blob: changed default shards from {3,3} to {1,3}
Turns out for very large repository around 100TB (5M blobs),
we end up creating max ~16M directories which is way too much
and slows down listing. Currently each leaf directory only has a handful
of files.
Simple sharding of {3} should work much better and will end up creating
directories with meaningful shard sizes - 12 K files per directory
should not be too slow and will reduce the overhead of listing by
4096 times.
The change is done in a backwards-compatible way and will respect
custom sharding (.shards) file written by previous 0.9 builds
as well as older repositories that don't have the .shards file (which
we assume to be {3,3}).
* fixed compat tests
* logging: added Logger.Debugw(message, key1, value1, ..., keyN, valueN)
This is based on ZAP and allows structural logs to be emitted.
* cli: added --json-log-console and --json-log-file flags
* logging: updated storage logging wrapper to use structural logging
* pr feedback
* refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* chore: remove //nolint:gosec for os.ReadFile
At the time of this commit, the G304 rule of gosec does not include the
`os.ReadFile` function. We remove `//nolint:gosec` temporarily until
https://github.com/securego/gosec/pull/706 is merged.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* testing: refactored logs directory management
* content: fixed index mutex to be shared across all write sessions
added mutex protection during writecontent/refresh race
* testing: upload log artifacts
* content: bump revision number after index has been added
This fixes a bug where manifest manager in another session for
the same open repository may not see a content added, because they
will prematurely cache the incomplete set of contents.
This took 2 weeks to find.
* manifest: improved log output, fixed unnecessary mutex release
* testing: rewrote stress test to be model-based and more precise
* cli: added a flag to create repository with v2 index features
* content: plumb through compression.ID parameter to content.Manager.WriteContent()
* content: expose content.Manager.SupportsContentCompression
This allows object manager to decide whether to create compressed object
or let the content manager do it.
* object: if compression is requested and the repo supports it, pass compression ID to the content manager
* cli: show compression status in 'repository status'
* cli: output compression information in 'content list' and 'content stats'
* content: compression and decompression support
* content: unit tests for compression
* object: compression tests
* testing: added integration tests against v2 index
* testing: run all e2e tests with and without content-level compression
* htmlui: added UI for specifying index format on creation
* cli: additional tests for 'content ls' and 'content stats'
* applied pr suggestions
* blob: refactored upload reporting
Instead of plumbing this through blob storage context, we are passing
and explicit callback that reports uploads as they happen.
* htmlui: improved counter presentation
* nit: added missing UI route which fixes Reload behavior on the Tasks page
- `repo.Repository` is now read-only and only has methods that can be supported over kopia server
- `repo.RepositoryWriter` has read-write methods that can be supported over kopia server
- `repo.DirectRepository` is read-only and contains all methods of `repo.Repository` plus some low-level methods for data inspection
- `repo.DirectRepositoryWriter` contains write methods for `repo.DirectRepository`
- `repo.Reader` removed and merged with `repo.Repository`
- `repo.Writer` became `repo.RepositoryWriter`
- `*repo.DirectRepository` struct became `repo.DirectRepository`
interface
Getting `{Direct}RepositoryWriter` requires using `NewWriter()` or `NewDirectWriter()` on a read-only repository and multiple simultaneous writers are supported at the same time, each writing to their own indexes and pack blobs.
`repo.Open` returns `repo.Repository` (which is also `repo.RepositoryWriter`).
* content: removed implicit flush on content manager close
* repo: added tests for WriteSession() and implicit flush behavior
* invalidate manifest manager after write session
* cli: disable maintenance in 'kopia server start'
Server will close the repository before completing.
* repo: unconditionally close RepositoryWriter in {Direct,}WriteSession
* repo: added panic in case somebody tries to create RepositoryWriter after closing repository
- used atomic to manage SharedManager.closed
* removed stale example
* linter: fixed spurious failures
Co-authored-by: Julio López <julio+gh@kasten.io>
* content: added support for cache of own writes
Thi keeps track of which blobs (n and m) have been written by the
local repository client, so that even if the storage listing
is eventually consistent (as in S3), we get somewhat sane behavior.
Note that this is still assumming read-after-create semantics, which
S3 also guarantees, otherwise it's very hard to do anything useful.
* compaction: support for compaction logs
Instead of compaction immediately deleting source index blobs, we now
write log entries (with `m` prefix) which are merged on reads
and applied only if the blob list includes all inputs and outputs, in
which case the inputs are discarded since they are known to have been
superseded by the outputs.
This addresses eventual consistency issues in stores such as S3,
which don't guarantee list-after-put or list-after-delete. With such
stores the repository is ultimately eventually consistent and there's
not much that can be done about it, unless we use second strongly
consistent storage (such as GCS) for the index only.
* content: updated list cache to cache both `n` and `m`
* repo: fixed cache clear on windows
Clearing cache requires closing repository first, as Windows is holding
the files locked.
This requires ability to close the repository twice.
* content: refactored index blob management into indexBlobManager
* testing: fixed blobtesting.Map storage to allow overwrites
* blob: added debug output String() to blob.Metadata
* testing: added indexBlobManager stress test
This works by using N parallel "actors", each repeatedly performing
operations on indexBlobManagers all sharing single eventually consistent
storage.
Each actor runs in a loop and randomly selects between:
- *reading* all contents in indexes and verifying that it includes
all contents written by the actor so far and that contents are
correctly marked as deleted
- *creating* new contents
- *deleting* one of previously-created contents (by the same actor)
- *compacting* all index files into one
The test runs on accelerated time (every read of time moves it by 0.1
seconds) and simulates several hours of running.
In case of a failure, the log should provide enough debugging
information to trace the exact sequence of events leading up to the
failure - each log line is prefixed with actorID and all storage
access is logged.
* makefile: increase test timeout
* content: fixed index blob manager race
The race is where if we delete compaction log too early, it may lead to
previously deleted contents becoming temporarily live again to an
outside observer.
Added test case that reproduces the issue, verified that it fails
without the fix and passed with one.
* testing: improvements to TestIndexBlobManagerStress test
- better logging to be able to trace the root cause in case of a failure
- prevented concurrent compaction which is unsafe:
The sequence:
1. A creates contentA1 in INDEX-1
2. B creates contentB1 in INDEX-2
3. A deletes contentA1 in INDEX-3
4. B does compaction, but is not seeing INDEX-3 (due to EC or simply
because B started read before #3 completed), so it writes
INDEX-4==merge(INDEX-1,INDEX-2)
* INDEX-4 has contentA1 as active
5. A does compaction but it's not seeing INDEX-4 yet (due to EC
or because read started before #4), so it drops contentA1, writes
INDEX-5=merge(INDEX-1,INDEX-2,INDEX-3)
* INDEX-5 does not have contentA1
7. C sees INDEX-5 and INDEX-5 and merge(INDEX-4,INDEX-5)
contains contentA1 which is wrong, because A has been deleted
(and there's no record of it anywhere in the system)
* content: when building pack index ensure index bytes are different each time by adding 32 random bytes
Support for remote content repository where all contents and
manifests are fetched over HTTP(S) instead of locally
manipulating blob storage
* server: implement content and manifest access APIs
* apiclient: moved Kopia API client to separate package
* content: exposed content.ValidatePrefix()
* manifest: added JSON serialization attributes to EntryMetadata
* repo: changed repo.Open() to return Repository instead of *DirectRepository
* repo: added apiServerRepository
* cli: added 'kopia repository connect server'
This sets up repository connection via the API server instead of
directly-manipulated storage.
* server: add support for specifying a list of usernames/password via --htpasswd-file
* tests: added API server repository E2E test
* server: only return manifests (policies and snapshots) belonging to authenticated user
* This is 99% mechanical:
Extracted repo.Repository interface that only exposes high-level object and manifest management methods, but not blob nor content management.
Renamed old *repo.Repository to *repo.DirectRepository
Reviewed codebase to only depend on repo.Repository as much as possible, but added way for low-level CLI commands to use DirectRepository.
* PR fixes
This is mostly mechanical and changes how loggers are instantiated.
Logger is now associated with a context, passed around all methods,
(most methods had ctx, but had to add it in a few missing places).
By default Kopia does not produce any logs, but it can be overridden,
either locally for a nested context, by calling
ctx = logging.WithLogger(ctx, newLoggerFunc)
To override logs globally, call logging.SetDefaultLogger(newLoggerFunc)
This refactoring allowed removing dependency from Kopia repo
and go-logging library (the CLI still uses it, though).
It is now also possible to have all test methods emit logs using
t.Logf() so that they show up in failure reports, which should make
debugging of test failures suck less.
Previously, it was possible for Flush() to miss in-flight writes,
but only when using repository manually since Uploader guarantees
there are no in-flight writes when it completes.
With this change Flush() will guarantee that any pending writes
completed before Flush() has started are guaranteed to be committed
to the repository before Flush() returns.
This was actually a regression introduced in #105.
Added regression test to prevent it from reoccurring.
Also introduced strongly typed content.ID and manifest.ID (instead of string)
This aligns identifiers across all layers of repository:
blob.ID
content.ID
object.ID
manifest.ID
This updates the terminology everywhere - blocks become blobs and
`storage.Storage` becomes `blob.Storage`.
Also introduced blob.ID which is a specialized string type, that's
different from CABS block ID.
Also renamed CLI subcommands from `kopia storage` to `kopia blob`.
While at it introduced `block.ErrBlockNotFound` and
`object.ErrObjectNotFound` that do not leak from lower layers.