kopia

mirror of https://github.com/kopia/kopia.git synced 2026-05-10 15:54:20 -04:00

Author	SHA1	Message	Date
Jarek Kowalski	40acf238f3	Fixed arm and arm64 build. (#506 ) * fixed a number of cases where misaligned data was causing panics on armv7 (but not armv8) * travis: enable arm64 * test: reduce compressed data sizes when running on arm * arm: wait longer for snapshots	2020-07-30 17:31:28 -07:00
Jarek Kowalski	7e9ce61f9e	server: automatically flush the repository after setting or deleting a policy (#489 ) Fixes #479	2020-07-20 20:59:21 -07:00
Jarek Kowalski	64a6cb42dc	parallelwork: fixed error handling, which caused parallel work to never finish on any error	2020-06-24 08:39:56 -07:00
Jarek Kowalski	79757672ca	server: implemented 'flush' and 'refresh' API Added test that verifies that when client performs Flush (which happens at the end of each snapshot and when repository is closed), the server writes new blobs to the storage. Fixes #464	2020-06-07 19:38:13 -07:00
Pavan Navarathna	c13b5f820f	Remove extra whitespaces and fix minor typos (#460 )	2020-06-01 13:40:57 -07:00
Jarek Kowalski	960c33475e	maintenance: disabled automatic compaction on repository opening instead moved to run as part of maintenance ('kopia maintenance run') added 'kopia maintenance run --force' flag which runs maintenance even if not owned	2020-06-01 00:57:32 -07:00
Jarek Kowalski	d68273a576	Improvements for dealing with eventually-consistent stores (S3) (#437 ) * content: added support for cache of own writes This keeps track of which blobs (n and m) have been written by the local repository client, so that even if the storage listing is eventually consistent (as in S3), we get somewhat sane behavior. Note that this is still assuming read-after-create semantics, which S3 also guarantees, otherwise it's very hard to do anything useful. * compaction: support for compaction logs Instead of compaction immediately deleting source index blobs, we now write log entries (with `m` prefix) which are merged on reads and applied only if the blob list includes all inputs and outputs, in which case the inputs are discarded since they are known to have been superseded by the outputs. This addresses eventual consistency issues in stores such as S3, which don't guarantee list-after-put or list-after-delete. With such stores the repository is ultimately eventually consistent and there's not much that can be done about it, unless we use second strongly consistent storage (such as GCS) for the index only. * content: updated list cache to cache both `n` and `m` * repo: fixed cache clear on windows Clearing cache requires closing repository first, as Windows is holding the files locked. This requires ability to close the repository twice. * content: refactored index blob management into indexBlobManager * testing: fixed blobtesting.Map storage to allow overwrites * blob: added debug output String() to blob.Metadata * testing: added indexBlobManager stress test This works by using N parallel "actors", each repeatedly performing operations on indexBlobManagers all sharing single eventually consistent storage.
Each actor runs in a loop and randomly selects between: - reading all contents in indexes and verifying that it includes all contents written by the actor so far and that contents are correctly marked as deleted - creating new contents - deleting one of previously-created contents (by the same actor) - compacting all index files into one The test runs on accelerated time (every read of time moves it by 0.1 seconds) and simulates several hours of running. In case of a failure, the log should provide enough debugging information to trace the exact sequence of events leading up to the failure - each log line is prefixed with actorID and all storage access is logged. * makefile: increase test timeout * content: fixed index blob manager race The race is where if we delete compaction log too early, it may lead to previously deleted contents becoming temporarily live again to an outside observer. Added test case that reproduces the issue, verified that it fails without the fix and passes with it. * testing: improvements to TestIndexBlobManagerStress test - better logging to be able to trace the root cause in case of a failure - prevented concurrent compaction which is unsafe: The sequence: 1. A creates contentA1 in INDEX-1 2. B creates contentB1 in INDEX-2 3. A deletes contentA1 in INDEX-3 4. B does compaction, but is not seeing INDEX-3 (due to EC or simply because B started read before #3 completed), so it writes INDEX-4==merge(INDEX-1,INDEX-2) * INDEX-4 has contentA1 as active 5. A does compaction but it's not seeing INDEX-4 yet (due to EC or because read started before #4), so it drops contentA1, writes INDEX-5=merge(INDEX-1,INDEX-2,INDEX-3) * INDEX-5 does not have contentA1 6. 
C sees INDEX-4 and INDEX-5 and merge(INDEX-4,INDEX-5) contains contentA1 which is wrong, because contentA1 has been deleted (and there's no record of it anywhere in the system) * content: when building pack index ensure index bytes are different each time by adding 32 random bytes	2020-05-31 17:11:20 -07:00
Jarek Kowalski	8c4fb53c96	blob: support for GetMetadata() to get server-side timestamp and blob length (#440 )	2020-05-18 11:06:34 -07:00
Jarek Kowalski	d657415817	testing: added blob.Storage wrapper that simulates eventual consistency (#434 ) This is done by introducing N unsynchronized caches, which simulate what frontend of a cloud storage system might do, that causes eventual consistency behavior.	2020-05-09 12:19:32 -07:00
Jarek Kowalski	be4b897579	Support for remote repository (#427 ) Support for remote content repository where all contents and manifests are fetched over HTTP(S) instead of locally manipulating blob storage * server: implement content and manifest access APIs * apiclient: moved Kopia API client to separate package * content: exposed content.ValidatePrefix() * manifest: added JSON serialization attributes to EntryMetadata * repo: changed repo.Open() to return Repository instead of DirectRepository repo: added apiServerRepository * cli: added 'kopia repository connect server' This sets up repository connection via the API server instead of directly-manipulated storage. * server: add support for specifying a list of usernames/password via --htpasswd-file * tests: added API server repository E2E test * server: only return manifests (policies and snapshots) belonging to authenticated user	2020-05-02 21:41:49 -07:00
Jarek Kowalski	1377d057e4	Maintenance changes (#423 ) * maintenance: encrypt maintenance schedule block * maintenance: created snapshotmaintenance package that wraps maintenance and performs snapshot GC + regular maintenance in one shot, used in CLI and server * PR feedback.	2020-05-02 20:40:16 -07:00
Jarek Kowalski	4b4628a21e	Repository maintenance support (#411 ) Maintenance: support for automatic GC Moved maintenance algorithms from 'cli' to 'repo/maintenance' package Added support for CLI commands: kopia gc - performs quick maintenance kopia gc --full - performs full maintenance Full maintenance performs snapshot gc, but it's not safe to do this automatically possibly in parallel to snapshots being taken. This will be addressed ~0.7 timeframe.	2020-04-14 00:11:41 -07:00
Jarek Kowalski	1f1682b2cc	Snapshot checkpointing (#410 ) * snapshot: support for periodic checkpointing of snapshots in progress For each snapshot that takes longer than 45 minutes, we trigger internal cancellation, save the manifest and restart the snapshot at which point all files will be cached. This helps ensure the property that no file or directory objects in the repository remain unreachable from a snapshot root for more than one hour, which is important from GC perspective. * nit: unified spelling 'cancelled' => 'canceled'	2020-04-07 17:54:21 -07:00
Jarek Kowalski	057c2789d8	Kopia UI: support for multiple repositories + portability (#398 ) * server: when serving HTML UI, prefix the title with string from KOPIA_UI_TITLE_PREFIX envar * kopia-ui: support for multiple repositories + portability This is a major rewrite of the app/ codebase which changes how configuration for repositories is maintained and how it flows through the component hierarchy. Portable mode is enabled by creating 'repositories' subdirectory before launching the app. on macOS: <parent>/KopiaUI.app <parent>/repositories/ On Windows, option #1 - nested directory <parent>\KopiaUI.exe <parent>\repositories\ On Windows, option #2 - parallel directory <parent>\some-dir\KopiaUI.exe <parent>\repositories\ In portable mode, repositories will have 'cache' and 'logs' nested in it.	2020-04-04 17:18:37 -07:00
Jarek Kowalski	6cb9b8fa4f	repo: refactored public API (#318 ) * This is 99% mechanical: Extracted repo.Repository interface that only exposes high-level object and manifest management methods, but not blob nor content management. Renamed old repo.Repository to repo.DirectRepository Reviewed codebase to only depend on repo.Repository as much as possible, but added way for low-level CLI commands to use DirectRepository. * PR fixes	2020-03-26 08:04:01 -07:00
Jarek Kowalski	10bb492926	repo: deprecated NONE algorithm, will not be available for new repositories (#395 ) * repo: deprecated NONE algorithm, will not be available for new repositories Co-authored-by: Julio López <julio+gh@kasten.io>	2020-03-24 23:19:20 -07:00
Jarek Kowalski	60977812f0	Support for gather writes (#373 ) , where blob.Storage.PutBlob gets a list of slices and writes them sequentially * performance: added gather.Bytes and gather.WriteBuffer They are similar to bytes.Buffer but instead of managing a single byte slice, they maintain a list of slices and, when they run out of space, they allocate a new fixed-size slice from a free list. This helps keep memory allocations completely under control regardless of the size of data written. * switch from byte slices and bytes.Buffer to gather.Bytes. This is mostly mechanical, the only cases where it's not involve blob storage providers, where we leverage the fact that we don't need to ever concatenate the slices into one and instead we can do gather writes. * PR feedback	2020-03-24 15:05:52 -07:00
Jarek Kowalski	b08d394864	policy: deduplicate multiple policies for the same source in policy manager, fixes #391	2020-03-23 23:52:23 -07:00
Jarek Kowalski	9b68a631e6	Highlight snapshot errors in the UI and CLI (#376 ) * upload: exposed numFailed and failedEntries on directory summary * cli: better present snapshot errors * htmlui: display snapshot errors	2020-03-22 14:18:47 -07:00
Jarek Kowalski	239d809075	performance: introduced buf.Pool which helps reuse memory buffers (#345 ) * performance: added buf.Pool which can be used to manage ephemeral buffers for encryption and compression * repo: switched object writer to buf.Pool * content: switched encryption to use buf.Pool * object: switched compression to use buf.Pool * testing: added missing content manager Close()	2020-03-18 20:42:16 -07:00
Jarek Kowalski	c9877bf130	performance: refactored content manager to avoid copying data Previously we would store special field Payload for contents that were added but never flushed to the store and it was not encrypted. This required special handling different for pending vs flushed contents. This change maintains pending pack buffer ready to be flushed and appends encrypted contents to it, which avoids data copying. The buffers are pooled to avoid allocations.	2020-03-17 18:07:10 -07:00
Jarek Kowalski	e80f5536c3	performance: plumbed through output buffer to encryption and hashing,… (#333 ) * performance: plumbed through output buffer to encryption and hashing, so that the caller can pre-allocate/reuse it * testing: fixed how we do comparison of byte slices to account for possible nils, which can be returned from encryption	2020-03-12 08:27:44 -07:00
Julio López	89c0c6bac4	Refactor CLI stats (#341 ) * Helper package internal/stats * Use internal/stats for blob gc stats * Use internal/stats for content list stats * Refactor gc stats - Leverages internal/stats package - Return GC stats - nit: error message formatting - Refactor block in gc.Run. Simplifies and reduces a level of indentation	2020-03-11 22:16:07 -07:00
Jarek Kowalski	514df69afa	performance: added wrapper around io.Copy() this pools copy buffers so they can be reused instead of throwing away after each io.Copy()	2020-03-10 21:52:30 -07:00
Julio López	d9ce3d0ad6	Inject time in Kopia components (#314 ) Motivation: Allow time injection for (unit) tests, to more easily test and verify time-dependent invariants. Add time injection support for: * repo.Manager * manifest.Manager * snapshot.Uploader Then, wire up to these components. The content.Manager already had support for time injection, but was not wired up from the time function passed to repo creation. Add an internal/faketime package for testing. Mainly code movement from testing code in the repo/content package. Motivation: make it available to other packages outside content Also, add simple tests for faketime functions.	2020-03-10 00:42:10 -07:00
Jarek Kowalski	5f96b0240a	testing: added retry helper	2020-03-09 21:34:10 -07:00
Julio López	88ce341a40	Trivial cleanup for internal diff (#316 ) * Prefer filepath.Join * Remove downloadFile's receiver parameter	2020-03-09 18:18:42 -07:00
Jarek Kowalski	ddd267accc	crypto: deprecated crypto algorithms and replaced with better alternatives New ciphers are using authenticated encryption with associated data (AEAD) and per-content key derived using HMAC-SHA256: * AES256-GCM-HMAC-SHA256 * CHACHA20-POLY1305-HMAC-SHA256 They support content IDs of arbitrary length and are quite fast: On my 2019 MBP: - BLAKE2B-256 + AES256-GCM-HMAC-SHA256 - 648.7 MiB / second - BLAKE2B-256 + CHACHA20-POLY1305-HMAC-SHA256 - 597.1 MiB / second - HMAC-SHA256 + AES256-GCM-HMAC-SHA256 351 MiB / second - HMAC-SHA256 + CHACHA20-POLY1305-HMAC-SHA256 316.2 MiB / second Previous ciphers had several subtle issues: * SALSA20 encryption used a weak nonce (64 bit prefix of content ID), which means that for any two contents whose IDs have the same 64-bit prefix, their plaintext can be decoded from the ciphertext alone. * AES-{128,192,256}-CTR were not authenticated, so we were required to hash plaintext after decryption to validate. This is not recommended due to possibility of subtle timing attacks if an attacker controls the ciphertext. * SALSA20-HMAC was only validating checksum and not that the ciphertext was for the correct content ID. New repositories cannot be created using deprecated ciphers, but they will still be supported for existing repositories, until at least 0.6.0. The users are encouraged to migrate to one of new ciphers when 0.5.0 is out.	2020-02-29 20:50:50 -08:00
Jarek Kowalski	d181403284	crypto: refactored encryption, hashing and splitter into separate packages (#274 ) Added some tests, deleted XSALSA20 which never worked E2E	2020-02-27 12:36:49 -08:00
Jarek Kowalski	e3854f7773	BREAKING: changed how hostname/username are handled The hostname/username are now persisted when connecting to repository in a local config file. This prevents weird behavior changes when hostname is suddenly changed, such as when moving between networks. repo.Repository will now expose Hostname/Username properties which are always guaranteed to be set, and are used throughout. Removed --hostname/--username overrides when taking snapshot et al.	2020-02-25 20:40:23 -08:00
Jarek Kowalski	c8fcae93aa	logging: refactored logging This is mostly mechanical and changes how loggers are instantiated. Logger is now associated with a context, passed around all methods, (most methods had ctx, but had to add it in a few missing places). By default Kopia does not produce any logs, but it can be overridden, either locally for a nested context, by calling ctx = logging.WithLogger(ctx, newLoggerFunc) To override logs globally, call logging.SetDefaultLogger(newLoggerFunc) This refactoring allowed removing dependency from Kopia repo and go-logging library (the CLI still uses it, though). It is now also possible to have all test methods emit logs using t.Logf() so that they show up in failure reports, which should make debugging of test failures suck less.	2020-02-25 17:24:44 -08:00
Jarek Kowalski	897483299f	Kopia UI & CLI: support for progress indicator (#268 ) Percentage based on last-known snapshot size * server: exposed last completed snapshot size in the API * cli: added support for progress indicator (percentage based on last-known snapshot size) * htmlui: added progress indicator in the UI (percentage based on last-known snapshot size)	2020-02-24 17:55:02 -08:00
Jarek Kowalski	5412d75f79	htmlui: approaching usability by mere mortals - added ability to make new snapshots from the UI - added directory picker - hide/show macOS dock icon automatically - fixed copy/paste on Mac (apparently if you don't have 'Edit' menu in your app, copy/paste and many other shortcut keys simply don't work) - added smart time formatting ("X minutes ago", etc.) in lists using 'moment' library - added progress information to snapshots	2020-02-22 20:03:57 -08:00
Jarek Kowalski	e573548b93	server: fixed race between shutdown and syncSourcesLocked()	2020-02-22 19:27:10 -08:00
Jarek Kowalski	985fc0ad12	server: fixed /objects/ path mapping, added tests	2020-02-22 19:27:10 -08:00
Jarek Kowalski	27854d85ed	server: report local username and hostname when listing sources	2020-02-22 19:27:10 -08:00
Jarek Kowalski	3e58911cf3	tests: de-parallelized server tests	2020-02-22 19:27:10 -08:00
Jarek Kowalski	9b50a6e891	test: increased e2e test timeout Added linear retry support when waiting for snapshots	2020-02-22 19:27:10 -08:00
Jarek Kowalski	fde2f2e0e6	server: additional status code from CreateSnapshotSource, more tests	2020-02-22 19:27:10 -08:00
Jarek Kowalski	ab2c906f2c	server: implemented remaining server API methods CreateSnapshotSource API for ensuring source exists Upload - starts upload on a given source or matching sources Cancel - cancels upload on a given source or matching sources	2020-02-22 19:27:10 -08:00
Jarek Kowalski	ee88cfd229	server: switched from manual routing to github.com/gorilla/mux	2020-02-22 19:27:10 -08:00
Jarek Kowalski	8e812b76c0	blob: added retries to Filesystem provider, fixes #249 (#251 ) Wrote a test first which failed 100% on Windows. After adding retries it passed 20 times in a row, execution time is ~10s. Fixes #249	2020-02-19 13:17:47 -08:00
Jarek Kowalski	c42b5cd89f	server: API server for CRUD on individual Policies	2020-02-16 23:04:17 -08:00
Jarek Kowalski	cc5597ed6d	server: set default policy after repo creation	2020-02-16 22:43:36 -08:00
Jarek Kowalski	4cb898927c	server: new APIs and error codes to support UI flow for connecting to repository	2020-02-16 22:43:36 -08:00
Jarek Kowalski	4c35ed82b9	linter fixes	2020-02-13 17:23:50 -08:00
Jarek Kowalski	0f79279f5e	server: added support for new verbs in the API /api/v1/repo/create /api/v1/repo/connect /api/v1/repo/disconnect Refactored server code and fixed a number of outstanding robustness issues. Tweaked the API responses a bit to make more sense when consumed by the UI.	2020-02-13 17:23:50 -08:00
Jarek Kowalski	4736e9037e	revamped progress output and cleaned up logging See https://asciinema.org/a/ykx6uzEhKY3451fWEnX9nm9uo	2020-02-10 19:08:35 -08:00
Jarek Kowalski	edca1733b6	repo: moved password persistence to repository layer	2020-02-09 20:55:07 -08:00
Jarek Kowalski	29e5750686	travis: added bare-bones Windows build that does go test fixed some issues that prevented go test from passing on Windows: - webdav client used \ instead of / - need retries around mmap.Open() - paths are prefixed with C:\ on windows - time.Now() does not always move forward on Windows	2020-02-09 20:22:14 -08:00

237 Commits