* content: fixed time-based auto-flush behavior to behave like Flush()
Previously it would sometimes be possible for a content whose write
started before time-based flush to finish writing afterwards (and it
would be included in the new index).
Refactored the code so that time-based flush happens before WriteContent
write and behaves exactly the same was as real Flush() so all writes
started before it will be awaited during the flush.
Also previous regression test was incorrect since it was mocking the
wrong blob method.
* content: refactored index blob manager crypto to separate file
This will be reused for encrypting session info.
* content: added support for session markers
Session marker (`s` blob) is written BEFORE the first data blob
(`p` or `q`) that belongs to new index segment (`n` is written).
Session marker is removed AFTER the index blob (`n`) has been written.
All pack and index blobs belonging to a session will have the session
ID as its suffix, so that if a reader can see `s<sessionID>` blob, they
will ignore any `p` and `q` blobs with the same suffix.
* maintenance: ignore blobs belonging to active sessions when running blob garbage collection
* cli: added 'sessions list' for listing active sessions
* content: added retrying writing previously failed blobs before writing new one
* object: refactored Open() and VerifyObject() to be stateless
(no code movement yet to facilitate review)
* mechanical: moved function more appropriate files
* object: remove object manager tracing which was unused
* manifest: refactored management of committed manifests into separate component
This is very similar to content.CommittedReadManager and allows
multiple writes to coexist and share the underlying committed manifest
manager.
Changed compaction to only include previously-committed manifests and
not any pending ones.
Added locking assertions for extra safety.
* Improved .kopiaignore pattern matching
.kopiaignore pattern matching now (hopefully) conforms to the .gitignore specification (https://git-scm.com/docs/gitignore)
Replaced old package "ignore" with a newly written "wcmatch" that manages the globbing. This should support all the patterns that .gitignore supports.
Some changes in ignorefs that dealt with how the patterns were matched.
This fixes#571
* Fixed invalid matching of non-rooted patterns that contained a slash.
If a pattern contains a slash in the middle of the pattern this should only match relative to the .gitignore file, i.e. the same as if it started with a '/' according to the .gitignore spec.
Example:
foo/bar should match "/foo/bar", but not "/other/foo/bar".
whereas
"bar" matches both "/bar" and "/foo/bar"
* Uncommented previously failing tests.
* Fixed problem with matching "nested" .kopiaignore files.
Ignore-patterns must be applied from the root .kopiaignore down the hierarchy, so that an ignore file in a subdirectory can negate a pattern from a parent directory.
* Uncommented tests that should now work.
* Created end-to-end tests verifying .kopiaignore behavior.
This is related to #571 and #773, but provided as a separate PR to include tests that did not work before PR #773.
* Commented failing tests.
These tests will be re-enabled when #773 is done.
* Added additional commented tests of .kopiaignore
These will be uncommented in #773.
* trivial: move CachingOptions out of content.Manager, where it's not needed
* trivial: removed newManagerWithOptions which was the same as NewManager
also moved one-time initialization to newReadManager()
* content: introduced ContentReadManager
This only introduces new type and reassigns methods
* content: moved all CommittedReadManager methods to one file
* content: more code movement
* content: refactored read manager setup
* content: refactored test hook that allowed cleaner passing of custom own writes cache
* testing: prevented spurious test flakes caused by kopia subprocesses messing with stderr
This was not causing actual failures, but misreporting error messages.
* testing: ensure random names are always unique by adding a counter
* performance: improve performance of fragmented index lookups
In a typical repository there will be many small indexes and few large
ones. If the number gets above ~100 or so, things get very slow.
It helps to pre-merge very small indexes in memory to reduce the number
of binary searches and mmaped IO to perform. In some extreme cases
where we have many uncompacted index segments (each with just 1-2
entries), the savings are as quite dramatic.
In one case with >100 index segments the time to run
`kopia snapshot verify` went from 10m to 0.5m after this change.
In another well-maintained repository with 1.2M contents and
about 25 segments, the time to run `kopia snapshot verify` went from
48s to 35s.
Co-authored-by: Julio López <julio+gh@kasten.io>
This also fixed a test bug where the test was incorrectly passing
password via environment variable and it was (incorrectly) expected
to be ignored.
Password is determined in the following order:
- flag/environment variable (highest priority)
- persistent storage
- asking user (lowest priority)
* linter: upgraded to 1.33, disabled some linters
* lint: fixed 'errorlint' errors
This ensures that all error comparisons use errors.Is() or errors.As().
We will be wrapping more errors going forward so it's important that
error checks are not strict everywhere.
Verified that there are no exceptions for errorlint linter which
guarantees that.
* lint: fixed or suppressed wrapcheck errors
* lint: nolintlint and misc cleanups
Co-authored-by: Julio López <julio+gh@kasten.io>
* When running against direct repository, it will verify that all
backing blobs exist based on results of listing.
* Deprecated annoying --all-sources flag which is now default if no
sources are provided.
This can be specified at `repo create` or `repo connect` to enable
actions. By default actions are disabled to avoid security risks
associated with executing code.
Alternatively during `snapshot create` one can specify
`--force-enable-actions` or `--force-disable-actions`
* policy: add actions
* fs: added LocalFilesystemPath() which can optionally return local filesystem
path (if entry is local)
* cli: added support for setting policy actions
* upload: support for executing actions before/after folder (non-inheritable)
and before/after snapshots (inheritable)
* testing: end-to-end test for actions
* additional tests for actions with embedded scripts
* cli: split command_policy_set.go by individual areas
* cli: refactored 'policy set' implementation to reuse helpers
* use defined const instead of literal
Co-authored-by: Julio López <julio+gh@kasten.io>
Remove unnecessary intermediate variables.
Send SIGTERM instead of SIGKILL to terminate child kopia server process.
Set Pdeathsig on Linux for child kopia server process.
Trivial: reduce scope of hostFioDataPathStr variable.
Trivial: rename local variable.
Trivial: Use log.Fatalln instead of log + exit(1).
Improve error message in robustness test to tell apart failure cause.
Both source and destination can be specified using user@host,
@host or user@host:/path where destination values override the
corresponding parts of the source, so both targeted
and mass copying is supported.
Supported combinations are:
Source: Destination Behavior
---------------------------------------------------
@host1 @host2 copy snapshots from all users of host1
user1@host1 @host2 copy all snapshots to user1@host2
user1@host1 user2@host2 copy all snapshots to user2@host2
user1@host1:/path1 @host2 copy to user1@host2:/path1
user1@host1:/path1 user2@host2 copy to user2@host2:/path1
user1@host1:/path1 user2@host2:/path2 copy snapshots from single path
When --move is specified, the matching source snapshots are also deleted.
* cli: upgraded kingpin to latest version (not tagged)
This allows using `EnableFileExpansion` to disable treating
arguments prefixed with "@" as file includes.
* Robustness engine actions with stats and logging
- Add actions to robustness engine
- Actions wrap other functional behavior and serve as a common interface for collecting stats
- Add stats for the engine, both per run and cumulative over time
- Add a log for actions that the engine has executed
- Add recovery logic to re-sync snapshot metadata after a possible failed engine run (e.g. if metadata wasn't properly persisted).
Current built-in actions:
- snapshot root directory
- restore random snapshot ID into a target restore path
- delete a random snapshot ID
- run GC
- write random files to the local data directory
- delete a random subdirectory under the local data directory
- delete files in a directory
- restore a snapshot ID into the local data directory
Actions are executed according to a set of options, which dictate the relative probabilities of picking a given action, along with ranges for action-specific parameters that can be randomized.
The new files policy oneFileSystem ignores files that are mounted to
other filesystems similarly to tar's --one-file-system switch. For
example, if this is enabled, backing up / should now automatically
ignore /dev, /proc, etc, so the directory entries themselves don't
appear in the backup. The value of the policy is 'false' by default.
This is implemented by adding a non-windows-field Device (of type
DeviceInfo, reflecting the implementation of Owner) to the Entry
interface. DeviceInfo holds the dev and rdev acquired with stat (same
way as with Owner), but in addition to that it also holds the same
values for the parent directory. It would seem that doing this in some
other way, ie. in ReadDir, would require modifying the ReadDir
interface which seems a too large modification for a feature this
small.
This change introduces a duplication of 'stat' call to the files, as
the Owner feature already does a separate call. I doubt the
performance implications are noticeable, though with some refactoring
both Owner and Device fields could be filled in in one go.
Filling in the field has been placed in fs/localfs/localfs.go where
entryFromChildFileInfo has acquired a third parameter giving the the
parent entry. From that information the Device of the parent is
retrieved, to be passed off to platformSpecificDeviceInfo which does
the rest of the paperwork. Other fs implementations just put in the
default values.
The Dev and Rdev fields returned by the 'stat' call have different
sizes on different platforms, but for convenience they are internally
handled the same. The conversion is done with local_fs_32bit.go and
local_fs_64bit.go which are conditionally compiled on different
platforms.
Finally the actual check of the condition is in ignorefs.go function
shouldIncludeByDevice which is analoguous to the other similarly named
functions.
Co-authored-by: Erkki Seppälä <flux@inside.org>