Commit Graph

17 Commits

Author SHA1 Message Date
Jarek Kowalski
b55d5b474c refactor(repository): refactored internal index read API to reduce memory allocations (#3754)
* refactor(repository): refactored internal index read API to reduce memory allocations

* fixed stress test flake, improved debuggability

* fixed spurious checklocks failures

* post-merge fixes

* pr feedback
2024-04-12 22:59:11 -07:00
Jarek Kowalski
7ee30b76bb fix(repository): fixed handling of content.Info (#3356)
* fix(repository): fixed handling of content.Info

Previously content.Info was an interface which was implemented by:

* index.InfoStruct
* index.indexEntryInfoV1
* index.indexEntryInfoV2

The last 2 implementations were relying on memory-mapped files
which in rare cases could be closed while Kopia was still processing
them leading to #2599.

This changes fixes the bug and strictly separates content.Info (which
is now always a struct) from the other two (which were renamed as
index.InfoReader and only used inside repo/content/...).

In addition to being safer, this _should_ reduce memory allocations.

* reduce the size of content.Info with proper alignment.

* pr feedback

* renamed index.InfoStruct to index.Info
2023-10-14 10:34:15 -07:00
Jarek Kowalski
ae833bf822 refactor(repository): refactored v1 encryption overhead to be a function that's only invoked when actual V1 index is opened (#2300) 2022-08-10 05:26:51 +00:00
Jarek Kowalski
51dcaa985d chore(ci): upgraded linter to 1.48.0 (#2294)
Mechanically fixed all issues, added `lint-fix` make target.
2022-08-09 06:07:54 +00:00
Jarek Kowalski
70e24106ee refactor(general): unified logging.Logger with *zap.SugaredLogger (#2090)
- removed a bunch of hacks and should improve the logging
performance by avoiding interfaces and data translation. This will
allow using of de-sugared loggers in performance-critical
logging situations.

- this will also allow using features of ZAP more directly without
having to reimplement them.

- moved logging.Printf() to testlogging

- refactored `uitask` to store logs in a structural format and
present them as JSON only in the UI

- renamed printf_logger.go to printf.go so that fewer columns are used
in the logs
2022-06-26 05:11:52 +00:00
Jarek Kowalski
9bf9cac7fb refactor(repository): ensure we always parse content.ID and object.ID (#1960)
* refactor(repository): ensure we always parse content.ID and object.ID

This changes the types to be incompatible with string to prevent direct
conversion to and from string.

This has the additional benefit of reducing number of memory allocations
and bytes for all IDs.

content.ID went from 2 allocations to 1:
   typical case 32 characters + 16 bytes per-string overhead
   worst-case 65 characters + 16 bytes per-string overhead
   now: 34 bytes

object.ID went from 2 allocations to 1:
   typical case 32 characters + 16 bytes per-string overhead
   worst-case 65 characters + 16 bytes per-string overhead
   now: 36 bytes

* move index.{ID,IDRange} methods to separate files

* replaced index.IDFromHash with content.IDFromHash externally

* minor tweaks and additional tests

* Update repo/content/index/id_test.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* Update repo/content/index/id_test.go

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>

* pr feedback

* post-merge fixes

* pr feedback

* pr feedback

* fixed subtle regression in sortedContents()

This was actually not producing invalid results because of how base36
works, just not sorting as efficiently as it could.

Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
2022-05-25 14:15:56 +00:00
Jarek Kowalski
5d87d81733 chore(repository): extracted content index building and parsing into repo/content/index (#1881) 2022-04-05 18:04:50 -07:00
Jarek Kowalski
920341cb68 cache: prevent metadata cache thrashing if working set exceeds max defined size (#1557)
This is done by protecting newly added cache items from being swept for
X amount of time where X defaults to:

* `metadata` - 24 hours (new)
* `data` - 10 min (new)
* `indexes` - 1 hours (same as today)

Fixes #1540
2021-12-03 15:35:01 -08:00
Jarek Kowalski
a0cfa2556f introduced structural debug logging and optional JSON output (#1475)
* logging: added Logger.Debugw(message, key1, value1, ..., keyN, valueN)

This is based on ZAP and allows structural logs to be emitted.

* cli: added --json-log-console and --json-log-file flags

* logging: updated storage logging wrapper to use structural logging

* pr feedback
2021-11-03 21:57:37 -07:00
Jarek Kowalski
35d0f31c0d huge: replaced the use of allocated byte slices with populating gather.WriteBuffer in the repository (#1244)
This helps recycle buffers more efficiently during snapshots.
Also, improved memory tracking, enabled profiling flags and added pprof
by default.
2021-08-20 08:45:10 -07:00
Jarek Kowalski
d84c884321 Added content manager internal logging (#1116)
* logging: added logger wrappers for Broadcast and Prefix

* nit: moved max hash size to a named constant

* content: added internal logger

* content: replaced context-based logging with explicit Loggers

This will capture the logger.Logger associated with the context when
the repository is opened and will reuse it for all logs instead of
creating new logger for each log message.

The new logger will also write logs to the internal logger in addition
to writing to a log file/console.

* cli: allow decrypting all blobs whose names start with _

* maintenance: added logs cleanup

* cli: commands to view logs

* cli: log selected command on each write session
2021-06-05 08:48:43 -07:00
Jarek Kowalski
9e861c9e05 Implemented index v2 builder and parser (#1028)
* content: added GetCompressionHeaderID and GetEncryptionKeyID to content.info

Both must be zero in index v1 but will be non-zero in index v2 to
support in-content compression and key rotation in the future.

* content: cleaned up index v1 code

* content: added index v2 implementation

* content: updated index test to verify that we're able to store all supported values in all Info fields

* content: optimized sorting of content.Info by content ID using bucket sort and parallelization

For 10M contents this reduces sort time from 10s to ~2s

* content: fixed a bunch of off-by-one errors in index v2, added tests

* content: fixed test failures due to increased validation

* content: plumbed through index version (currently hardcoded to v1) in content manager
2021-05-07 09:56:27 -07:00
Jarek Kowalski
df430371b9 Refactored content.Info to be an interface and switched index parsing to be lazy (#1008) 2021-04-27 05:53:52 -07:00
Jarek Kowalski
74f926cb0d content: added content.Info.OriginalLength (#989) 2021-04-19 19:44:10 -07:00
Jarek Kowalski
b8c3ae378b testing: replaced locally-defined must() with require.NoError() (#942) 2021-04-05 09:57:50 -07:00
Jarek Kowalski
b6e68fa28a Fixed few coverage flakes (#872)
* blobtesting: coverage for GetMetadata() returning ErrNotFound
* content: additional direct coverage for diskCommittedContentIndexCache
2021-03-07 00:03:20 -08:00
Jarek Kowalski
6bb41794ee codecov: added ignore rules (#854)
* codecov: added ignore rules

* manifest: fixed flaky test coverage

* content: added direct unit tests for committed content index cache
2021-02-27 14:00:33 -08:00