Files
flatpak/app
Alexander Larsson 0cd4656ea1 Add (and use) custom, high-perfomance prune implementation
This is an optimized version of ostree_repo_prune() specialized for
archive mode repos. It is faster and uses less memory so that we can
prune larger repos (like flathub) in a realistic timeframe.

The primary reason it is faster is that it creates and uses a
`.commitmeta2` file for each commit, containing information about what
objects are reachable from that commit. This means incremental prunes
need only traverse over newly created commits.

Secondly, it uses the variant parser compiled accessors for the
various GVariants that are involved in the prune which is quite a bit
faster, especially if the repo is very large.

It also merges the scan-for-all-objects and prune-unreachable objects
phases, which means that we don't have to allocate a hashtable for
all the objects in the entire repo saving a lot of memory.

To save memory the hashtable of reachable objects, which can be quite
big on a big repo, points to a custom, very compact format for object
names.

Additionally it does the scanning for reachable objects twice, first
with a shared lock and then again (if anything changed) it with an
exclusive lock. This allows us to avoid using an exclusive lock during
the slowest part of the prune.

Unfortunately there are currently no public APIs for the ostree repo
locks. We really need to take an exclusive lock during the whole prune
or we parallel modifications (say a commit) might get their newly
written objects deleted. To work around this we have a minimal custom
implementation of an exclusive lock. Once the public API is available
we can start using that.

I created a repo with a lot of small commits to test this.  It has 9M,
and pruning with depth=10 deletes 2M of them.

The original performance looks like:

 Finding reachable objects: 287 seconds
 Pruning unreachable: 69 seconds

Just using the pregenerated reachable data:

 Finding reachable objects: 15 seconds
 Pruning unreachable: 69 seconds

The final optimized prune (using pregenerated data):

 Finding reachable objects: 12 seconds
 Pruning unreachable: 51 seconds

The above are with the page caches cleaned, on a second run the performance
increase is even more noticeable.

As a comparison to the above, finding the reachable objects in the
actual flathub repo took 22 hours, but with the pregenerated reachable data
only 39 minutes.
2021-04-26 10:30:14 +02:00
..
2019-02-25 18:12:30 +00:00
2021-03-10 10:33:51 +01:00