Files
flatpak/common/flatpak-prune-private.h
Alexander Larsson 0cd4656ea1 Add (and use) custom, high-perfomance prune implementation
This is an optimized version of ostree_repo_prune() specialized for
archive mode repos. It is faster and uses less memory so that we can
prune larger repos (like flathub) in a realistic timeframe.

The primary reason it is faster is that it creates and uses a
`.commitmeta2` file for each commit, containing information about what
objects are reachable from that commit. This means incremental prunes
need only traverse over newly created commits.

Secondly, it uses the variant parser compiled accessors for the
various GVariants that are involved in the prune which is quite a bit
faster, especially if the repo is very large.

It also merges the scan-for-all-objects and prune-unreachable objects
phases, which means that we don't have to allocate a hashtable for
all the objects in the entire repo saving a lot of memory.

To save memory the hashtable of reachable objects, which can be quite
big on a big repo, points to a custom, very compact format for object
names.

Additionally it does the scanning for reachable objects twice, first
with a shared lock and then again (if anything changed) it with an
exclusive lock. This allows us to avoid using an exclusive lock during
the slowest part of the prune.

Unfortunately there are currently no public APIs for the ostree repo
locks. We really need to take an exclusive lock during the whole prune
or we parallel modifications (say a commit) might get their newly
written objects deleted. To work around this we have a minimal custom
implementation of an exclusive lock. Once the public API is available
we can start using that.

I created a repo with a lot of small commits to test this.  It has 9M,
and pruning with depth=10 deletes 2M of them.

The original performance looks like:

 Finding reachable objects: 287 seconds
 Pruning unreachable: 69 seconds

Just using the pregenerated reachable data:

 Finding reachable objects: 15 seconds
 Pruning unreachable: 69 seconds

The final optimized prune (using pregenerated data):

 Finding reachable objects: 12 seconds
 Pruning unreachable: 51 seconds

The above are with the page caches cleaned, on a second run the performance
increase is even more noticeable.

As a comparison to the above, finding the reachable objects in the
actual flathub repo took 22 hours, but with the pregenerated reachable data
only 39 minutes.
2021-04-26 10:30:14 +02:00

36 lines
1.3 KiB
C

/*
* Copyright © 2021 Red Hat, Inc
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library. If not, see <http://www.gnu.org/licenses/>.
*
* Authors:
* Alexander Larsson <alexl@redhat.com>
*/
#ifndef __FLATPAK_PRUNE_H__
#define __FLATPAK_PRUNE_H__
#include "flatpak-utils-private.h"
gboolean flatpak_repo_prune (OstreeRepo *repo,
int depth,
gboolean dry_run,
int *out_objects_total,
int *out_objects_pruned,
guint64 *out_pruned_object_size_total,
GCancellable *cancellable,
GError **error);
#endif /* __FLATPAK_PRUNE_H__ */