mirror of
https://github.com/flatpak/flatpak.git
synced 2026-03-28 20:02:36 -04:00
This is an optimized version of ostree_repo_prune() specialized for archive mode repos. It is faster and uses less memory, so that we can prune larger repos (like flathub) in a realistic timeframe.

The primary reason it is faster is that it creates and uses a `.commitmeta2` file for each commit, containing information about which objects are reachable from that commit. This means incremental prunes only need to traverse newly created commits.

Secondly, it uses the compiled variant parser accessors for the various GVariants involved in the prune, which is quite a bit faster, especially if the repo is very large.

It also merges the scan-for-all-objects and prune-unreachable-objects phases, which means that we don't have to allocate a hashtable for all the objects in the entire repo, saving a lot of memory. To save further memory, the hashtable of reachable objects, which can be quite big on a big repo, points to a custom, very compact format for object names.

Additionally, it scans for reachable objects twice: first with a shared lock, and then again (if anything changed) with an exclusive lock. This allows us to avoid holding an exclusive lock during the slowest part of the prune.

Unfortunately there are currently no public APIs for the ostree repo locks. We really need to take an exclusive lock during the whole prune, or parallel modifications (say a commit) might get their newly written objects deleted. To work around this we have a minimal custom implementation of an exclusive lock. Once the public API is available we can start using that.

I created a repo with a lot of small commits to test this. It has 9M objects, and pruning with depth=10 deletes 2M of them.
The original performance looks like:

  Finding reachable objects: 287 seconds
  Pruning unreachable: 69 seconds

Just using the pregenerated reachable data:

  Finding reachable objects: 15 seconds
  Pruning unreachable: 69 seconds

The final optimized prune (using pregenerated data):

  Finding reachable objects: 12 seconds
  Pruning unreachable: 51 seconds

The above are with the page caches cleaned, on a second run the performance increase is even more noticeable.

As a comparison to the above, finding the reachable objects in the actual flathub repo took 22 hours, but with the pregenerated reachable data only 39 minutes.
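To make the "very compact format for object names" idea concrete, here is a minimal sketch of one way such a format could look. It is an assumption for illustration only, not the layout flatpak actually uses: the names `COMPACT_NAME_LEN`, `compact_name_pack` and `compact_name_unpack` are hypothetical, and the one-byte `objtype` tag is simply a stand-in for whatever object type information the real code stores. The sketch packs a 64-character hex SHA-256 checksum into 32 raw bytes plus one tag byte, so each hashtable key takes 33 bytes instead of a 65-byte string.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compact object name: 32 raw SHA-256 bytes followed by a
 * 1-byte object type tag, 33 bytes in total. This layout is an assumption
 * made for illustration, not flatpak's actual format. */
#define COMPACT_NAME_LEN 33

/* Decode one hex digit; returns -1 for a non-hex character. */
static int
hex_digit_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Pack a 64-character hex checksum and a type tag into `out`.
 * Returns 1 on success, 0 if the checksum is malformed. */
static int
compact_name_pack (const char *hex_checksum,
                   uint8_t     objtype,
                   uint8_t     out[COMPACT_NAME_LEN])
{
  if (strlen (hex_checksum) != 64)
    return 0;

  for (size_t i = 0; i < 32; i++)
    {
      int hi = hex_digit_value (hex_checksum[2 * i]);
      int lo = hex_digit_value (hex_checksum[2 * i + 1]);

      if (hi < 0 || lo < 0)
        return 0;
      out[i] = (uint8_t) ((hi << 4) | lo);
    }

  out[32] = objtype;
  return 1;
}

/* Expand a compact name back into a NUL-terminated lowercase hex
 * checksum and its type tag. */
static void
compact_name_unpack (const uint8_t in[COMPACT_NAME_LEN],
                     char          hex_out[65],
                     uint8_t      *objtype)
{
  static const char digits[] = "0123456789abcdef";

  for (size_t i = 0; i < 32; i++)
    {
      hex_out[2 * i] = digits[in[i] >> 4];
      hex_out[2 * i + 1] = digits[in[i] & 0x0f];
    }

  hex_out[64] = '\0';
  *objtype = in[32];
}
```

With a fixed-size key like this, the reachable set can be hashed and compared with `memcmp()` over 33 bytes, and the keys can be stored back to back in one large allocation rather than as individually allocated strings. At the 9M-object scale mentioned above, halving the per-key size is a meaningful saving, though the real gain depends on how the actual implementation lays out its data.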
36 lines
1.3 KiB
C
/*
 * Copyright © 2021 Red Hat, Inc
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Authors:
 *       Alexander Larsson <alexl@redhat.com>
 */

#ifndef __FLATPAK_PRUNE_H__
#define __FLATPAK_PRUNE_H__

#include "flatpak-utils-private.h"

gboolean flatpak_repo_prune (OstreeRepo    *repo,
                             int            depth,
                             gboolean       dry_run,
                             int           *out_objects_total,
                             int           *out_objects_pruned,
                             guint64       *out_pruned_object_size_total,
                             GCancellable  *cancellable,
                             GError       **error);

#endif /* __FLATPAK_PRUNE_H__ */