I completely ignored the fact that symlinks can point to directories.
this makes our checksum logic fall over because you can't checksum a
dirent. instead, introduce completely bespoke handling for symlinks.
it's incredibly similar to the file handling but ever so slightly
different, so we get type assurances out of it.
may be worth creating an Analyzer interface in the future so we don't
have to dupe things so much
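for illustration, a minimal sketch (in Go, with hypothetical names; the real
code may differ) of what the bespoke handling boils down to: hash the
readlink target instead of trying to open the entry, since the target may
well be a directory.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
)

// checksumEntry returns a hex digest for a directory entry. symlinks are not
// followed; the link target string is hashed instead, so links pointing at
// directories no longer trip up the file checksum path. (hypothetical sketch)
func checksumEntry(path string, info fs.FileInfo) (string, error) {
	h := sha256.New()
	if info.Mode()&fs.ModeSymlink != 0 {
		// a symlink may point at a directory, which we cannot open and
		// read like a file, so hash the target path instead
		target, err := os.Readlink(path)
		if err != nil {
			return "", err
		}
		h.Write([]byte(target))
	} else {
		f, err := os.Open(path)
		if err != nil {
			return "", err
		}
		defer f.Close()
		if _, err := io.Copy(h, f); err != nil {
			return "", err
		}
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	info, err := os.Lstat(os.Args[1]) // Lstat so the symlink itself is examined
	if err != nil {
		panic(err)
	}
	sum, err := checksumEntry(os.Args[1], info)
	if err != nil {
		panic(err)
	}
	fmt.Println(sum)
}
```

an Analyzer interface would let the file and symlink variants share
everything except the hashing step.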
there is a flatpak, and technically it works without a system libvirt.
it's hard to tell right now whether it actually does, though, because we
do have a system libvirt :D
after some stunts with dump.erofs I am led to believe that the remaining
delta chunks are in fact symlink (metadata?)
specifically I am seeing a non-matching chunk
`░ -- start: 3626777409 len: 82685`
which seems to cover the following extent in the erofs:
```
Path :
/usr/share/factory/var/lib/flatpak/runtime/org.kde.Platform/x86_64/6.9/f930fae18cfc829f51db18b9324905a3bebee0ec7e9d4d62afbb17f696fb20d0/files/share/icons/breeze-dark/status/22/rotation-locked-landscape-symbolic.svg
Size: 29 On-disk size: 29 symlink file
NID: 113336534 Links: 1 Layout: 2 Compression ratio: 100.00%
Inode size: 64 Xattr size: 0
Uid: 0 Gid: 0 Access: 0777/rwxrwxrwx
Timestamp: 2025-08-26 12:33:18.806496943
Ext: logical offset | length : physical offset | length
0: 0.. 29 | 29 : 3626769152..3626769181 | 29
```
the trouble is that, because the chunk is so large, it's hard to tell
what the actual change is that causes the delta. considering the mtime
is definitely the build time, that is my only guess right now.
if the dir is newer than the files, we still want to force it to a
consistent value derived from the files.
most notably this should prevent a whole host of dirs from having an
mtime that is the package unpack time, which obviously changes
between builds.
with file mtimes stabilized, we now have dirs lighting up like a
Christmas tree in my diff scripts. give them a stable mtime to get
consistency between builds.
the idea here is that if we set the mtime of all dirs to their latest
content's mtime, we'll implicitly stabilize the dirs by way of the
already-stabilized files.
somewhat unfortunately, we need to do this in a single thread: otherwise
we'd have to segment deep trees, and I really don't want to venture
there for such an otherwise simple program.
a future option might be to also put dirs in our json, but realistically
that only makes a difference for empty dirs (since they have no content
from which to derive the mtime). so let's see where we get with this; we
can always add dir records to the json later.
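a minimal sketch of that single-threaded depth-first pass, assuming dirs
simply inherit the newest mtime of their contents and empty dirs are left
alone (names hypothetical, not the actual mtimer code):

```go
package main

import (
	"os"
	"path/filepath"
	"time"
)

// stabilizeDir walks depth-first, stamps every directory with the newest
// mtime found among its own contents, and returns the newest mtime of the
// subtree so the parent can do the same. plain recursion keeps deep trees
// trivial to handle in a single thread.
func stabilizeDir(dir string) (time.Time, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return time.Time{}, err
	}
	var newest time.Time
	for _, entry := range entries {
		path := filepath.Join(dir, entry.Name())
		var mtime time.Time
		if entry.IsDir() {
			// recurse first so the child dir is already stabilized
			mtime, err = stabilizeDir(path)
			if err != nil {
				return time.Time{}, err
			}
		} else {
			info, err := os.Lstat(path)
			if err != nil {
				return time.Time{}, err
			}
			mtime = info.ModTime()
		}
		if mtime.After(newest) {
			newest = mtime
		}
	}
	if !newest.IsZero() {
		// empty dirs keep their mtime; everything else gets the newest
		// content mtime, whether the dir was newer or older than it
		if err := os.Chtimes(dir, newest, newest); err != nil {
			return time.Time{}, err
		}
	}
	return newest, nil
}

func main() {
	if _, err := stabilizeDir(os.Args[1]); err != nil {
		panic(err)
	}
}
```

the parent never has to re-stat a child dir because the recursion hands the
newest mtime back up, which is what keeps the single-threaded approach so
simple.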
files in var/lib/pacman/local are super tiny and change between all our
builds, causing many tiny deltas. we don't like many tiny deltas in our
delta downloading because they mean many tiny downloads. so remove the
files to get a smaller overall delta between image versions.
this is a bit of a shot in the dark, but I believe we may have
unnecessary delta in our images caused by the daily rebuilding of
software. this would result in mtimes changing even when the file
contents actually do not.
a tiny mtimer tool is meant to work around that: it consumes an input
json file of mtimes+checksums, and whenever an mtime has changed it
checksums the affected file to verify that the content has actually
changed as well.
assuming reproducible builds, this should result in far less delta in the
erofs and, by extension, the delta download.
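a minimal sketch of that idea, assuming a json map of path → {mtime, sha256}
from the previous build and that the tool restores the recorded mtime
whenever the content turns out to be unchanged (format and field names are
made up):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"io"
	"os"
	"time"
)

// record is one entry from the input json: the mtime and content checksum a
// path had in the previous build. hypothetical format, mtimes as RFC3339.
type record struct {
	Mtime  time.Time `json:"mtime"`
	Sha256 string    `json:"sha256"`
}

func fileSum(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	data, err := os.ReadFile(os.Args[1]) // json produced by the previous build
	if err != nil {
		panic(err)
	}
	records := map[string]record{}
	if err := json.Unmarshal(data, &records); err != nil {
		panic(err)
	}
	for path, rec := range records {
		info, err := os.Lstat(path)
		if err != nil || !info.Mode().IsRegular() {
			continue // new, removed, or non-regular files are left alone
		}
		if info.ModTime().Equal(rec.Mtime) {
			continue // mtime unchanged, nothing to do
		}
		sum, err := fileSum(path)
		if err != nil {
			panic(err)
		}
		if sum == rec.Sha256 {
			// content is identical, only the mtime moved: restore it so
			// the erofs (and thus the delta) stays stable
			if err := os.Chtimes(path, rec.Mtime, rec.Mtime); err != nil {
				panic(err)
			}
		}
	}
}
```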
this should make things more amazing. from my testing it looks like
aligning the sizes improves compression and improves caibx generation by
aligning chunk boundaries better.
in a test scenario of adding a single 128M file of random data to /usr/lib,
this brought the fragmentation down from a couple thousand segments to 8
(of which 5 are in the superblock, and the new file appears as one large
contiguous chunk of delta). the actual download size is a 135M delta.
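for reference, the "aligning" is the usual round-up-to-a-boundary
computation; which sizes get padded and to what boundary (4096 in this
sketch) are assumptions on my part:

```go
package main

import "fmt"

// alignUp rounds size up to the next multiple of align (a power of two).
func alignUp(size, align uint64) uint64 {
	return (size + align - 1) &^ (align - 1)
}

func main() {
	fmt.Println(alignUp(82685, 4096)) // 86016
}
```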
This gets us `lsusb`, which was missing even though all the other
`ls[thing]` tools are already pre-installed.
This will increase the base image size by 375.5 KB, and pull in no new
dependencies.
the previous code would end up dropping the protected versions from the
shasums; we want to keep them in there as well.
to achieve this we now collect all protected releases and append them to
the keep list once the keep list has been pruned. this seems like the
most reliable way of doing it.
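a sketch of that ordering with made-up names, just to pin the idea down:
prune first, then append the protected releases so they can never fall out
of the keep list (and therefore stay in the shasums):

```go
package main

import "fmt"

// pruneKeepList keeps only the newest `keep` releases (input assumed sorted
// newest first), then appends every protected release the pruning dropped,
// so protected versions always survive. names are hypothetical.
func pruneKeepList(releases, protected []string, keep int) []string {
	kept := append([]string{}, releases...)
	if len(kept) > keep {
		kept = kept[:keep]
	}
	seen := map[string]bool{}
	for _, r := range kept {
		seen[r] = true
	}
	for _, p := range protected {
		if !seen[p] {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	releases := []string{"20250826", "20250825", "20250824", "20250820"}
	protected := []string{"20250820"} // e.g. a release a rollback still points at
	fmt.Println(pruneKeepList(releases, protected, 2))
	// [20250826 20250825 20250820]
}
```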