Planning to use memfd_create() in flatpak and rpm-ostree, which both use
bubblewrap, and want to pass read-only data via file descriptor to the
container. Passing via `O_TMPFILE` requires `O_RDWR` (read and write),
and passing via a pipe would require buffering.
The systemd `missing.h` has grown enormously; I only cherry-picked the bits for
memfd.
This makes us more friendly to being embedded in a GObject or
the like that's fully zero-initialized, rather than relying on the special
`-1` value for the fd.
Calls to `glnx_release_lock_file()` become idempotent, so it's easy to call it
unconditionally in an object finalizer.
Particularly if `AT_FDCWD` is used, we need to open
in the target dir, otherwise we can get `EXDEV` when trying
to do the final link.
(Theoretically we can cross a mountpoint even with fd-relative
though this is a lot less likely)
Having our tests forced into a `goto out` style is seriously annoying
since we can't write tests like we write production code. Add
a macro that checks for the error being NULL.
This doesn't fully solve the problem since the test functions are
still forced into `void` returns; at some point I may extend
GLib to have `g_test_add_err_func()`.
We have a use case in libostree's staging dirs where we try to reuse
them across multiple ostree txns, but we want the fd-relative bits
here.
Extend the tmpdir API to make deletion optional. While here, also extend the API
to support checking for errors when deleting for projects like libostree that
want to do so consistently.
Also while here, add a change to set the fd to `-1` after clearing to be extra
defensive.
I was working on rpm-ostree unified core, and hit the fact that
`glnx_file_copy_at()` had the same bug with `fsetxattr()` and files whose mode
is <= `0400` (e.g. `000` in the case of `/etc/shadow`) that libostree did a
while ago. Basically, Linux currently allows `write()` on non-writable open files
but not `fsetxattr()`. This situation is masked for privileged (i.e.
`CAP_DAC_OVERRIDE`) code.
Looking at this, I think it's cleaner to convert to `O_TMPFILE` here,
since that code already handles setting the tmpfile to mode `0600`. Now,
this *is* a behavior change in the corner case of existing files which
are symbolic links. Previously we'd do an `open(O_TRUNC)` which would follow
the link.
But in the big picture, I think the use cases for `open(O_TRUNC)` are really
rare - I audited all callers of this in ostree/rpm-ostree/flatpak, and all of
them will be fine with this behavior change. For example, the ostree `/etc`
merge code already explicitly unlinks the target beforehand. Other cases like
supporting `repo/pubring.gpg` in an ostree repo being a symlink...eh, just no.
Making this change allows us to convert to new style, and brings all of the
general benefits of using `O_TMPFILE` too.
While reading a strace I noticed a double close in the tests; this was because
we were missing an assignment to `-1` in the tests. However, let's make
supporting this clearer by explicitly supporting the fd being `-1` while still
setting the `initialized` variable to `FALSE`. We also add the `EBADF` assertion
checking.
I noticed while reading the manpage for `linkat()` that `O_TMPFILE`
supports `O_EXCL` to mean exactly what we're doing with the anonymous
tmpfile API.
Change the code to start using it; this required refactoring the internals since
we had a check to be sure the caller wasn't passing `O_EXCL` for the
non-anonymous path which we want to keep.
Presumably the storage system could do smarter things if it knows a file will
always be anonymous, e.g. it doesn't need to journal its data.
This is a very common pattern in both ostree/rpm-ostree. Make a better API for
this. I thought a lot about simply zeroing out `struct stat` but that feels
dangerous; none of the values have seem obviously `cannot be zero`.
Basically all of the ostree/rpm-ostree callers want to both create and open, so
let's merge `glnx_mkdtempat()` and `glnx_mkdtempat_open()`.
Second, all of them want to do `glnx_shutil_rm_rf_at()` on cleanup, so we do the
same thing we did with `GLnxTmpfile` and create `GLnxTmpDir` that has a cleanup
attribute.
The cleanup this results in for rpm-ostree is pretty substantial.
FICLONE is the new alias for the formerly btrfs-specific ioctl; XFS
has experimental patches to support it.
Further, we should use copy_file_range() for the case where we're only doing a
limited copy. Both NFS and XFS (with reflink enabled) understand it.
Part of the reason I'm doing this is so that ostree's `/etc` merge will start
using XFS reflinks. But another major reason is to take the next step after and
copy this code into GLib as well, so that all of the general GLib users will
benefit; e.g. Nautilus will transparently do server copy offloads with NFS home
directories.
See also this coreutils thread about `copy_file_range()`:
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24399>. I don't care about file
holes for our use cases, so it's fine.
Other changes while I'm here:
- Tweak the sendfile() case to match the newly inlined logic for cfr
- Add a TEMP_FAILURE_RETRY() around the read()
We should be able to rely upstream on everything *except* `glnx_unref_object`
which requires the library itself to depend on a newer glib, which isn't true
for e.g. RHEL7 libsoup.
libostree was almost ready for this; just a few patches to push
it to completion in
https://github.com/ostreedev/ostree/pull/1042
systemd does this by default. I think we should treat this as a fatal error
since it can cause really painful-to-debug problems if we don't just get
EBADF but actually close something else's fd due to a race.
Minor tweak to the new `GLNX_AUTO_PREFIX_ERROR`. Since the common case
is that there's no errors, let's bring down the same check that
`g_prefix_error` does to avoid a function call most of the time.
Another one where we have a lot of inlines in ostree at least. Not the same as
`glnx_shutil_mkdir_p_at()` since in these cases we don't want automatic
intermediate dirs, and it's cheaper to just call `mkdirat()` and handle `EEXIST`
rather than do a `stat()` first.
This is kind of long overdue. Reasons are the same as the other wrappers. I
debated adding `O_NOFOLLOW` support but the use cases for that are pretty
obscure, callers who want that can just use the syscall directly for now.
In a lot of places in ostree, we end up prefixing errors in the *caller*.
Often we only have 1-2 callers, and doing the error prefixing isn't
too duplicative. But there are definitely cases where it's cleaner
to do the prefixing in the callee. We have functions that aren't
ported to new style for this reason (they still do the prefixing
in `out:`).
Introduce a cleanup-oriented version of error prefixing so we can port those
functions too.
Mostly in ostree/rpm-ostree, we work in either raw `int fd`, or
`G{Input,Output}Stream`. One exception is the rpm-ostree `/etc/passwd`
handling, which uses `FILE*` since that's what glibc exposes.
And in general, there are use cases for `FILE*`; the raw `GUnixOutputStream` for
example isn't buffered, and doing so via e.g. `GBufferedOutputStream` means
allocating *two* GObjects and even worse going through multiple vfuncs for every
write.
`FILE*` is used heavily in systemd, and provides buffering. It is a bit cheaper
than gobjects, but has its own trap; by default every operation locks a mutex.
For more information on that, see `unlocked_stdio(3)`. However, callers can
avoid that by using e.g. `fwrite_unlocked`, which I plan to do for most users of
`FILE*` that aren't writing to one of the standard streams like `stdout` etc.
I'm not aware of a problem in practice here, but we should do this on general
principle. Writing this patch now because I hit a fd leak in the ostree static
delta processing that was introduced in the tmpfile prep code, but fixed in the
final port.
Looking at converting the ostree codebase, iterating over only the
values of a hash table (while ignoring the key) is actually a more
common pattern than I thought. So let's give it its own macro as well so
users don't have to resort to the _KV variant.
These macros make it much easier to iterate over a GHashTable. It takes
care of initializing an iterator and casting keys and values to their
proper types.
See the example usage in the docstring for more info.