ptrace 'child' events are treated differently during a wait than
'normal' child events. Specifically, a wait on a process that is being
traced receives events for every thread (every `Tid`) of that process,
rather than only for the `Tgid`.
This fixes running strace on multi-threaded programs.
Make the `Tid` globally unique, rather than `Tgid` creating a new number
space. This allows ptrace to differentiate between threads when using
`-f` on a program which spawns threads.
Replace `CUR_TASK_PTR` with `ProcessCtx`. This allows differentiation
between functions that access process context (take in `ProcessCtx` as a
parameter) and those that don't.
When creating a new class of schedulable tasks (softirqs, kthreads),
this ensures that the functions they run cannot call context-sensitive
functions.
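
A minimal sketch of the idea, not the kernel's actual code: only the
`ProcessCtx` name comes from this change, while the `pid` field and the
three functions are hypothetical. A function that needs process context
says so in its signature, so code that is never handed a `ProcessCtx`
cannot reach it.

```rust
// Hypothetical sketch: only `ProcessCtx` comes from the commit message;
// every field and function below is invented for illustration.

/// Handle to the process context of the task currently running.
struct ProcessCtx {
    pid: u32,
}

/// Context-sensitive: needs a `ProcessCtx`, so only callers that hold one
/// can invoke it.
fn current_pid(ctx: &ProcessCtx) -> u32 {
    ctx.pid
}

/// Context-free: takes no `ProcessCtx`, so it is callable from anywhere,
/// including work that has no process behind it.
fn ticks_to_ms(ticks: u64, hz: u64) -> u64 {
    ticks * 1000 / hz
}

/// Stand-in for a softirq or kthread body. It is never handed a
/// `ProcessCtx`, so a call to `current_pid` here would not compile.
fn softirq_body() -> u64 {
    ticks_to_ms(250, 1000)
}

fn main() {
    let ctx = ProcessCtx { pid: 42 };
    println!("pid = {}", current_pid(&ctx));
    println!("ms = {}", softirq_body());
}
```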
Since an `Arc<Work>` can be obtained from `TASK_LIST`, other CPUs could
potentially mutate 'owned' state, causing a race condition. Therefore,
ensure that the deref of an `Arc<Work>` only permits access to
`t_shared`.
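
One way this restriction could look, as a sketch under assumptions:
`Work` and `t_shared` are named above, but the field types, `t_owned`,
`SharedState` and `OwnedState` are hypothetical. The point is that
`Deref` on `Work` targets only the shared half.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

// Hypothetical layout: `Work` and `t_shared` are named in the commit
// message; the field types and `t_owned` are assumptions.

/// State that may be touched from any CPU, so it synchronises internally.
struct SharedState {
    wakeups: AtomicU32,
}

/// State that only the CPU currently running the task may mutate.
#[allow(dead_code)]
struct OwnedState {
    saved_sp: usize,
}

#[allow(dead_code)]
struct Work {
    t_shared: SharedState,
    // In a real crate this field would be private to the scheduler module,
    // so field privacy (not shown here) blocks direct access to it.
    t_owned: OwnedState,
}

/// Dereferencing `Work` (and therefore `Arc<Work>`) yields only the shared
/// half, so auto-deref on a handle never exposes `t_owned`.
impl Deref for Work {
    type Target = SharedState;

    fn deref(&self) -> &SharedState {
        &self.t_shared
    }
}

fn main() {
    let work = Arc::new(Work {
        t_shared: SharedState { wakeups: AtomicU32::new(0) },
        t_owned: OwnedState { saved_sp: 0 },
    });

    // A second handle, as another CPU would hold after a list lookup.
    let remote = Arc::clone(&work);
    remote.wakeups.fetch_add(1, Ordering::Relaxed);

    println!("wakeups = {}", work.wakeups.load(Ordering::Relaxed));
}
```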
Refactor the scheduler so all schedulable work is wrapped in Arc<Work>,
replacing the previous per-CPU wait_q design where sleeping tasks were
bound to a specific CPU. Wakers now hold direct Arc<Work> references and
can re-enqueue tasks on any CPU upon wakeup.
Key changes:
- Add Work struct wrapping OwnedTask with an AtomicTaskState and
scheduler metadata (SchedulerData), replacing the old SchedulableTask.
Remove Task::state (Arc<SpinLock<TaskState>>). Work::state is now the
single source of truth for task state.
- Rewrite the run queue using BinaryHeap-based eligible/ineligible split
(EEVDF) with a dedicated VClock, replacing the BTreeMap linear scan.
Extract vclock into its own module.
- Rewrite wakers to hold Arc<Work> directly instead of looking up tasks
by TaskDescriptor from TASK_LIST.
- Replace lock-based sleep transitions in uspc_ret with atomic CAS
(try_sleep_current) that correctly detects concurrent Woken state.
- Simplify least-tasked-CPU metric to use only run-queue weight, since
sleeping tasks are no longer bound to any CPU.
- Add current_work() accessor.
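
The CAS-based sleep transition can be sketched roughly as follows. This
is an illustration, not the code in this change: the state encoding, the
`try_sleep` and `wake` method names and the memory orderings are
assumptions; only `AtomicTaskState` and the `Woken` state are named
above.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Assumed state encoding; the commit message only names the `Woken` state.
const RUNNING: u8 = 0;
const SLEEPING: u8 = 1;
const WOKEN: u8 = 2;

struct AtomicTaskState(AtomicU8);

impl AtomicTaskState {
    fn new() -> Self {
        AtomicTaskState(AtomicU8::new(RUNNING))
    }

    /// Called by the task that is about to sleep. Returns `true` if it is
    /// now asleep, `false` if a waker got there first and it must keep
    /// running.
    fn try_sleep(&self) -> bool {
        match self
            .0
            .compare_exchange(RUNNING, SLEEPING, Ordering::AcqRel, Ordering::Acquire)
        {
            Ok(_) => true,
            Err(_) => {
                // A concurrent wake left `WOKEN` behind: consume it and
                // stay runnable instead of losing the wakeup.
                self.0.store(RUNNING, Ordering::Release);
                false
            }
        }
    }

    /// Called by a waker. Returns `true` if the task was asleep and has to
    /// be put back on a run queue.
    fn wake(&self) -> bool {
        self.0.swap(WOKEN, Ordering::AcqRel) == SLEEPING
    }
}

fn main() {
    // Wake arrives before the sleep: the sleep attempt is refused.
    let state = AtomicTaskState::new();
    let needs_enqueue = state.wake();
    let slept = state.try_sleep();
    println!("needs_enqueue = {needs_enqueue}, slept = {slept}");
}
```

Because the wake is recorded in the same atomic word that the sleeper
swaps, a wake that races with the sleep attempt is either seen by the CAS
or finds the task already `SLEEPING`; no lock is needed to close that
window.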
Create a new submodule within `memory`, `allocators` which contains all
memory allocators. Also split out the `Frame` struct from the `pg_alloc`
module, allowing it to be used by other modules.
The `sys_clock_nanosleep` implementation was missing a `flags` parameter
in the second position. This meant that `rqtp` was interpreted as
`rmtp`, and when the sleep was interrupted, the write-back to `rmtp`
targeted a garbage address, causing an `EFAULT`.
Fix the argument ordering.
Poll the wrapped future first, ensuring that the interrupt logic doesn't
short-circuit a future which is already ready.
This fixes an issue where `sys_wait4` would return `-EINTR` when
receiving a `SIGCHLD`. Since `SIGCHLD` indicates the wait condition
is met, the underlying future is ready and should return the PID
successfully rather than aborting.
Allow the `nanosleep` family of functions to be interrupted. When an
interruption occurs, calculate the remaining duration and write it
back to user-space.
Add a test to usertest to ensure proper functionality.
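
The remaining-time calculation amounts to a saturating subtraction. A
small sketch using `std::time::Duration` in place of the kernel's own
time types; both function names are hypothetical:

```rust
use std::time::Duration;

/// Time still owed to the caller when a sleep of `requested` is cut short
/// after `elapsed`. Saturates at zero if the sleep overran.
fn remaining(requested: Duration, elapsed: Duration) -> Duration {
    requested.saturating_sub(elapsed)
}

/// Split a duration into the `(tv_sec, tv_nsec)` pair of a `timespec`.
fn to_timespec(d: Duration) -> (u64, u32) {
    (d.as_secs(), d.subsec_nanos())
}

fn main() {
    let requested = Duration::from_secs(2);
    let elapsed = Duration::from_millis(750);
    let (sec, nsec) = to_timespec(remaining(requested, elapsed));
    println!("remaining: {sec}s {nsec}ns");
}
```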
Add a new struct, `InterruptableFut`, which provides signal
short-circuiting logic. If a future within the kernel's syscall logic is
wrapped in an `InterruptableFut`, then a wakeup with any pending signals
causes the underlying future to be dropped and its operation cancelled.
Provide an `InterruptResult` enum so the caller knows whether the
operation was interrupted and can take appropriate action, typically
exiting with `-EINTR`.
Finally, provide a blanket implementation for all futures, allowing them
to call `.interruptable()` to easily wrap any future.
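
A user-space sketch of the wrapper, under stated assumptions. Only
`InterruptableFut`, `InterruptResult` and `.interruptable()` are named
here; the variant names, the `Interruptable` trait, `NoopWake` and
`poll_once` are hypothetical. The real `.interruptable()` takes no
argument and would consult the current task's pending signals; the
`signal_pending` flag passed in below is a stand-in for that. The sketch
polls the wrapped future before checking for signals, so a future that is
already ready is not short-circuited.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Outcome of an interruptible wait (variant names are assumptions).
enum InterruptResult<T> {
    Completed(T),
    Interrupted,
}

struct InterruptableFut<F> {
    inner: Pin<Box<F>>,
    // Stand-in for "the current task has pending signals".
    signal_pending: Arc<AtomicBool>,
}

impl<F: Future> Future for InterruptableFut<F> {
    type Output = InterruptResult<F::Output>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Poll the wrapped future first: a result that is already
        // available wins over a pending signal.
        if let Poll::Ready(v) = self.inner.as_mut().poll(cx) {
            return Poll::Ready(InterruptResult::Completed(v));
        }
        // Not ready: a pending signal ends the wait. The caller then drops
        // this wrapper, which drops (cancels) the inner future.
        if self.signal_pending.load(Ordering::Acquire) {
            return Poll::Ready(InterruptResult::Interrupted);
        }
        Poll::Pending
    }
}

/// Blanket extension trait so any future can be wrapped.
trait Interruptable: Future + Sized {
    fn interruptable(self, signal_pending: Arc<AtomicBool>) -> InterruptableFut<Self> {
        InterruptableFut { inner: Box::pin(self), signal_pending }
    }
}

impl<F: Future> Interruptable for F {}

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}

fn main() {
    let pending_signal = Arc::new(AtomicBool::new(true));

    // Ready future plus pending signal: the result is still delivered.
    let mut ready = std::future::ready(7).interruptable(pending_signal.clone());
    if let Poll::Ready(InterruptResult::Completed(v)) = poll_once(&mut ready) {
        println!("completed with {v}");
    }

    // Future that never completes plus pending signal: interrupted.
    let mut never = std::future::pending::<i32>().interruptable(pending_signal);
    if let Poll::Ready(InterruptResult::Interrupted) = poll_once(&mut never) {
        println!("interrupted; a syscall would typically return -EINTR");
    }
}
```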
Currently we close CLOEXEC FDs in `clone()`, while copying the file
descriptor table. Defer this until the `exec` syscall.
This fixes numerous bugs, namely with bash and setting the foreground
process group for the current TTY.
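
The intended split of responsibilities, sketched with a toy descriptor
table. `FdEntry`, `FdTable` and their methods are hypothetical; only the
behaviour (copy everything on `clone()`, close flagged entries on `exec`)
comes from this change.

```rust
// Hypothetical descriptor table, invented for illustration.
#[derive(Clone, Debug, PartialEq)]
struct FdEntry {
    name: &'static str,
    cloexec: bool,
}

#[derive(Clone)]
struct FdTable {
    slots: Vec<Option<FdEntry>>,
}

impl FdTable {
    /// `clone()` path: copy every entry, close-on-exec ones included.
    fn clone_for_child(&self) -> FdTable {
        self.clone()
    }

    /// `exec` path: only here are close-on-exec entries closed. Surviving
    /// entries keep their descriptor numbers.
    fn close_cloexec(&mut self) {
        for slot in self.slots.iter_mut() {
            if slot.as_ref().map_or(false, |e| e.cloexec) {
                *slot = None;
            }
        }
    }

    fn open_count(&self) -> usize {
        self.slots.iter().flatten().count()
    }
}

fn main() {
    let parent = FdTable {
        slots: vec![
            Some(FdEntry { name: "tty", cloexec: false }),
            Some(FdEntry { name: "pipe", cloexec: true }),
        ],
    };

    // After clone() the child still sees both descriptors.
    let mut child = parent.clone_for_child();
    println!("after clone: {} open", child.open_count());

    // Only exec drops the close-on-exec one.
    child.close_cloexec();
    println!("after exec: {} open", child.open_count());
}
```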
When calling `clone()` with PTRACE_O_TRACEFORK set, make the child
inherit the current ptrace context. Also start the process with SIGSTOP
pending as per the ptrace docs.
This enables strace's follow-forks functionality (`strace -f`).
We currently union the UNMASKABLE_SIGNALS set with the new signal mask.
This does the complete opposite of what we want: we want to *remove*
those signals from the newly computed signal mask.
This patch removes the UNMASKABLE_SIGNALS set from any newly computed
signal mask.
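
In bitmask terms the fix swaps an OR for an AND-NOT. A sketch assuming a
64-bit mask where bit `n - 1` stands for signal `n`, using the common
Linux numbers for `SIGKILL` (9), `SIGUSR1` (10) and `SIGSTOP` (19); the
`bit` and `sanitize_mask` helpers are hypothetical:

```rust
// Assumed representation: bit (n - 1) of a u64 represents signal n.
const SIGKILL: u32 = 9;
const SIGUSR1: u32 = 10;
const SIGSTOP: u32 = 19;

const fn bit(sig: u32) -> u64 {
    1u64 << (sig - 1)
}

const UNMASKABLE_SIGNALS: u64 = bit(SIGKILL) | bit(SIGSTOP);

/// Strip the signals that must never be blocked from a requested mask.
/// The previous behaviour was `requested | UNMASKABLE_SIGNALS`, which
/// blocked exactly the signals that have to stay deliverable.
fn sanitize_mask(requested: u64) -> u64 {
    requested & !UNMASKABLE_SIGNALS
}

fn main() {
    let requested = bit(SIGKILL) | bit(SIGUSR1);
    let mask = sanitize_mask(requested);
    println!("SIGUSR1 blocked: {}", mask & bit(SIGUSR1) != 0);
    println!("SIGKILL blocked: {}", mask & bit(SIGKILL) != 0);
}
```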