ptrace 'child' events are treated differently during a wait than
'normal' child events. Specifically, a process that is being traced
delivers events for all of its threads (every `Tid`), rather than only
for the `Tgid`.
This fixes running strace on multi-threaded programs.
Make the `Tid` globally unique, rather than having each `Tgid` create a
new number space. This allows ptrace to differentiate between threads
when using `-f` on a program which spawns threads.
If an object isn't aligned to a cache line boundary, the flush loop
advances by a full cache line stride from the misaligned start address.
This can cause the pointer to overshoot the object end before reaching
the next cache line, leaving it unflushed.
This was observed on secondary CPU bringup where the boot context struct
was not cache-line aligned, causing the secondary core to read stale
data from RAM for fields in the second cache line.
Fix this by aligning the start address down to a cache line boundary
before iterating.
Replace `CUR_TASK_PTR` with `ProcessCtx`. This allows differentiation
between functions that access process context (those that take a
`ProcessCtx` parameter) and those that don't.
When creating a new class of schedulable tasks (softirqs, kthreads),
this ensures that those functions cannot call context-sensitive
functions.
Since an `Arc<Work>` can be obtained from `TASK_LIST`, this would allow
potential mutation of 'owned' state from other CPUs, thereby causing a
race condition. Therefore, ensure that the deref of an `Arc<Work>` only
permits access to `t_shared`.
Ensure that `tick()` is called on the current task, and let the task's
accounting decide whether we should try to switch to another task.
Also, ensure accounting is updated for freshly inserted tasks into the
runqueue.
Fix two issues:
1. When a task is dropped from the runqueue, it will trigger the
destructors for the task to run. This may well call wakers to wake up
parent processes, other ends of pipes, etc. If we do that while
`SCHED_STATE` is still borrowed, this causes a double-borrow panic. Fix
this by deferring all drops until after we have unlocked `SCHED_STATE`.
2. Tasks in the runqueue that have not yet been scheduled but have
finished will be returned by `find_next_task`, and their state will be
set to `TaskState::Running`, overwriting the fact that the task had
`Finish`ed. We'd then queue this task forever. Filter finished tasks in
`find_next_task` and add them to the deferred drop list.
Refactor the scheduler so all schedulable work is wrapped in Arc<Work>,
replacing the previous per-CPU wait_q design where sleeping tasks were
bound to a specific CPU. Wakers now hold direct Arc<Work> references and
can re-enqueue tasks on any CPU upon wakeup.
Key changes:
- Add Work struct wrapping OwnedTask with an AtomicTaskState and
scheduler metadata (SchedulerData), replacing the old SchedulableTask.
Remove Task::state (Arc<SpinLock<TaskState>>). Work::state is now the
single source of truth for task state.
- Rewrite the run queue using BinaryHeap-based eligible/ineligible split
(EEVDF) with a dedicated VClock, replacing the BTreeMap linear scan.
Extract vclock into its own module.
- Rewrite wakers to hold Arc<Work> directly instead of looking up tasks
by TaskDescriptor from TASK_LIST.
- Replace lock-based sleep transitions in uspc_ret with atomic CAS
(try_sleep_current) that correctly detects concurrent Woken state.
- Simplify least-tasked-CPU metric to use only run-queue weight, since
sleeping tasks are no longer bound to any CPU.
- Add current_work() accessor.