Make the `Tid` globally unique, rather than having each `Tgid` create a
new number space. This allows ptrace to differentiate between threads
when using `-f` on a program which spawns threads.
Replace `CUR_TASK_PTR` with `ProcessCtx`. This allows differentiation
between functions that access process context (take in `ProcessCtx` as a
parameter) and those that don't.
When creating a new class of schedulable tasks (softirqs, kthreads),
this ensures that those tasks cannot call context-sensitive
functions.
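
As an illustration of the idea, here is a minimal sketch of using a
context parameter as a capability. Only the name `ProcessCtx` comes from
this commit; its field, the constructor, and every function below are
hypothetical stand-ins, not the kernel's actual API.

```rust
/// Hypothetical stand-in for the per-process state a syscall may touch.
pub struct ProcessCtx {
    pid: u32,
}

impl ProcessCtx {
    pub fn new(pid: u32) -> Self {
        Self { pid }
    }
}

/// Context-sensitive: requires a `ProcessCtx`, so it is only callable
/// from code that was handed one.
pub fn current_pid(ctx: &ProcessCtx) -> u32 {
    ctx.pid
}

/// Context-free: takes no `ProcessCtx`, so any task class may call it.
/// Assumes `align` is a power of two.
pub fn align_up(value: usize, align: usize) -> usize {
    (value + align - 1) & !(align - 1)
}

/// A softirq-style handler receives no `ProcessCtx`. It can call
/// `align_up`, but a call to `current_pid` would not compile here
/// because there is no context value to pass.
pub fn softirq_handler(len: usize) -> usize {
    align_up(len, 8)
}

/// A syscall-style entry point receives the context and may use both
/// kinds of function.
pub fn syscall_entry(ctx: &ProcessCtx, len: usize) -> (u32, usize) {
    (current_pid(ctx), align_up(len, 8))
}
```

The check happens at compile time: the absence of a `ProcessCtx` in a
function's signature is what keeps it out of process context.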
Refactor the scheduler so all schedulable work is wrapped in Arc<Work>,
replacing the previous per-CPU wait_q design where sleeping tasks were
bound to a specific CPU. Wakers now hold direct Arc<Work> references and
can re-enqueue tasks on any CPU upon wakeup.
Key changes:
- Add Work struct wrapping OwnedTask with an AtomicTaskState and
scheduler metadata (SchedulerData), replacing the old SchedulableTask.
Remove Task::state (Arc<SpinLock<TaskState>>). Work::state is now the
single source of truth for task state.
- Rewrite the run queue using BinaryHeap-based eligible/ineligible split
(EEVDF) with a dedicated VClock, replacing the BTreeMap linear scan.
Extract vclock into its own module.
- Rewrite wakers to hold Arc<Work> directly instead of looking up tasks
by TaskDescriptor from TASK_LIST.
- Replace lock-based sleep transitions in uspc_ret with atomic CAS
(try_sleep_current) that correctly detects concurrent Woken state.
- Simplify least-tasked-CPU metric to use only run-queue weight, since
sleeping tasks are no longer bound to any CPU.
- Add current_work() accessor.
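
The sleep/wake race that the CAS transition addresses can be sketched as
follows. This is a simplified model under stated assumptions: the state
encoding, the `try_sleep` and `wake` method names, the `id` field, and
the `WorkWaker` type are all hypothetical; only `Work`,
`AtomicTaskState`, and the Woken state are named in this commit.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;

// Assumed state encoding for this sketch.
pub const RUNNING: u8 = 0;
pub const SLEEPING: u8 = 1;
pub const WOKEN: u8 = 2;

pub struct AtomicTaskState(AtomicU8);

impl AtomicTaskState {
    pub fn new(state: u8) -> Self {
        Self(AtomicU8::new(state))
    }

    pub fn load(&self) -> u8 {
        self.0.load(Ordering::Acquire)
    }

    /// Try Running -> Sleeping. Returns false if a waker already moved
    /// the state away from Running; the wakeup is then consumed and the
    /// task keeps running instead of sleeping through it.
    pub fn try_sleep(&self) -> bool {
        match self
            .0
            .compare_exchange(RUNNING, SLEEPING, Ordering::AcqRel, Ordering::Acquire)
        {
            Ok(_) => true,
            Err(_) => {
                self.0.store(RUNNING, Ordering::Release);
                false
            }
        }
    }

    /// Mark the task Woken. Returns true if it was actually asleep, in
    /// which case the caller must re-enqueue it.
    pub fn wake(&self) -> bool {
        self.0.swap(WOKEN, Ordering::AcqRel) == SLEEPING
    }
}

pub struct Work {
    pub id: u32,
    pub state: AtomicTaskState,
}

/// A waker holds the work item directly, so no task-list lookup is needed.
pub struct WorkWaker {
    work: Arc<Work>,
}

impl WorkWaker {
    pub fn new(work: Arc<Work>) -> Self {
        Self { work }
    }

    /// Returns the work to enqueue (on any CPU) if it was sleeping.
    pub fn wake(&self) -> Option<Arc<Work>> {
        if self.work.state.wake() {
            Some(Arc::clone(&self.work))
        } else {
            None
        }
    }
}
```

If the waker runs before the task commits to sleeping, `wake` returns
`None` and the later `try_sleep` fails, so the wakeup is not lost.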
If multiple drivers are associated with a compatible string, don't
override an existing probe_fn. Instead, store a list of associated
probe_fns. Once a probe_fn has claimed the device, stop calling the
remaining probe_fns in the chain.
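
A minimal sketch of such a probe chain is shown below. The
`DriverRegistry` type, the `ProbeFn` signature (returning the claiming
driver's name), and the two example probe functions are assumptions made
for illustration; the real probe_fn type is not shown in this commit.

```rust
use std::collections::HashMap;

/// Assumed signature: returns the claiming driver's name, or None if the
/// driver does not want this device.
pub type ProbeFn = fn(u32) -> Option<&'static str>;

#[derive(Default)]
pub struct DriverRegistry {
    probes: HashMap<&'static str, Vec<ProbeFn>>,
}

impl DriverRegistry {
    /// Append to the chain instead of overwriting an existing entry.
    pub fn register(&mut self, compatible: &'static str, probe: ProbeFn) {
        self.probes.entry(compatible).or_default().push(probe);
    }

    /// Walk the chain in registration order and stop at the first claim.
    pub fn probe(&self, compatible: &str, device_id: u32) -> Option<&'static str> {
        for probe in self.probes.get(compatible).into_iter().flatten() {
            if let Some(driver) = probe(device_id) {
                return Some(driver);
            }
        }
        None
    }
}

// Example probe functions keyed on the virtio device ID
// (2 = block device, 4 = entropy source).
pub fn probe_blk(device_id: u32) -> Option<&'static str> {
    (device_id == 2).then_some("virtio-blk")
}

pub fn probe_rng(device_id: u32) -> Option<&'static str> {
    (device_id == 4).then_some("virtio-rng")
}
```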
Add a virtio-rng entropy source driver that registers with the kernel
RNG subsystem. Introduce `ProbeError::NoMatch` so virtio drivers can
silently reject empty or wrong-type MMIO slots without spamming the boot
log with fatal errors or leaving devices stuck in the deferred queue.
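
To illustrate how a caller might treat `NoMatch` differently from other
failures, here is a rough sketch. Only `ProbeError::NoMatch` is named in
this commit; the `Deferred` and `Fatal` variants, the `Disposition`
enum, and both functions are assumptions inferred from the description
above.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeError {
    /// The slot is empty or holds a different device type.
    NoMatch,
    /// Assumed variant: a dependency is not ready yet, retry later.
    Deferred,
    /// Assumed variant: a real failure that should be reported.
    Fatal(&'static str),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    Bound,
    Skipped,
    Retry,
    Failed,
}

/// Classify a probe result. Only fatal errors reach the log; `NoMatch`
/// is dropped silently and is not put back on the deferred queue.
pub fn handle_probe(result: Result<(), ProbeError>, log: &mut Vec<String>) -> Disposition {
    match result {
        Ok(()) => Disposition::Bound,
        Err(ProbeError::NoMatch) => Disposition::Skipped,
        Err(ProbeError::Deferred) => Disposition::Retry,
        Err(ProbeError::Fatal(msg)) => {
            log.push(format!("probe failed: {msg}"));
            Disposition::Failed
        }
    }
}

/// Hypothetical virtio-rng probe keyed on the virtio device ID
/// (0 = empty placeholder slot, 4 = entropy source).
pub fn probe_virtio_rng(device_id: u32) -> Result<(), ProbeError> {
    match device_id {
        4 => Ok(()),
        _ => Err(ProbeError::NoMatch),
    }
}
```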
Replace the single `SmallRng` with a proper two-layer RNG architecture:
- Global BLAKE2s-256 entropy pool that accumulates entropy and gates
seed extraction behind a 256-bit threshold.
- Per-CPU ChaCha20Rng instances that are lazily seeded from the pool on
first use and periodically reseed every 1 MB by XOR-ing a fresh BLAKE2
extract with 32 bytes of their own output.
The /dev/random chardev uses fill_random_bytes directly instead of
routing through the syscall layer.
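
The threshold gating and the XOR reseed step can be sketched in
isolation. This is not the driver's code and is not cryptographically
sound: the byte-mixing below is a placeholder for BLAKE2s, no ChaCha20
is involved, "1 MB" is assumed to mean 2^20 bytes, and all names are
hypothetical.

```rust
pub const SEED_LEN: usize = 32;
pub const ENTROPY_THRESHOLD_BITS: u32 = 256;
/// Assumption: "1 MB" is taken as 2^20 bytes.
pub const RESEED_INTERVAL: usize = 1 << 20;

pub struct EntropyPool {
    state: [u8; SEED_LEN],
    credited_bits: u32,
}

impl EntropyPool {
    pub fn new() -> Self {
        Self { state: [0; SEED_LEN], credited_bits: 0 }
    }

    /// Mix `data` into the pool and credit `bits` of entropy.
    /// Placeholder mixing only; a real pool would hash with BLAKE2s.
    pub fn add_entropy(&mut self, data: &[u8], bits: u32) {
        for (i, byte) in data.iter().enumerate() {
            let slot = i % SEED_LEN;
            self.state[slot] = self.state[slot].rotate_left(3) ^ *byte;
        }
        self.credited_bits = self.credited_bits.saturating_add(bits);
    }

    /// Refuse to hand out a seed until enough entropy has been credited.
    pub fn try_extract(&self) -> Option<[u8; SEED_LEN]> {
        if self.credited_bits < ENTROPY_THRESHOLD_BITS {
            return None;
        }
        Some(self.state)
    }
}

/// New per-CPU key: fresh pool extract XOR 32 bytes of the generator's
/// own output.
pub fn reseed_key(extract: &[u8; SEED_LEN], own_output: &[u8; SEED_LEN]) -> [u8; SEED_LEN] {
    let mut key = [0u8; SEED_LEN];
    for i in 0..SEED_LEN {
        key[i] = extract[i] ^ own_output[i];
    }
    key
}

/// A per-CPU generator reseeds once it has produced this many bytes.
pub fn needs_reseed(bytes_since_seed: usize) -> bool {
    bytes_since_seed >= RESEED_INTERVAL
}
```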
Currently, each task implements its own priority value. In Linux, each
thread group (process) has a default priority which all tasks in that
group inherit. Tasks can, however, override the default process
priority.
Implement that logic here, which also fixes the current compilation
error on master.
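
The inherit-or-override rule can be sketched as below. The `ThreadGroup`
and `Task` shapes, field names, and the `i8` priority type are
assumptions for illustration, not the actual structures in the tree.

```rust
use std::sync::atomic::{AtomicI8, Ordering};
use std::sync::Arc;

/// Hypothetical thread group carrying the process-wide default priority.
pub struct ThreadGroup {
    pub default_priority: AtomicI8,
}

/// Hypothetical task with an optional per-task override.
pub struct Task {
    pub group: Arc<ThreadGroup>,
    pub priority_override: Option<i8>,
}

impl Task {
    /// Use the task's own priority if set, otherwise inherit the
    /// thread group's default.
    pub fn priority(&self) -> i8 {
        self.priority_override
            .unwrap_or_else(|| self.group.default_priority.load(Ordering::Relaxed))
    }
}
```

Because inheriting tasks read the group default on each call, changing
the default takes effect for all of them at once while overridden tasks
are unaffected.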
This commit refactors the core process representation to decouple
"Identity/Resources" from "Execution/Scheduling". Previously, a
monolithic `Task` struct wrapped in `Arc<SpinLock<>>` caused lock
contention during hot scheduling paths and conflated shared state with
CPU-local state.
The `Task` struct has been split into:
1. `Task` (Shared): Holds process-wide resources (VM, FileTable,
Credentials). Managed via `Arc` and internal fine-grained locking.
2. `OwnedTask` (Private): Holds execution state (Context, v_runtime,
signal mask). Strictly owned by a specific CPU (via the Scheduler) and
accessed lock-free.
Key changes:
* Scheduler:
- `SchedState` now owns tasks via `Box<OwnedTask>`.
- Transitions between `run_queue` and `running_task` involve strictly
moving ownership of the Box, ensuring pointer stability.
- The EEVDF comparison logic now explicitly handles comparisons
between the queued candidates and the currently running task (which is
not in the queue).
* Current Task Access:
- `current()` now returns a `CurrentTaskGuard` which:
1. Disables preemption (preventing context switches while holding
the reference).
2. Performs a runtime borrow check (panic on double-mutable borrow).
3. Dereferences a cached Per-CPU raw pointer for O(1) access.
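
A simplified, single-threaded model of such a guard is sketched below.
`CurrentTaskGuard`, `OwnedTask`, and `v_runtime` are names from this
commit; the `CpuLocal` container, the preemption counter, and
`try_current` are assumptions added so the behaviour can be shown
outside the kernel.

```rust
use std::cell::Cell;
use std::ops::{Deref, DerefMut};

pub struct OwnedTask {
    pub v_runtime: u64,
}

/// Stand-in for one CPU's private state.
pub struct CpuLocal {
    preempt_count: Cell<u32>,
    borrowed: Cell<bool>,
    current: Cell<*mut OwnedTask>, // cached raw pointer to the running task
}

pub struct CurrentTaskGuard<'a> {
    cpu: &'a CpuLocal,
}

impl CpuLocal {
    pub fn new(task: Box<OwnedTask>) -> Self {
        Self {
            preempt_count: Cell::new(0),
            borrowed: Cell::new(false),
            current: Cell::new(Box::into_raw(task)),
        }
    }

    pub fn preemptible(&self) -> bool {
        self.preempt_count.get() == 0
    }

    /// Disable preemption, then take the runtime borrow flag.
    pub fn try_current(&self) -> Option<CurrentTaskGuard<'_>> {
        self.preempt_count.set(self.preempt_count.get() + 1);
        if self.borrowed.replace(true) {
            // Already borrowed: undo the preemption disable and refuse.
            self.preempt_count.set(self.preempt_count.get() - 1);
            return None;
        }
        Some(CurrentTaskGuard { cpu: self })
    }

    /// Panics on a double borrow, as described above.
    pub fn current(&self) -> CurrentTaskGuard<'_> {
        self.try_current().expect("current task already borrowed")
    }
}

impl Deref for CurrentTaskGuard<'_> {
    type Target = OwnedTask;
    fn deref(&self) -> &OwnedTask {
        // SAFETY: the pointer came from Box::into_raw and the borrow flag
        // guarantees this is the only live guard.
        unsafe { &*self.cpu.current.get() }
    }
}

impl DerefMut for CurrentTaskGuard<'_> {
    fn deref_mut(&mut self) -> &mut OwnedTask {
        // SAFETY: as in `deref`; the guard is exclusive.
        unsafe { &mut *self.cpu.current.get() }
    }
}

impl Drop for CurrentTaskGuard<'_> {
    fn drop(&mut self) {
        // Release the borrow, then re-enable preemption.
        self.cpu.borrowed.set(false);
        self.cpu.preempt_count.set(self.cpu.preempt_count.get() - 1);
    }
}

impl Drop for CpuLocal {
    fn drop(&mut self) {
        // SAFETY: reclaim the Box created in `new`; no guard can outlive `self`.
        unsafe { drop(Box::from_raw(self.current.get())) };
    }
}
```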
Make the global `TASK_LIST` a collection of `Task`s, rather than
`task.state` struct members. This allows other cores to access any
shared task state easily.
Currently, a global wakeup queue is used for all CPUs on the system.
This leads to inefficient behavior regarding preemption. When the
scheduler requests a preemption event, it is inserted into a global list
alongside events from all other CPUs.
When processing IRQs, there is no guarantee which CPU will handle the
timer interrupt. If the current CPU processes a preemption event
intended for a different CPU, it must signal the target CPU via an IPI.
This causes a severe bottleneck, as one CPU may end up distributing
preemption events for the entire system.
Fix this by implementing a per-CPU wakeup queue. Preemption events are
now strictly scheduled for the current CPU, ensuring they are handled
locally by the core that scheduled them. This significantly simplifies
the preemption logic and eliminates the need for IPIs to signal
preemption events.
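
The per-CPU design can be modelled roughly as follows. All names here
are hypothetical, and the `Vec` indexed by CPU number stands in for real
per-CPU storage, where each core would only ever touch its own queue.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// One min-heap of (deadline, event id) per CPU.
pub struct WakeupQueues {
    per_cpu: Vec<BinaryHeap<Reverse<(u64, u64)>>>,
}

impl WakeupQueues {
    pub fn new(cpus: usize) -> Self {
        Self {
            per_cpu: (0..cpus).map(|_| BinaryHeap::new()).collect(),
        }
    }

    /// A CPU schedules a preemption event on its own queue only.
    pub fn schedule_preempt(&mut self, cpu: usize, deadline: u64, event_id: u64) {
        self.per_cpu[cpu].push(Reverse((deadline, event_id)));
    }

    /// Timer IRQ on `cpu`: pop only this CPU's due events, earliest
    /// deadline first. Other CPUs' events are never inspected, so no
    /// IPI is needed to forward them.
    pub fn handle_timer_irq(&mut self, cpu: usize, now: u64) -> Vec<u64> {
        let queue = &mut self.per_cpu[cpu];
        let mut due = Vec::new();
        while let Some(Reverse((deadline, event_id))) = queue.peek().copied() {
            if deadline > now {
                break;
            }
            queue.pop();
            due.push(event_id);
        }
        due
    }
}
```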