Ensure that `tick()` is called on the current task, and allow the task
accounting to decide whether we should try to switch to another task.
Also ensure accounting is updated for tasks freshly inserted into the
runqueue.
Fix two issues:
1. When a task is dropped from the runqueue, its destructors run. These
may call wakers to wake up parent processes, other ends of pipes, etc.
If we do that while
`SCHED_STATE` is still borrowed, this causes a double-borrow panic. Fix
this by deferring all drops until after we have unlocked `SCHED_STATE`.
2. Tasks still in the runqueue that finish before being scheduled are
returned by `find_next_task`, and their state is set to
`TaskState::Running`, overwriting the fact that the task had
`Finish`ed. We'd then queue this task forever. Filter finished tasks in
`find_next_task` and add them to the deferred drop list.
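A minimal sketch of both fixes, assuming a `RefCell`-guarded scheduler
state. Only `find_next_task` and `TaskState::Running` are names from this
change; `Task`, `SchedState`, `wakeups`, and `schedule` are hypothetical
stand-ins, and the `Rc<RefCell<SchedState>>` plays the role of
`SCHED_STATE`. The real code may be structured differently.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Weak;

#[derive(Clone, Copy, PartialEq, Debug)]
enum TaskState { Runnable, Running, Finished }

struct Task {
    id: u32,
    state: TaskState,
    // Hypothetical back-reference, used by the destructor to "wake" someone.
    sched: Weak<RefCell<SchedState>>,
}

impl Drop for Task {
    fn drop(&mut self) {
        // Stand-in for a waker: re-borrows the scheduler state. This would
        // panic if the state were still mutably borrowed by the caller.
        if let Some(sched) = self.sched.upgrade() {
            sched.borrow_mut().wakeups += 1;
        }
    }
}

struct SchedState { run_q: VecDeque<Task>, wakeups: u32 }

impl SchedState {
    // Issue 2: skip finished tasks instead of marking them Running, and
    // hand them to the caller's deferred-drop list.
    fn find_next_task(&mut self, deferred: &mut Vec<Task>) -> Option<Task> {
        while let Some(mut task) = self.run_q.pop_front() {
            if task.state == TaskState::Finished {
                deferred.push(task);
                continue;
            }
            task.state = TaskState::Running;
            return Some(task);
        }
        None
    }
}

fn schedule(sched: &RefCell<SchedState>) -> Option<Task> {
    let mut deferred: Vec<Task> = Vec::new();
    let mut guard = sched.borrow_mut();
    let next = guard.find_next_task(&mut deferred);
    // Issue 1: release the borrow first, then let the destructors run.
    drop(guard);
    drop(deferred);
    next
}
```

Swapping the two `drop` calls in `schedule` reproduces the double-borrow
panic in this sketch as soon as a finished task is filtered out.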
Refactor the scheduler so all schedulable work is wrapped in Arc<Work>,
replacing the previous per-CPU wait_q design where sleeping tasks were
bound to a specific CPU. Wakers now hold direct Arc<Work> references and
can re-enqueue tasks on any CPU upon wakeup.
Key changes:
- Add Work struct wrapping OwnedTask with an AtomicTaskState and
scheduler metadata (SchedulerData), replacing the old SchedulableTask.
Remove Task::state (Arc<SpinLock<TaskState>>). Work::state is now the
single source of truth for task state.
- Rewrite the run queue using a BinaryHeap-based eligible/ineligible split
(EEVDF) with a dedicated VClock, replacing the BTreeMap linear scan.
Extract vclock into its own module.
- Rewrite wakers to hold Arc<Work> directly instead of looking up tasks
by TaskDescriptor from TASK_LIST.
- Replace lock-based sleep transitions in uspc_ret with atomic CAS
(try_sleep_current) that correctly detects concurrent Woken state.
- Simplify least-tasked-CPU metric to use only run-queue weight, since
sleeping tasks are no longer bound to any CPU.
- Add current_work() accessor.
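The CAS-based sleep transition can be sketched as follows. This is an
illustration under assumptions, not the actual implementation: only the
names AtomicTaskState, try_sleep_current, and the Woken state come from
the list above. The other states, the u8 encoding, and wake() are
hypothetical.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical state set; only `Woken` is named in the change description.
#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(u8)]
enum TaskState { Running = 0, Runnable = 1, Sleeping = 2, Woken = 3 }

impl TaskState {
    fn from_u8(v: u8) -> Self {
        match v {
            0 => TaskState::Running,
            1 => TaskState::Runnable,
            2 => TaskState::Sleeping,
            3 => TaskState::Woken,
            _ => unreachable!("invalid task state"),
        }
    }
}

struct AtomicTaskState(AtomicU8);

impl AtomicTaskState {
    fn new(state: TaskState) -> Self { Self(AtomicU8::new(state as u8)) }

    fn load(&self) -> TaskState { TaskState::from_u8(self.0.load(Ordering::Acquire)) }

    /// Running -> Sleeping. Returns false if the transition lost a race
    /// with a waker; a pending Woken is consumed and the task keeps running.
    fn try_sleep_current(&self) -> bool {
        let (running, sleeping, woken) =
            (TaskState::Running as u8, TaskState::Sleeping as u8, TaskState::Woken as u8);
        match self.0.compare_exchange(running, sleeping, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => true,
            Err(_) => {
                let _ = self.0.compare_exchange(
                    woken, running, Ordering::AcqRel, Ordering::Acquire);
                false
            }
        }
    }

    /// Returns true if the task was asleep and the caller should re-enqueue
    /// it. A task that is still Running is only marked Woken.
    fn wake(&self) -> bool {
        loop {
            let cur = self.0.load(Ordering::Acquire);
            let (next, enqueue) = match TaskState::from_u8(cur) {
                TaskState::Sleeping => (TaskState::Runnable, true),
                TaskState::Running => (TaskState::Woken, false),
                _ => return false, // already runnable or already woken
            };
            if self.0
                .compare_exchange(cur, next as u8, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return enqueue;
            }
        }
    }
}
```

Because the waker and the sleeper both go through compare-and-swap on the
same word, a wakeup that arrives just before the sleep attempt is never
lost: the sleeper observes Woken and stays on the CPU.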
If multiple drivers are associated with a compatible string, don't
override an existing probe_fn. Instead, store a list of associated
probe_fns. Once a probe_fn has claimed the device, stop calling the
rest of the probe chain.
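A rough sketch of such a probe chain, assuming a probe_fn returns true
when it claims the device. `DriverRegistry`, `Device`, the `ProbeFn`
signature, and the three example probes are all hypothetical; the real
probe_fn type is not shown in this change description.

```rust
use std::collections::HashMap;

// Hypothetical device handle; `probed_by` only exists to make the call
// order observable in this sketch.
struct Device {
    kind: &'static str,
    probed_by: Vec<&'static str>,
}

// Hypothetical signature: returns true if the driver claimed the device.
type ProbeFn = fn(&mut Device) -> bool;

#[derive(Default)]
struct DriverRegistry {
    probes: HashMap<&'static str, Vec<ProbeFn>>,
}

impl DriverRegistry {
    // Append to the chain instead of overriding an existing probe_fn.
    fn register(&mut self, compatible: &'static str, probe_fn: ProbeFn) {
        self.probes.entry(compatible).or_default().push(probe_fn);
    }

    // Walk the chain in registration order and stop at the first claim.
    // Returns the index of the probe_fn that claimed the device, if any.
    fn probe(&self, compatible: &str, device: &mut Device) -> Option<usize> {
        let chain = self.probes.get(compatible)?;
        for (idx, probe_fn) in chain.iter().enumerate() {
            if probe_fn(&mut *device) {
                return Some(idx);
            }
        }
        None
    }
}

// Example drivers sharing one compatible string.
fn probe_blk(dev: &mut Device) -> bool { dev.probed_by.push("blk"); dev.kind == "blk" }
fn probe_rng(dev: &mut Device) -> bool { dev.probed_by.push("rng"); dev.kind == "rng" }
fn probe_net(dev: &mut Device) -> bool { dev.probed_by.push("net"); dev.kind == "net" }
```

With all three probes registered under the same compatible string, an
"rng" device is offered to `probe_blk` and `probe_rng` only; `probe_net`
is never called.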
Add a virtio-rng entropy source driver that registers with the kernel
RNG subsystem. Introduce `ProbeError::NoMatch` so virtio drivers can
silently reject empty or wrong-type MMIO slots without spamming the boot
log with fatal errors or leaving devices stuck in the deferred queue.
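One way the caller might treat the new variant, sketched under
assumptions: only `ProbeError::NoMatch` is taken from this change. The
`Deferred` and `Fatal` variants, `ProbeReport`, and `record` are
hypothetical names used to show the contrast.

```rust
#[derive(Debug, PartialEq)]
enum ProbeError {
    NoMatch,             // from this change: slot is empty or the wrong type
    Deferred,            // hypothetical: a dependency is not ready yet
    Fatal(&'static str), // hypothetical: a real failure worth logging
}

#[derive(Default)]
struct ProbeReport {
    bound: Vec<u32>,    // slots a driver bound to
    deferred: Vec<u32>, // slots to retry later
    log: Vec<String>,   // stand-in for the boot log
}

fn record(report: &mut ProbeReport, slot: u32, result: Result<(), ProbeError>) {
    match result {
        Ok(()) => report.bound.push(slot),
        // Silent rejection: no log line, no entry in the deferred queue.
        Err(ProbeError::NoMatch) => {}
        Err(ProbeError::Deferred) => report.deferred.push(slot),
        Err(ProbeError::Fatal(msg)) => {
            report.log.push(format!("slot {}: probe failed: {}", slot, msg))
        }
    }
}
```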
Replace the single `SmallRng` with a proper two-layer RNG architecture:
- Global BLAKE2s-256 entropy pool that accumulates entropy and gates
seed extraction behind a 256-bit threshold.
- Per-CPU ChaCha20Rng instances that are lazily seeded from the pool on
first use and periodically reseeded every 1 MB by XOR-ing a fresh BLAKE2
extract with 32 bytes of their own output.
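The reseed step can be illustrated without the actual hash or cipher.
This sketch assumes "1 MB" means 1 MiB and that the counter tracks bytes
produced; `RESEED_INTERVAL`, `mix_seed`, and `ReseedBudget` are
hypothetical names, and the two 32-byte inputs stand in for the BLAKE2
extract and the ChaCha20 output.

```rust
// Assumption: "1 MB" is taken as 1 MiB here.
const RESEED_INTERVAL: usize = 1 << 20;

// XOR a fresh pool extract with 32 bytes of the generator's own output to
// form the next seed.
fn mix_seed(pool_extract: [u8; 32], own_output: [u8; 32]) -> [u8; 32] {
    let mut seed = [0u8; 32];
    for i in 0..32 {
        seed[i] = pool_extract[i] ^ own_output[i];
    }
    seed
}

// Tracks how many bytes a per-CPU generator has produced since its last
// reseed.
struct ReseedBudget {
    bytes_since_reseed: usize,
}

impl ReseedBudget {
    // Account for `n` produced bytes; returns true when a reseed is due
    // and resets the counter.
    fn consume(&mut self, n: usize) -> bool {
        self.bytes_since_reseed += n;
        if self.bytes_since_reseed >= RESEED_INTERVAL {
            self.bytes_since_reseed = 0;
            true
        } else {
            false
        }
    }
}
```

Mixing in the generator's own output means the new seed depends on both
the pool and the current per-CPU state, rather than on the pool alone.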
The /dev/random chardev uses fill_random_bytes directly instead of
routing through the syscall layer.