Make the global `TASK_LIST` struct a collection of `Task`s, rather
than individual `task.state` struct members. This lets other cores
easily access any shared task state.
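A minimal sketch of the intended shape, using the `spin` crate's `Mutex` as a stand-in for the kernel's own lock (the field layout and `TaskId` are illustrative):
```rust
use alloc::{collections::BTreeMap, sync::Arc};
use spin::Mutex; // stand-in for the kernel's interrupt-aware spinlock

type TaskId = u64; // illustrative

enum TaskState { Runnable, Running, Sleeping }

struct Task {
    id: TaskId,
    state: Mutex<TaskState>, // shared state lives inside the task itself
}

// The global list owns whole `Task`s, so any core can reach another
// task's shared state with a single lookup.
static TASK_LIST: Mutex<BTreeMap<TaskId, Arc<Task>>> = Mutex::new(BTreeMap::new());
```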
Currently, a global wakeup queue is used for all CPUs on the system.
This leads to inefficient behavior regarding preemption. When the
scheduler requests a preemption event, it is inserted into a global list
alongside events from all other CPUs.
When processing IRQs, there is no guarantee which CPU will handle the
timer interrupt. If the current CPU processes a preemption event
intended for a different CPU, it must signal the target CPU via an IPI.
This causes a severe bottleneck, as one CPU may end up distributing
preemption events for the entire system.
Fix this by implementing a per-CPU wakeup queue. Preemption events are
now strictly scheduled for the current CPU, ensuring they are handled
locally by the core that scheduled them. This significantly simplifies
the preemption logic and eliminates the need for IPIs to signal
preemption events.
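A sketch of the per-CPU shape this implies (`PerCpu`, `WakeupEvent`, and the function names are illustrative; `spin::Mutex` stands in for the kernel's lock):
```rust
use alloc::collections::VecDeque;
use spin::Mutex;

struct WakeupEvent { deadline_ns: u64, task_id: u64 }

// One wakeup queue per CPU: no cross-CPU traffic on the hot path.
struct PerCpu { wakeup_queue: Mutex<VecDeque<WakeupEvent>> }

fn schedule_preemption(cpu: &PerCpu, ev: WakeupEvent) {
    // Strictly local: the event lands on *this* CPU's queue, so the
    // timer IRQ that fires here also services it here. No IPI needed.
    cpu.wakeup_queue.lock().push_back(ev);
}

fn on_timer_irq(cpu: &PerCpu, now_ns: u64) {
    // Only this CPU's events are ever examined.
    cpu.wakeup_queue.lock().retain(|ev| {
        let due = ev.deadline_ns <= now_ns;
        // if due: deliver the preemption for ev.task_id locally ...
        !due
    });
}
```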
When switching tasks, we may well be switching away from a task which is
going to `Sleep`. Therefore the check
```rust
debug_assert_eq!(*prev_task.state.lock_save_irq(), TaskState::Runnable);
```
is incorrect.
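A hedged sketch of what the check could become (the exact set of legal outgoing states depends on the scheduler):
```rust
// Illustrative relaxation: the outgoing task may legitimately be on
// its way to sleep, so allow `Sleeping` as well as `Runnable` here.
debug_assert!(matches!(
    *prev_task.state.lock_save_irq(),
    TaskState::Runnable | TaskState::Sleeping
));
```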
Use EEVDF concepts such as the virtual deadline correctly: actually
calculate the necessary deadline and use it to make scheduling
decisions. Also preempt dynamically based on the deadline.
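A sketch of the arithmetic involved; `NICE_0_WEIGHT` and the fixed-point details are illustrative rather than this kernel's exact math:
```rust
const NICE_0_WEIGHT: u64 = 1024; // illustrative reference weight

// EEVDF-style virtual deadline: d = v + slice / w. Higher-weight tasks
// get nearer deadlines and are therefore selected sooner.
fn virtual_deadline(vruntime_ns: u64, slice_ns: u64, weight: u64) -> u64 {
    vruntime_ns + slice_ns * NICE_0_WEIGHT / weight
}

// Dynamic preemption: a newly runnable task whose virtual deadline is
// earlier than the running task's should preempt it.
fn should_preempt(running_deadline: u64, waking_deadline: u64) -> bool {
    waking_deadline < running_deadline
}
```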
There is no need to store a separate inner struct within the interrupt
manager; refactor that away.
Also, reduce the amount of locking when servicing an interrupt:
currently the whole interrupt manager stays locked while an interrupt is
serviced. The manager should be left unlocked while the ISR is called.
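A sketch of the unlocked dispatch, assuming refcounted handlers (`IsrFn` and the map layout are illustrative; `spin::Mutex` stands in for the kernel's lock):
```rust
use alloc::{collections::BTreeMap, sync::Arc};
use spin::Mutex;

type IsrFn = Arc<dyn Fn() + Send + Sync>;

// Flattened: handlers live directly in the manager, no inner struct.
static INTERRUPT_MANAGER: Mutex<BTreeMap<u32, IsrFn>> = Mutex::new(BTreeMap::new());

fn service_interrupt(vector: u32) {
    // Clone the refcounted handler under the lock, then drop the lock
    // before dispatch so the ISR runs with the manager unlocked.
    let isr = INTERRUPT_MANAGER.lock().get(&vector).cloned();
    if let Some(isr) = isr {
        isr();
    }
}
```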
In an SMP environment, two threads sharing an address space may trigger
a page fault on the same address simultaneously. Previously, the loser
of this race would receive an `AlreadyMapped` error from the page table
mapper, causing the kernel to treat a valid execution flow as an error.
This patch modifies `handle_demand_fault` to gracefully handle these
spurious faults (a sketch follows the list) by:
1. Accepting `AlreadyMapped` as a successful resolution. If another CPU
has already mapped the page while we were waiting for the lock
(or performing I/O), we consider the fault handled.
2. Fixing a memory leak in the race path. We now only `leak()` the
allocated `ClaimedPage` (surrendering ownership to the page tables) if
the mapping actually succeeds. If we lose the race, the `ClaimedPage` is
allowed to go out of scope, causing the `Drop` impl to return the unused
physical frame to the allocator.
3. Applying this logic to both the anonymous mapping path and the
deferred file-backed path.
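A sketch of the shared shape (the types here stand in for the kernel's real `ClaimedPage`, mapper, and error enum):
```rust
enum MapError { AlreadyMapped, NoMemory }

struct ClaimedPage(u64 /* physical frame number */);
impl ClaimedPage {
    fn frame(&self) -> u64 { self.0 }
    // Surrender ownership: the page tables now own the frame.
    fn leak(self) { core::mem::forget(self); }
}
impl Drop for ClaimedPage {
    fn drop(&mut self) { /* return self.0 to the frame allocator */ }
}

fn finish_demand_fault(
    page: ClaimedPage,
    map: impl FnOnce(u64) -> Result<(), MapError>,
) -> Result<(), MapError> {
    match map(page.frame()) {
        // We won the race: only now does the frame belong to the tables.
        Ok(()) => { page.leak(); Ok(()) }
        // Another CPU mapped the address first: the fault is resolved,
        // and dropping `page` returns the unused frame to the allocator.
        Err(MapError::AlreadyMapped) => Ok(()),
        Err(e) => Err(e),
    }
}
```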
The `LockedHeap` type provided by `linked_list_allocator` doesn't
disable interrupts when modifying the heap, so an allocation can
deadlock if it is interrupted by an ISR that also allocates.
Fix this by wrapping the `Heap` in our interrupt-aware spinlock.
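A sketch of the wrapper, assuming an `IrqSpinLock` type exposing the `lock_save_irq()` API used elsewhere in this series; the `Heap` calls are the `linked_list_allocator` API:
```rust
use core::alloc::{GlobalAlloc, Layout};
use core::ptr::NonNull;
use linked_list_allocator::Heap;

// `IrqSpinLock` is assumed: it masks local interrupts while held, so an
// ISR that allocates can never interrupt a half-finished allocation.
struct KernelHeap(IrqSpinLock<Heap>);

unsafe impl GlobalAlloc for KernelHeap {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.0
            .lock_save_irq()
            .allocate_first_fit(layout)
            .map_or(core::ptr::null_mut(), |p| p.as_ptr())
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        if let Some(ptr) = NonNull::new(ptr) {
            self.0.lock_save_irq().deallocate(ptr, layout);
        }
    }
}
```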
The `timeout` parameter is only used for `_WAIT` futex ops. For other
ops, the `timeout` parameter is permitted to be an undefined value. The
current implementation would nevertheless try to `copy_from_user` via
the garbage pointer and fault, causing a missed wake-up and deadlocking
the calling process.
Fix this by only accessing the `timeout` parameter for `_WAIT` futex
ops, where the parameter's value must be valid.
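A sketch of the guarded access (the `FUTEX_*` values are the Linux uapi op numbers; `copy_from_user`, `Timespec`, and `Error` stand in for this kernel's helpers):
```rust
const FUTEX_WAIT: u32 = 0;        // Linux uapi op numbers
const FUTEX_WAIT_BITSET: u32 = 9;
const FUTEX_PRIVATE_FLAG: u32 = 128;
const FUTEX_CLOCK_REALTIME: u32 = 256;
const FUTEX_CMD_MASK: u32 = !(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME);

fn read_futex_timeout(op: u32, user_ptr: usize) -> Result<Option<Timespec>, Error> {
    match op & FUTEX_CMD_MASK {
        // Only the wait-style ops require `timeout` to point at a valid
        // timespec; a null pointer means "wait forever".
        FUTEX_WAIT | FUTEX_WAIT_BITSET if user_ptr != 0 => {
            Ok(Some(copy_from_user::<Timespec>(user_ptr)?))
        }
        // For every other op the parameter may be garbage (it is often
        // multiplexed as `val2`), so it must never be dereferenced.
        _ => Ok(None),
    }
}
```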
When a future returns `Poll::Pending`, there is a window, before the
sched code sets the task's state to `Sleeping`, in which a call to the
waker could be lost. We get around this by introducing a new state,
`Woken`.
A waker will set a `Running` task to this state. The sched code then
detects this and *does not* set the task's state to `Sleeping`; instead
it leaves the task running and attempts to re-schedule.
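A sketch of the two sides of the race (names mirror the states above; `lock_save_irq` follows the series' lock API):
```rust
// Sleep path: returns true if the task should be re-polled instead of
// parked, because a waker fired in the Pending-to-Sleeping window.
fn try_park(task: &Task) -> bool {
    let mut state = task.state.lock_save_irq();
    match *state {
        TaskState::Woken => { *state = TaskState::Running; true }
        TaskState::Running => { *state = TaskState::Sleeping; false }
        _ => unreachable!("unexpected state on the sleep path"),
    }
}

// Waker path: a still-Running task is flagged Woken rather than
// transitioned, so the sleep path above knows not to park it.
fn wake(task: &Task) {
    let mut state = task.state.lock_save_irq();
    match *state {
        TaskState::Running => *state = TaskState::Woken,
        TaskState::Sleeping => *state = TaskState::Runnable,
        _ => {} // already Woken or Runnable: nothing to do
    }
}
```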
The caller of `switch_to_task` should have already set the task's state
to `Runnable` for it to be considered in this time slice. Ensure that is
the case with a `debug_assert`.
When re-scheduling the same task, ensure that its state is set back to
`Running`, as it will have been set to `Runnable` before entering this
function.
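A sketch tying both invariants together (the function shape is illustrative, and `TaskState` is assumed to derive `PartialEq` and `Debug` as the existing assert implies):
```rust
fn switch_to_task(prev: &Task, next: &Task) {
    // The caller must already have marked `next` Runnable for it to be
    // considered in this time slice.
    debug_assert_eq!(*next.state.lock_save_irq(), TaskState::Runnable);
    if core::ptr::eq(prev, next) {
        // The same task was chosen again: restore Running and return
        // without a context switch.
        *next.state.lock_save_irq() = TaskState::Running;
        return;
    }
    // ... perform the actual context switch ...
}
```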