Currently, when a kernel stack overflow occurs, the exception handler
blindly attempts to write the current context to the stack. If the SP
isn't valid, this causes another fault, and so on, locking up the
system.
This commit rearranges the stack layout, validates the SP before using
it, and switches to an emergency stack when the SP isn't valid. This
allows the handler to run and panic gracefully.
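As a rough illustration of the validation step (names such as
`EMERGENCY_STACK` and `select_exception_sp` are made up for this sketch,
not the actual symbols; the real handler also involves arch-specific
assembly):

```rust
/// Hypothetical emergency stack reserved for the overflow path
/// (name and size are illustrative).
#[repr(align(16))]
struct EmergencyStack([u8; 4096]);
static mut EMERGENCY_STACK: EmergencyStack = EmergencyStack([0; 4096]);

/// Returns a stack pointer that is safe to write the exception context
/// to: the faulting SP if it is still inside the task's stack,
/// otherwise the top of the emergency stack.
fn select_exception_sp(fault_sp: usize, stack_base: usize, stack_top: usize) -> usize {
    let sp_is_valid = fault_sp > stack_base && fault_sp <= stack_top;
    if sp_is_valid {
        fault_sp
    } else {
        // SP is out of bounds (overflow): switch to the emergency
        // stack so the handler can run and panic gracefully.
        unsafe { core::ptr::addr_of!(EMERGENCY_STACK) as usize + 4096 }
    }
}
```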
Fixes: #98
Currently, each task implements its own priority value. In Linux, each
thread group (process) has a default priority which all tasks in that
group inherit. Tasks can, however, override the default process
priority.
Implement that logic here, which also fixes the current compilation
error on master.
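A minimal sketch of the inheritance rule, with illustrative types
(`ThreadGroup`, `priority_override`, `effective_priority`) rather than
the real kernel structs:

```rust
/// Illustrative types; the real kernel structs differ.
struct ThreadGroup {
    /// Default priority shared by all tasks in the group (process).
    default_priority: i32,
}

struct Task<'a> {
    group: &'a ThreadGroup,
    /// Per-task override; `None` means "inherit from the group".
    priority_override: Option<i32>,
}

impl Task<'_> {
    /// A task uses the process-wide default unless it has explicitly
    /// overridden it.
    fn effective_priority(&self) -> i32 {
        self.priority_override.unwrap_or(self.group.default_priority)
    }
}
```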
Previously, the scheduler unconditionally performed a full runqueue
search (`find_next_runnable_desc`) on every invocation, including every
timer tick and syscall return. This resulted in unnecessary overhead.
This change introduces a "lazy preemption" model:
1. Fast-Path Optimization: `do_schedule` now checks if the current task
is valid, is not the Idle task, and still has virtual budget remaining.
If these conditions are met and `force_resched` is not set, the
scheduler returns immediately without locking the runqueue.
2. Preemption & Idle Handling:
- `insert_into_runq` now sets `force_resched` if the new task has an
earlier deadline than the current task, or if the current task is
Idle.
- The Idle task is explicitly excluded from the fast-path to ensure
immediate context switching when new work arrives.
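A minimal sketch of the fast path described in point 1, with assumed
field names (`force_resched`, `budget_remaining`) standing in for the
real scheduler state:

```rust
/// Illustrative per-CPU scheduler state; field names are assumptions.
struct SchedState {
    force_resched: bool,
}

#[derive(PartialEq)]
enum TaskKind { Idle, Normal }

struct Task {
    kind: TaskKind,
    /// Remaining virtual budget (EEVDF slice), illustrative.
    budget_remaining: i64,
}

/// Fast path: keep running the current task without touching the
/// runqueue lock when nothing forces a reschedule.
fn do_schedule(state: &mut SchedState, current: Option<&Task>) {
    if let Some(task) = current {
        let fast_path = !state.force_resched
            && task.kind != TaskKind::Idle
            && task.budget_remaining > 0;
        if fast_path {
            // Current task is valid, not Idle, and still has budget:
            // return immediately without locking the runqueue.
            return;
        }
    }
    // Slow path: lock the runqueue and pick the next task
    // (full `find_next_runnable_desc` search, elided here).
    state.force_resched = false;
}
```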
This commit refactors the core process representation to decouple
"Identity/Resources" from "Execution/Scheduling". Previously, a
monolithic `Task` struct wrapped in `Arc<SpinLock<>>` caused lock
contention during hot scheduling paths and conflated shared state with
CPU-local state.
The `Task` struct has been split into:
1. `Task` (Shared): Holds process-wide resources (VM, FileTable,
Credentials). Managed via `Arc` and internal fine-grained locking.
2. `OwnedTask` (Private): Holds execution state (Context, v_runtime,
signal mask). Strictly owned by a specific CPU (via the Scheduler) and
accessed lock-free.
Key changes:
* Scheduler:
- `SchedState` now owns tasks via `Box<OwnedTask>`.
- Transitions between `run_queue` and `running_task` involve strictly
moving ownership of the Box, ensuring pointer stability.
- The EEVDF comparison logic now explicitly handles comparisons
between the queued candidates and the currently running task (which is
not in the queue).
* Current Task Access:
- `current()` now returns a `CurrentTaskGuard` which:
1. Disables preemption (preventing context switches while holding
the reference).
2. Performs a runtime borrow check (panic on double-mutable borrow).
3. Dereferences a cached Per-CPU raw pointer for O(1) access.
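A minimal sketch of the guard's semantics, using `Cell`-based
stand-ins for the real per-CPU machinery and preemption counter (all
names below are illustrative):

```rust
use core::cell::Cell;
use core::ops::Deref;

struct OwnedTask {
    v_runtime: u64, // CPU-local execution state (illustrative)
}

/// Stand-in for the real per-CPU area.
struct PerCpu {
    preempt_count: Cell<u32>,
    borrowed: Cell<bool>,
    current: Cell<*mut OwnedTask>,
}

struct CurrentTaskGuard<'a> {
    cpu: &'a PerCpu,
    task: *mut OwnedTask,
}

fn current(cpu: &PerCpu) -> CurrentTaskGuard<'_> {
    // 1. Disable preemption so the task cannot be switched out while
    //    the reference is held.
    cpu.preempt_count.set(cpu.preempt_count.get() + 1);
    // 2. Runtime borrow check: panic on a double borrow.
    assert!(!cpu.borrowed.replace(true), "current task already borrowed");
    // 3. O(1) access through the cached per-CPU raw pointer.
    CurrentTaskGuard { cpu, task: cpu.current.get() }
}

impl Deref for CurrentTaskGuard<'_> {
    type Target = OwnedTask;
    fn deref(&self) -> &OwnedTask {
        // Valid while preemption is disabled and the borrow flag is set.
        unsafe { &*self.task }
    }
}

impl Drop for CurrentTaskGuard<'_> {
    fn drop(&mut self) {
        self.cpu.borrowed.set(false);
        self.cpu.preempt_count.set(self.cpu.preempt_count.get() - 1);
    }
}
```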
Make the global `TASK_LIST` struct be a collection of `Task`s, rather
than `task.state` struct members. This allows other cores to access
any shared task state easily.
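Illustratively (with `Mutex` and `BTreeMap` standing in for the
kernel's own lock and map types, and `Pid` as an assumed key):

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

struct Task; // shared, process-wide state (illustrative)
type Pid = u32;

// The global list now maps a PID to the whole shared `Task`, so any
// core can look up and clone the `Arc` to reach shared task state.
static TASK_LIST: Mutex<BTreeMap<Pid, Arc<Task>>> = Mutex::new(BTreeMap::new());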
Currently, a global wakeup queue is used for all CPUs on the system.
This leads to inefficient behavior regarding preemption. When the
scheduler requests a preemption event, it is inserted into a global list
alongside events from all other CPUs.
When processing IRQs, there is no guarantee which CPU will handle the
timer interrupt. If the current CPU processes a preemption event
intended for a different CPU, it must signal the target CPU via an IPI.
This causes a severe bottleneck, as one CPU may end up distributing
preemption events for the entire system.
Fix this by implementing a per-CPU wakeup queue. Preemption events are
now strictly scheduled for the current CPU, ensuring they are handled
locally by the core that scheduled them. This significantly simplifies
the preemption logic and eliminates the need for IPIs to signal
preemption events.
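A minimal sketch of the idea, using a thread-local as a stand-in for a
real per-CPU variable (names and event layout are illustrative):

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

/// Illustrative preemption/wakeup event.
struct WakeupEvent {
    task_id: u64,
    deadline: u64,
}

thread_local! {
    // Stand-in for a per-CPU variable: each CPU owns its own queue,
    // so events are always consumed locally and no IPI is needed to
    // hand them to another core.
    static WAKEUP_QUEUE: RefCell<VecDeque<WakeupEvent>> =
        RefCell::new(VecDeque::new());
}

/// Preemption events are strictly scheduled for the current CPU.
fn queue_local_wakeup(event: WakeupEvent) {
    WAKEUP_QUEUE.with(|q| q.borrow_mut().push_back(event));
}

/// Called from the local timer IRQ path: drains only this CPU's queue.
fn process_local_wakeups() {
    WAKEUP_QUEUE.with(|q| {
        while let Some(ev) = q.borrow_mut().pop_front() {
            // ... compare `ev.deadline` against the running task and
            // request a local reschedule if needed (elided) ...
            let _ = (ev.task_id, ev.deadline);
        }
    });
}
```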
When switching tasks, we may well be switching away from a task which is
going to `Sleep`. Therefore the check
```rust
debug_assert_eq!(*prev_task.state.lock_save_irq(), TaskState::Runnable);
```
is incorrect.
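A relaxed check could instead accept both states, along these lines (a
sketch only; the `Sleep` variant name is taken from the description
above and the state access is abstracted into a plain function):

```rust
#[derive(Debug, PartialEq)]
enum TaskState {
    Runnable,
    Sleep, // the outgoing task may legitimately be going to sleep
}

/// Only assert that the outgoing task is in a state it can validly be
/// switched away from.
fn check_prev_state(state: &TaskState) {
    debug_assert!(
        matches!(state, TaskState::Runnable | TaskState::Sleep),
        "unexpected state for outgoing task: {:?}",
        state
    );
}
```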
Use EEVDF concepts such as the virtual deadline correctly: actually
calculate the required deadline for each task and use it when
scheduling. Also preempt dynamically based on the deadline.
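Roughly, the deadline calculation and preemption test follow the usual
EEVDF shape; the constants and field names below are illustrative, not
the kernel's actual ones:

```rust
/// Illustrative EEVDF bookkeeping.
struct SchedEntity {
    v_runtime: u64,  // virtual time already consumed
    weight: u64,     // load weight (higher = larger share)
    v_deadline: u64, // computed virtual deadline
}

const BASE_SLICE_NS: u64 = 3_000_000; // nominal time slice
const WEIGHT_UNIT: u64 = 1024;        // weight of a "normal" task

impl SchedEntity {
    /// EEVDF: a task's virtual deadline is its current virtual runtime
    /// plus its slice scaled inversely by its weight.
    fn update_deadline(&mut self) {
        self.v_deadline = self.v_runtime + BASE_SLICE_NS * WEIGHT_UNIT / self.weight;
    }

    /// Dynamic preemption: a newly woken entity preempts the running
    /// one if its virtual deadline is earlier.
    fn should_preempt(&self, current: &SchedEntity) -> bool {
        self.v_deadline < current.v_deadline
    }
}
```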
There is no need to store a separate inner struct within the interrupt
manager; refactor that away.
Also, reduce the amount of locking when servicing an interrupt:
currently the whole interrupt manager stays locked while an interrupt
is serviced. It should be unlocked while the ISR is called.
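A minimal sketch of the reduced lock scope, with `Mutex` standing in
for the kernel spinlock and an illustrative handler table:

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

type IrqHandler = Arc<dyn Fn() + Send + Sync>;

/// Flattened manager: no separate inner struct, just the handler table
/// (layout is illustrative).
struct InterruptManager {
    handlers: Mutex<BTreeMap<u32, IrqHandler>>,
}

impl InterruptManager {
    fn service_irq(&self, irq: u32) {
        // Clone the handler while holding the lock...
        let handler = self.handlers.lock().unwrap().get(&irq).cloned();
        // ...then drop the lock before running the ISR, so other CPUs
        // can register or service interrupts concurrently.
        if let Some(isr) = handler {
            isr();
        }
    }
}
```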
In an SMP environment, two threads sharing an address space may trigger
a page fault on the same address simultaneously. Previously, the loser
of this race would receive an `AlreadyMapped` error from the page table
mapper, causing the kernel to treat a valid execution flow as an error.
This patch modifies `handle_demand_fault` to gracefully handle these
spurious faults by:
1. Accepting `AlreadyMapped` as a successful resolution. If another CPU
has already mapped the page while we were waiting for the lock
(or performing I/O), we consider the fault handled.
2. Fixing a memory leak in the race path. We now only `leak()` the
allocated `ClaimedPage` (surrendering ownership to the page tables) if
the mapping actually succeeds. If we lose the race, the `ClaimedPage` is
allowed to go out of scope, causing the `Drop` impl to return the unused
physical frame to the allocator.
3. Applying this logic to both the anonymous mapping path and the
deferred file-backed path.
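A minimal sketch of the resulting control flow, with illustrative types
standing in for the real mapper and `ClaimedPage` API:

```rust
/// Illustrative error type; the real mapper API differs.
enum MapError {
    AlreadyMapped,
    OutOfMemory,
}

/// Owns a physical frame; returns it to the allocator on drop.
struct ClaimedPage;

impl Drop for ClaimedPage {
    fn drop(&mut self) {
        // Return the unused physical frame to the allocator (elided).
    }
}

impl ClaimedPage {
    /// Surrender ownership of the frame to the page tables.
    fn leak(self) {
        std::mem::forget(self);
    }
}

/// Stand-in for the page-table mapper.
fn map_page(_page: &ClaimedPage) -> Result<(), MapError> {
    Ok(())
}

fn handle_demand_fault() -> Result<(), MapError> {
    let page = ClaimedPage;
    match map_page(&page) {
        Ok(()) => {
            // Mapping succeeded: the page tables now own the frame.
            page.leak();
            Ok(())
        }
        Err(MapError::AlreadyMapped) => {
            // Another CPU won the race. Treat the fault as handled;
            // `page` is dropped here and its frame goes back to the
            // allocator, so nothing leaks.
            Ok(())
        }
        Err(e) => Err(e),
    }
}
```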