Fix NameError for Cache in WrappedMiniMaxAttention

Use a string annotation for the Cache type, since it exists only in type
stubs and not in the actual mlx_lm package at runtime. A quoted annotation
is never evaluated when the class is defined, so the missing name no longer
raises a NameError at import time.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Alex Cheema
2026-02-03 19:15:50 -08:00
parent a54ba12dee
commit cd9f3182d9


@@ -635,7 +635,7 @@ class WrappedMiniMaxAttention(CustomMlxLayer):
         self,
         x: mx.array,
         mask: mx.array | None = None,
-        cache: Cache | None = None,
+        cache: "Cache | None" = None,
     ) -> mx.array:
         batch_dim, seq_dim, _ = x.shape
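
For reference, a minimal sketch of the general pattern this change relies on (the stub import path and the trimmed signature are illustrative, not taken from the actual codebase): a quoted annotation is stored as a plain string and never evaluated at runtime, so the annotated name only needs to resolve for static type checkers, typically via a typing.TYPE_CHECKING guard.

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; nothing is imported at runtime,
    # so a name that lives purely in type stubs is fine here.
    # (Illustrative import path, not the real one.)
    from mlx_lm_stubs import Cache


class WrappedMiniMaxAttention:
    def __call__(
        self,
        x,
        mask=None,
        cache: "Cache | None" = None,  # quoted, so never evaluated: no NameError
    ):
        # __annotations__ keeps the raw string "Cache | None"; Python only
        # resolves the name if something calls typing.get_type_hints().
        return x

Defining and calling this class works without Cache existing anywhere at runtime; only a type checker (or an explicit get_type_hints call) ever tries to resolve the annotation.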