kimi k2 thinking

Alex Cheema
2025-11-11 10:03:39 -08:00
committed by GitHub
parent 364087b91f
commit 631cb81009
9 changed files with 137 additions and 48 deletions


@@ -19,6 +19,7 @@
21. Make these two independent choices: tensor vs. pipeline parallelism, and ring vs. ibv.
22. When downloading a model for the first time, requests time out and the model seemingly never finishes loading into memory.
23. Do we need cache_limit? We went back and forth on this a lot because we suspected it was causing issues. One problem: it is set relative to model size, so with multiple models loaded it uses the most recently loaded model's size. That's problematic if you launch DeepSeek -> Llama, for example.
24. Task cancellation. When an API HTTP request gets cancelled, the corresponding task should be cancelled too.
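A minimal sketch of the cache_limit pitfall in item 23. All names here (`naive_cache_limit`, `safer_cache_limit`, the sizes) are illustrative, not the project's actual API: if the limit is derived from the most recently loaded model alone, loading a small model after a large one shrinks the limit below what the resident models need.

```python
def naive_cache_limit(loaded_models: dict[str, int], latest: str) -> int:
    # Bug pattern from item 23: limit tracks only the most recent model,
    # ignoring larger models still resident in memory.
    return 2 * loaded_models[latest]

def safer_cache_limit(loaded_models: dict[str, int]) -> int:
    # One alternative: size the limit from every resident model.
    return 2 * sum(loaded_models.values())

# Illustrative sizes (GB): DeepSeek loaded first, then Llama.
loaded = {"deepseek": 600, "llama": 40}
print(naive_cache_limit(loaded, "llama"))  # limit collapses to 80
print(safer_cache_limit(loaded))           # 1280 covers both models
```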
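Item 24 can be sketched with plain asyncio, assuming nothing about the real server framework: tie the worker task's lifetime to the request task, and explicitly propagate cancellation downward when the request is cancelled, since awaiting a task does not cancel it automatically.

```python
import asyncio

async def long_inference() -> str:
    # Stands in for token generation or any long-running work.
    await asyncio.sleep(10)
    return "done"

async def handle_request() -> str:
    task = asyncio.create_task(long_inference())
    try:
        return await task
    except asyncio.CancelledError:
        # Awaiting a task does not cancel it when the awaiter is
        # cancelled, so propagate the cancellation explicitly.
        task.cancel()
        raise

async def main() -> str:
    request = asyncio.create_task(handle_request())
    await asyncio.sleep(0.01)
    request.cancel()  # simulate the client disconnecting
    try:
        await request
    except asyncio.CancelledError:
        return "cancelled"
    return "completed"

result = asyncio.run(main())
print(result)
```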
Potential refactors: