LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-23 16:19:07 -04:00

Files

Ettore Di Giacinto 19742aee64 bench(dense): FORCE_CUBLAS no-op for dense too (720.8 vs 721.8) - every flag lever exhausted

Confirms parity (dense+MoE, both phases) is strictly the FP4 tensor-core kernel;
no config/flag shortcut remains.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-20 03:59:27 +00:00

paged

kernel(doc): dense scope resolved - two FP4 kernels (dense first, then grouped)

2026-06-20 03:56:33 +00:00

patches

bench(dense): FORCE_CUBLAS no-op for dense too (720.8 vs 721.8) - every flag lever exhausted

2026-06-20 03:59:27 +00:00

CMakeLists.txt

fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413)

2026-04-18 20:30:28 +02:00

grpc-server.cpp

feat: generic chat_template_kwargs (model config + per-request metadata) (#10359)

2026-06-16 12:16:34 +02:00

Makefile

build(paged): stacking patch-series scaffolding for llama.cpp paged attention

2026-06-19 22:53:20 +00:00

package.sh

fix(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend (#9099)

2026-03-22 00:58:14 +01:00

prepare.sh

chore: ⬆️ Update ggml-org/llama.cpp to 7f8ef50cce40e3e7e4526a3696cb45658190e69a (#7402)

2025-12-01 07:50:40 +01:00

run.sh

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00