LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-26 09:26:55 -04:00

Files

Ettore Di Giacinto 001d833426 docs(paged): f16/bf16 glue probe - dense decode residual ceiling

Empirical probe on q36-27b-nvfp4 @npl128 (build f7409c2, patch 0023):
- attention KV cache default is ALREADY f16 (K/V f16) -> --cache-type f16 is a
  no-op; q8_0 within noise -> KV dtype is not a decode lever
- nsys node-trace decode budget: f32-glue (norms/elementwise/activations/attn,
  excl. SSM recurrence + NVFP4 GEMM) = 28.7 ms = 8.4% of step (40.9 ms = 12%
  incl. the non-FP4 cublas GEMM)
- f16 realistically recovers ~11-16 ms of the ~27 ms/step gap = ~40-60% of the
  8.2% residual -> ~95-96% parity, not a full close; non-bit-exact opt-in only

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-26 09:11:21 +00:00

ds4

chore: ⬆️ Update antirez/ds4 to 80ebbc396aee40eedc1d829222f3362d10fa4c6c (#10378 )

2026-06-18 00:32:13 +02:00

grpc

fix: speedup git submodule update with --single-branch (#2847 )

2024-07-13 22:32:25 +02:00

ik-llama-cpp

chore: ⬆️ Update ikawrakow/ik_llama.cpp to d5507e33ae7ee2b7b41475f08044d3bde3b839ee (#10498 )

2026-06-25 00:57:42 +02:00

llama-cpp

docs(paged): f16/bf16 glue probe - dense decode residual ceiling