mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-28 10:27:30 -04:00
docs(paged): add the bf16-tau opt-in line to the decode plots
Per request, the plots now show all four series: llama.cpp (standard), vLLM, LocalAI's llama.cpp patches (bit-exact hero), and LocalAI's patches + bf16-tau (opt-in ceiling, +3% to +17% over the patches, ahead of vLLM at every dense width and MoE npl>=32). Subtitle flags bf16-tau as opt-in / not bit-exact. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
This commit is contained in:
@@ -164,7 +164,7 @@ swept over serving width `npl` in {8, 32, 64, 128}. Plots:
|
||||
[`qwen36_moe_decode_vs_npl.png`](docs/qwen36_moe_decode_vs_npl.png); raw data
|
||||
[`final_benchmark.csv`](docs/final_benchmark.csv).
|
||||
|
||||

|
||||

|
||||
|
||||
> **What was re-measured (2026-06-27).** The three llama columns - **stock**,
|
||||
> **patched**, and **patched+bf16-tau** - were all re-measured this session on one
|
||||
|
||||
Binary file not shown.
|
Before Width: | Height: | Size: 193 KiB After Width: | Height: | Size: 217 KiB |
Binary file not shown.
|
Before Width: | Height: | Size: 104 KiB After Width: | Height: | Size: 123 KiB |
Binary file not shown.
|
Before Width: | Height: | Size: 108 KiB After Width: | Height: | Size: 125 KiB |
Reference in New Issue
Block a user