Files
LocalAI/backend/cpp
Ettore Di Giacinto d291e15114 kernel(P0): record precise op-level baseline (q4_K n=512 = 47 TFLOPS, ~22% of ceiling)
test-backend-ops perf MUL_MAT m=4096 k=14336: q4_K prefill (n=512) = 47.1 TFLOPS,
q4_0 = 49.5; decode (n=1) = 761/817 GFLOPS (memory-bound). The prefill GEMM target
is 47 -> ~213 TFLOPS (~4.5x). Cleaner per-shape target than end-to-end for kernel
iteration.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 21:33:50 +00:00
..