The standalone quantize fold is empirically flat (Lever-2 precedent) with the
worst gain/plumbing ratio; no bit-exact lever remains. Dense 371.81 t/s @npl128
= 95.0% of vLLM 391, recurrence past vLLM at the LPDDR5x DRAM floor, all
byte-identical to llama f32. Only bf16 state (shelved) goes further.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>