mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-23 16:19:07 -04:00
Hand-tuned nChw16c AVX-512 register-tiled direct-conv microkernel (~263 GFLOP/s, within 6-7% of MLAS per-op efficiency), runtime-CPUID-dispatched + AVX2 fallback, fused bias/relu. voice 7ecfd07: default 3x3-s1 kernel for WeSpeaker (+37%/+32%) + ERes2Net, CAM++ pinned to Winograd. face be22d67: shape-gated to the ArcFace recognizer body (+25-27% @8t); SCRFD detector stays on Winograd (no regression). Parity cosine=1.0 / detect <=1px on AVX-512 + AVX2 paths. Portable single binaries. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]