Optimizing a GEMM from first principles
Draft · 14 July 2025 · 14 mins
Within 92% of Intel MKL (single-threaded).