Optimizing a GEMM from first principles
Draft · 14 July 2025 · 15 min read
Within 95% of Intel MKL (single-threaded).