Further Readings
Optimization Principles
GEMM is the most commonly used operation in neural networks, but many other operations can also be CPU-bound. For those operations, a few common optimization principles apply:
Increase the cache hit rate of memory accesses. Both complex numerical computation and hot-spot memory access benefit from a high cache hit rate. This requires transforming the original memory access pattern into one that fits the cache policy.
Use SIMD vectorization. This requires transforming the data accesses in the loop body into a uniform pattern, so that consecutive iterations operate on contiguous data.
Exercise
Flame provides a step-by-step guide to measuring the performance gain of each of these optimization techniques. Working through this tutorial as an exercise is highly recommended.
TVM also provides a comprehensive guide that introduces these techniques and implements them in a Halide-like language.
Deep Dive
Finally, read the paper Anatomy of High-Performance Matrix Multiplication to gain more insight into the underlying memory hierarchy and CPU cache mechanisms, as well as the theoretical basis for these optimizations.