Further Readings

Optimization Principles

GEMM is the most commonly used operation in neural networks, but there are many others that may also be CPU-bound. For those operations, a few common optimization principles apply:

  • Increase the cache hit rate of memory accesses. Both complex numerical computation and hot-spot memory access benefit from a high cache hit rate. This requires transforming the original memory access pattern into one that fits the cache policy.

  • Use SIMD vectorization. This requires transforming the data access pattern in the loop body into a uniform pattern, so that contiguous elements can be processed with a single vector instruction.
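Both principles can be seen in a classic transformation of matrix multiplication: loop blocking (tiling). The sketch below is illustrative only; the function names, the matrix size `N`, and the tile size `BLOCK` are assumptions chosen for the example, and real implementations tune the tile size to the cache hierarchy.

```c
#include <string.h>

#define N 64
#define BLOCK 16

/* Naive GEMM: C = A * B. The innermost loop strides down a column of B,
 * touching a new cache line on nearly every iteration for large N. */
static void gemm_naive(const float A[N][N], const float B[N][N], float C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < N; k++)
                acc += A[i][k] * B[k][j];
            C[i][j] = acc;
        }
}

/* Blocked (tiled) GEMM: operate on BLOCK x BLOCK tiles so each tile's
 * working set stays resident in cache while it is reused. */
static void gemm_blocked(const float A[N][N], const float B[N][N], float C[N][N]) {
    memset(C, 0, sizeof(float) * N * N);
    for (int ii = 0; ii < N; ii += BLOCK)
        for (int kk = 0; kk < N; kk += BLOCK)
            for (int jj = 0; jj < N; jj += BLOCK)
                for (int i = ii; i < ii + BLOCK; i++)
                    for (int k = kk; k < kk + BLOCK; k++) {
                        float a = A[i][k];
                        /* The innermost loop now walks rows of C and B with
                         * unit stride -- a uniform access pattern the compiler
                         * can auto-vectorize with SIMD instructions. */
                        for (int j = jj; j < jj + BLOCK; j++)
                            C[i][j] += a * B[k][j];
                    }
}
```

The loop reordering changes only the iteration order, not the result, so correctness can be checked by comparing the two versions on the same inputs.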

Exercise

Flame provides a step-by-step guide to measuring the performance gain of each of these optimization techniques. It is highly recommended to work through this tutorial as an exercise.

TVM also provides a comprehensive guide introducing these techniques and implementing them in a Halide-like language.

Deep Dive

Finally, read the paper Anatomy of High-Performance Matrix Multiplication to gain more insight into the underlying memory and CPU cache mechanisms, as well as the theoretical basis.
