Further Readings
Optimization Principles
GEMM is the most commonly used operation in neural networks, but many other operations can also be CPU-bound. For those operations, a few common optimization principles apply:
Increase the cache hit rate of memory accesses. Both complex numerical computation and hot-spot memory access benefit from a high cache hit rate. This requires transforming the original memory access pattern into one that fits the cache policy.
Use SIMD vectorization. This requires transforming the data accesses in the loop body into a uniform pattern, so that consecutive iterations operate on contiguous data.
Exercise
Flame provides a step-by-step guide to measuring the performance gain of each of these optimization techniques. Working through this tutorial as an exercise is highly recommended.
TVM also provides a comprehensive guide that introduces these techniques and implements them in a Halide-like language.
Deep Dive
Finally, read the paper Anatomy of High-Performance Matrix Multiplication to gain more insight into the underlying memory hierarchy and CPU cache mechanisms, as well as the theoretical basis for these optimizations.