Schedule
Let's recall what we've learnt to optimize arithmetic computations over matrices or vecotors.
In GEMM, the matrices are tiled. For each tile, we pack it to improve spatial locality and use SIMD instructions to vectorize calculations.
In LSTM, small input matrices of different time slots are merged for better computational intensity. The weight matrix is also be altered to fit into L2 cache to improt cache hit rate.
In BERT, a few operations are fused and we parallel transformers of multi-head.
These techniques, among many others, transforms the program and data structure to improve cache locality and seek for parallelism opportunities. However, it is not easy for non-experts to discover those opportunities and then implement them manually, appropriately. Halide is about to ease the pain by providing a couple of primitives to ecapsulate common solution patterns.
Halide is designed for image processing instead of general tensor operations. First of all, we need to get familar with concepts that are used within the context of Halide, image processing or computer vision.
domain. Domain is referred to as a two dimession array of the pixels of an input image.
Algorithm and Schedule
Halide's design philosophy is to decouple the algorithm from the schedule. Here is an algorithm "gradient" that is defiend by adding the row and column indexes of the pixels. The algorithm only cares about what is computed. Where and when each of the pixels is computed is defined by the schedule. By default, pixels are visited along the rows and then down the columns[2].
// the algorithm
Func gradient("gradient");
gradient(x, y) = x + y;
/** the C equivalent **/
for (int y = 0; y < 4; y++) {
for (int x = 0; x < 4; x++) {
printf("evaluating at x = %d, y = %d: %d\n", x, y, x + y);
}
}
The default behavior could be changed by specifying one or more schedules. In the following, the reorder call takes the arguments of the func, and sets a new nesting order for the for loops that are generated. The arguments are specified from the innermost loop out, so the following call puts y in the inner loop.

Domain Order and Call Schedule
Reorder is among many other schedules that Halide proviede to change the way we visit the inputs and generate outputs. The above example is implemented by only one function. In practice, an algorithm might be defined by a chain of multiple functions, thus providing a much bigger space of choices in scheduling.
For a single funtion, Halide defines the order in which the required region of each function's domain(inputs) should be traversed, which is called the domain order, using a traditional set of loop transformation concepts. The schedules related to domain order are reorder, tile, split, unroll, vectorize, parallelize.
Each dimension can be traversed sequentially or in parallel.
Constant-size dimensions can be unrolled or vectorized.
Dimensions can be reordered (e.g. from column- to row-major).
Finally, dimensions can be split by a factor, creating two new dimensions: an outer dimension, over the old range divided by the factor, and an inner dimension, which iterates within the factor. After splitting, references to the original index become
outer × factor + inner.
In addition to the order of evaluation within the domain of each function, the schedule also specifies the granularity of interleaving a function's computation with the storage and computation of its dependent functions. We call these choices the call schedule. The schedules related to call schedule are computer_root and compute_at. Here is a multi-stage pipeline defind by two consecutive function calls. By default, the producer stage is inlined into the consumer stage.
The compute_root will change the schedule behavior to evaluate all of the producer before any of the consumer.

References
[1] Halide @ CVPR2015. [link]
[2] Halide Tutorials. [link]
[3] Ragan-Kelley, Jonathan, et al. "Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines." Acm Sigplan Notices 48.6 (2013): 519-530. [link][cache]
Last updated