Graph Level Optimization
Graph-level optimization in TVM is similar to that in other frameworks such as TensorFlow and ONNX.
Operator fusion
Operator fusion is a simple but important optimization in deep learning inference: it fuses multiple small operations into a single kernel, for example:
Conv2D + BiasAdd + [Activation]
MatMul + BiasAdd + [Activation]
Fusing ops together provides several performance advantages [1]:
Improves temporal and spatial locality of data access. For example, MatMul is computed block-wise, and the bias and activation function can be applied while the data is still "hot" in cache.
Completely eliminates op scheduling overhead (a big win for cheap ops)
Increases opportunities for ILP, vectorization etc.
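To make the locality argument concrete, here is a plain-Python sketch (illustrative only, not TVM code) of the MatMul + BiasAdd + ReLU chain computed unfused versus fused. The unfused version materializes and re-reads a full intermediate matrix at every step; the fused version applies bias and activation while each accumulator is still in registers/cache.

```python
def matmul_bias_relu_unfused(A, B, bias):
    n, k, m = len(A), len(B), len(B[0])
    # Step 1: MatMul writes a full intermediate matrix to memory.
    C = [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
         for i in range(n)]
    # Step 2: BiasAdd re-reads the whole intermediate.
    C = [[C[i][j] + bias[j] for j in range(m)] for i in range(n)]
    # Step 3: ReLU re-reads it yet again.
    return [[max(C[i][j], 0.0) for j in range(m)] for i in range(n)]

def matmul_bias_relu_fused(A, B, bias):
    n, k, m = len(A), len(B), len(B[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            # Bias and activation applied while acc is still "hot".
            out[i][j] = max(acc + bias[j], 0.0)
    return out
```

Both functions compute the same result; the fused loop also avoids launching three separate kernels, which is where the scheduling-overhead win comes from.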
TVM classifies graph operators into four categories [2]:
injective (one-to-one map, e.g., add)
reduction (e.g., sum)
complex-out-fusable (can fuse element-wise map to output, e.g., conv2d)
opaque (cannot be fused, e.g., sort).
TVM defines fusion rules based on the above operator categories:
Multiple injective operators can be fused into another injective operator.
A reduction operator can be fused with input injective operators (e.g., fuse scale and sum).
Operators such as conv2d are complex-out-fusable, and injective operators can be fused to their output.
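The three rules can be sketched as a toy pass over a linear chain of operators (the `CATEGORY` table and greedy grouping below are hypothetical simplifications, not TVM's actual fusion pass, which works on a dataflow graph):

```python
INJECTIVE, REDUCTION, COMPLEX_OUT_FUSABLE, OPAQUE = range(4)

# Hypothetical category table for a few example ops.
CATEGORY = {
    "conv2d": COMPLEX_OUT_FUSABLE,
    "bias_add": INJECTIVE,
    "relu": INJECTIVE,
    "sum": REDUCTION,
    "sort": OPAQUE,
}

def fuse_chain(ops):
    """Greedily group a chain of ops into fused segments."""
    groups = []
    for op in ops:
        cat = CATEGORY[op]
        can_fuse = False
        if groups:
            if cat == INJECTIVE:
                # Rules 1 & 3: an injective op fuses after another
                # injective op or onto a complex-out-fusable output.
                can_fuse = CATEGORY[groups[-1][-1]] in (
                    INJECTIVE, COMPLEX_OUT_FUSABLE)
            elif cat == REDUCTION:
                # Rule 2: a reduction fuses with injective inputs.
                can_fuse = all(CATEGORY[o] == INJECTIVE
                               for o in groups[-1])
        if can_fuse:
            groups[-1].append(op)
        else:
            groups.append([op])  # opaque ops always stand alone
    return groups
```

For example, `fuse_chain(["conv2d", "bias_add", "relu", "sort"])` yields one fused conv2d + bias_add + relu segment plus a standalone sort, mirroring the Conv2D + BiasAdd + Activation pattern above.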
Here is a comparison between fused and non-fused operations from TVM, measured on an NVIDIA Titan X: ![With and without fusion [2]](https://nightr.gitbook.io/nnopt/~gitbook/image?url=https%3A%2F%2F369426842-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-legacy-files%2Fo%2Fassets%252F-M5GTf1_2Rw6OsaXDnMj%252Fsync%252F077dc91cc736d44c090aa0b1431077ccf98e71a5.png%3Fgeneration%3D1592554672511237%26alt%3Dmedia&width=300&dpr=3&quality=100&sign=65d954a4&sv=2)
Other rule-based graph optimizations
There are many rule-based optimization methods that can be applied, for example [1]:
Arithmetic simplifications
Flattening: a+b+c+d => AddN(a, b, c, d)
Hoisting: AddN(x * a, b * x, x * c) => x * AddN(a, b, c)
Simplification to reduce number of nodes:
Numeric: x+x+x => 3*x
Logic: !(x > y) => x <= y
Broadcast minimization
Example: (matrix1 + scalar1) + (matrix2 + scalar2) => (matrix1 + matrix2) + (scalar1 + scalar2)
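Two of the arithmetic rules above (flattening and the numeric x+x+x => 3*x simplification) can be sketched on a toy tuple-based expression representation (a hypothetical mini-IR, not any framework's real rewriter):

```python
from collections import Counter

def flatten_add(expr):
    """Flattening: ('add', ('add', a, b), c) -> [a, b, c]."""
    if isinstance(expr, tuple) and expr[0] == "add":
        return flatten_add(expr[1]) + flatten_add(expr[2])
    return [expr]

def simplify_sum(expr):
    """Numeric simplification: x+x+x -> ('mul', 3, 'x')."""
    counts = Counter(flatten_add(expr))
    terms = [t if n == 1 else ("mul", n, t) for t, n in counts.items()]
    # A single remaining term needs no AddN wrapper.
    return terms[0] if len(terms) == 1 else ("addn", *terms)
```

For instance, x + x + x written as nested binary adds, `("add", ("add", "x", "x"), "x")`, simplifies to `("mul", 3, "x")` — three additions replaced by one multiply.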
Layout optimization
Example: Use NCHW layout for Conv2D
![NCHW layout for Conv2D [1]](https://nightr.gitbook.io/nnopt/~gitbook/image?url=https%3A%2F%2F369426842-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-legacy-files%2Fo%2Fassets%252F-M5GTf1_2Rw6OsaXDnMj%252Fsync%252Fcd70ad428d7eec3e8260ffcb0721fb23348b409e.png%3Fgeneration%3D1592554672688707%26alt%3Dmedia&width=300&dpr=3&quality=100&sign=9bb7cc31&sv=2)
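A layout optimization is just a reindexing of the tensor data. As a minimal plain-Python sketch (nested lists standing in for a real tensor), here is the NHWC-to-NCHW transformation that such a pass would insert before a Conv2D expecting NCHW input:

```python
def nhwc_to_nchw(t):
    """Transpose a nested-list tensor from [N][H][W][C] to [N][C][H][W]."""
    n, h, w, c = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[ni][hi][wi][ci] for wi in range(w)]
              for hi in range(h)]
             for ci in range(c)]
            for ni in range(n)]
```

After the transform, `out[n][c][h][w] == t[n][h][w][c]`: the channel dimension becomes contiguous per spatial plane, which is the access pattern the NCHW Conv2D kernels assume.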
Just like other popular frameworks, TVM also applies lots of rule-based graph optimizations.
Graph tuner
AutoTVM searches for the optimal schedule for each individual operator, but locally optimal schedules may still compose into a suboptimal graph. TVM therefore introduces the graph tuner to perform graph-level tuning.
Given a set of schedule candidates, the graph tuner benchmarks all possible layout transformations in the graph, then combines the schedule and layout-transformation execution times to find the optimal combination of schedules. [3]
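The core idea can be sketched as a dynamic program over a linear chain of ops (a heavily simplified toy, not TVM's actual graph tuner, which handles general graphs; all cost numbers are supplied by the caller, e.g. from benchmarks):

```python
def tune_chain(op_times, transform_time):
    """Pick one (schedule, layout) candidate per op in a chain.

    op_times[i][k]       = measured time of op i with candidate k
    transform_time(j, k) = cost of converting candidate j's output
                           layout to candidate k's input layout
    Returns (best_total_time, chosen candidate index per op).
    """
    n = len(op_times)
    best = list(op_times[0])          # best[k]: cheapest finish at op 0, cand k
    choice = [[None] * len(op_times[0])]
    for i in range(1, n):
        new_best, prev_pick = [], []
        for k, t in enumerate(op_times[i]):
            costs = [best[j] + transform_time(j, k) + t
                     for j in range(len(best))]
            jmin = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[jmin])
            prev_pick.append(jmin)
        best, choice = new_best, choice + [prev_pick]
    # Backtrack the chosen candidate for each op.
    k = min(range(len(best)), key=best.__getitem__)
    total, picks = best[k], [k]
    for i in range(n - 1, 0, -1):
        k = choice[i][k]
        picks.append(k)
    return total, picks[::-1]
```

For example, with two ops that each prefer a different layout, the tuner weighs each op's fastest candidate against the cost of the layout conversion between them and picks the globally cheapest combination.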
TBD: need more investigation on graph tuner.
Reference
[1] TensorFlow Graph Optimizations
[2] Tianqi Chen, et al. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), Carlsbad, CA, 2018. USENIX Association.