Graph Level Optimization

Graph-level optimization in TVM is similar to that in other frameworks such as TensorFlow and ONNX.

Operator fusion

Operator fusion is a simple but important optimization in deep learning inference: it fuses multiple small operators into a single kernel, for example:

  • Conv2D + BiasAdd + [Activation]

  • MatMul + BiasAdd + [Activation]

Fusing ops together provides several performance advantages [1]:

  • Improves temporal and spatial locality of data access. For example, MatMul is computed block-wise, and the bias and activation function can be applied while the data is still “hot” in cache.

  • Completely eliminates per-operator scheduling overhead (a big win for cheap ops)

  • Increases opportunities for instruction-level parallelism (ILP), vectorization, etc.
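The locality and overhead arguments above can be made concrete with a minimal pure-Python sketch (illustrative only, not TVM code): the unfused version materializes an intermediate buffer per operator and makes two passes over the data, while the fused version applies bias and activation in a single pass while each element is still hot.

```python
def bias_add(xs, b):
    return [x + b for x in xs]

def relu(xs):
    return [max(x, 0.0) for x in xs]

def unfused(xs, b):
    # Two separate "kernels": one intermediate list, two passes over the data.
    return relu(bias_add(xs, b))

def fused(xs, b):
    # One pass, no intermediate buffer: bias and activation are applied
    # while each element is still in registers/cache.
    return [max(x + b, 0.0) for x in xs]

xs = [-2.0, -0.5, 1.0, 3.0]
print(fused(xs, 1.0))  # [0.0, 0.5, 2.0, 4.0]
assert fused(xs, 1.0) == unfused(xs, 1.0)
```

Real fused kernels get the same result with one memory traversal instead of three (read, write intermediate, read again), which is exactly the cache-locality win described above.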

TVM classifies graph operators into four categories [2]:

  • injective (one-to-one map, e.g., add)

  • reduction (e.g., sum)

  • complex-out-fusable (can fuse element-wise map to output, e.g., conv2d)

  • opaque (cannot be fused, e.g., sort)

TVM defines fusion rules based on the above operator categories:

  • Multiple injective operators can be fused into another injective operator.

  • A reduction operator can be fused with input injective operators (e.g., fuse scale and sum).

  • Operators such as conv2d are complex-out-fusable: injective operators can be fused to their output.
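The three rules above can be sketched as a small predicate over producer/consumer edges. This is a hypothetical illustration (the names and structure are mine, not TVM's internal API):

```python
# Operator categories from the TVM paper [2].
INJECTIVE, REDUCTION, COMPLEX_OUT_FUSABLE, OPAQUE = range(4)

# Illustrative mapping of a few ops to their categories.
OP_CATEGORY = {
    "add": INJECTIVE, "exp": INJECTIVE, "scale": INJECTIVE,
    "sum": REDUCTION,
    "conv2d": COMPLEX_OUT_FUSABLE,
    "sort": OPAQUE,
}

def can_fuse(producer, consumer):
    p, c = OP_CATEGORY[producer], OP_CATEGORY[consumer]
    if p == INJECTIVE and c == INJECTIVE:
        return True   # injective chains fuse into one injective op
    if p == INJECTIVE and c == REDUCTION:
        return True   # e.g., fuse scale into sum
    if p == COMPLEX_OUT_FUSABLE and c == INJECTIVE:
        return True   # e.g., fuse add/activation into conv2d's output
    return False      # opaque ops never fuse

assert can_fuse("exp", "add")       # injective + injective
assert can_fuse("scale", "sum")     # injective input into a reduction
assert can_fuse("conv2d", "add")    # element-wise map fused to conv2d output
assert not can_fuse("sort", "add")  # opaque
```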

Here is a comparison of fused and non-fused operators from TVM, tested on an NVIDIA Titan X (figure: with and without fusion [2]).

Other rule-based graph optimizations

Many other rule-based optimization methods can be applied, for example [1]:

  • Arithmetic simplifications

    • Flattening: a+b+c+d => AddN(a, b, c, d)

    • Hoisting: AddN(x*a, x*b, x*c) => x * AddN(a, b, c)

  • Simplification to reduce the number of nodes:

    • Numeric: x+x+x => 3*x

    • Logic: !(x > y) => x <= y

  • Broadcast minimization

    • Example: (matrix1 + scalar1) + (matrix2 + scalar2) => (matrix1 + matrix2) + (scalar1 + scalar2)

  • Layout optimization

    • Example: use NCHW layout for Conv2D (figure: NCHW layout for Conv2D [1])
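The flattening and numeric-simplification rules above can be demonstrated with a toy rewriter on a tiny expression tree. This is my own sketch, not Grappler's or TVM's implementation:

```python
from collections import Counter

def flatten_add(expr):
    # Flatten nested binary adds ("add", a, b) into one list of leaf terms.
    if isinstance(expr, tuple) and expr[0] == "add":
        return flatten_add(expr[1]) + flatten_add(expr[2])
    return [expr]

def simplify(expr):
    terms = flatten_add(expr)            # a+b+c+d => AddN(a, b, c, d)
    counts = Counter(terms)
    # Collapse repeated terms: x+x+x => 3*x.
    out = [("mul", n, t) if n > 1 else t for t, n in counts.items()]
    return out[0] if len(out) == 1 else ("addn", *out)

# x+x+x => 3*x
assert simplify(("add", ("add", "x", "x"), "x")) == ("mul", 3, "x")
# a+b+c+d => AddN(a, b, c, d)
assert simplify(("add", ("add", "a", "b"), ("add", "c", "d"))) == \
    ("addn", "a", "b", "c", "d")
```

Production graph optimizers apply rewrites like these as pattern-matching passes over the dataflow graph, but the before/after shapes are the same as in this sketch.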

Just like other popular frameworks, TVM also applies many rule-based graph optimizations.

Graph tuner

AutoTVM searches for an optimal schedule for each individual operator, but individually optimal schedules may still compose into a suboptimal graph. TVM therefore introduces the graph tuner to perform graph-level tuning.

Given a set of schedule candidates, the graph tuner benchmarks all possible layout transformations in the graph, then combines schedule execution time and layout-transformation time to find the optimal combination of schedules [3].
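The core idea can be sketched as a shortest-path / dynamic-programming problem over layout choices. This is a hedged illustration of the concept (the data and function names here are mine, not TVM's graph tuner API): for a chain of operators, each schedule candidate implies a data layout, and the tuner picks, per operator, the candidate minimizing its execution time plus any layout-transform cost from the previous operator's choice.

```python
def tune_chain(candidates, transform_cost):
    # candidates: per operator, a list of (layout, exec_time) pairs.
    # transform_cost(prev_layout, layout): time to convert between layouts.
    best = {layout: t for layout, t in candidates[0]}
    for cands in candidates[1:]:
        # For each candidate, take the cheapest predecessor plus transform.
        best = {
            layout: t + min(best[p] + transform_cost(p, layout) for p in best)
            for layout, t in cands
        }
    return min(best.values())

cost = lambda a, b: 0.0 if a == b else 0.5   # illustrative transform cost
ops = [
    [("NCHW", 1.0), ("NCHW4c", 0.8)],
    [("NCHW", 1.0), ("NCHW4c", 0.7)],
]
# Staying in NCHW4c (0.8 + 0.7 = 1.5) beats any path that pays a transform.
print(tune_chain(ops, cost))  # 1.5
```

A greedy per-operator choice could pick layouts that force expensive transforms between neighbors; accounting for transform cost globally, as above, is what distinguishes graph-level tuning from per-operator tuning.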

TBD: need more investigation on graph tuner.

References

[1] TensorFlow Graph Optimizations

[2] Tianqi Chen, et al. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), Carlsbad, CA, 2018. USENIX Association.

[3] https://github.com/apache/incubator-tvm/issues/1585
