Further Readings

These topics are not covered in depth in this document, but the resources below are there if you want to learn more.

Quantization Aware Training

Steps from TFLite:

  1. (Recommended): Fine-tune from a floating-point saved model: Start with a floating-point pre-trained model, or alternatively train from scratch

  2. Modify Estimator to add quantization operations: Add fake quantization operations to the model using the quantization rewriter at tf.contrib.quantize

  3. Train model: At the end of this process, we have a SavedModel with quantization information (scale, zero point) for all quantities of interest (weights and activations)

  4. Convert model: The SavedModel with range information is transformed into a flatbuffer file using the TensorFlow converter (TOCO) via tf.contrib.lite.toco_convert. This step produces a flatbuffer file in which the weights are converted to integers, along with the information needed for quantized arithmetic on activations

  5. Execute model: The converted model with integer weights can now be executed using the TFLite interpreter, which can optionally run the model on custom accelerators through the NN API. The model can also run on the CPU.
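The scale and zero point recorded in step 3 come from uniform affine quantization. The idea can be sketched in plain Python; this is an illustrative toy, not TFLite's implementation, and the `quantize_params` / `quantize` / `dequantize` helper names are made up for this example:

```python
# Uniform affine quantization sketch: map a float range [rmin, rmax]
# onto an integer range [qmin, qmax] via a scale and a zero point.

def quantize_params(rmin, rmax, qmin=0, qmax=255):
    """Derive scale and zero point so [rmin, rmax] covers [qmin, qmax]."""
    # Make sure the range contains 0.0 so that it is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Float -> integer, clamped to the representable range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Integer -> approximate float."""
    return scale * (q - zero_point)

# Example: a ReLU6-like activation range [0.0, 6.0] mapped to uint8.
scale, zp = quantize_params(0.0, 6.0)
q = quantize(0.9, scale, zp)
x_hat = dequantize(q, scale, zp)   # close to 0.9, within scale/2
```

Note that 0.0 always round-trips exactly (it maps to the zero point), which matters because zero padding and ReLU outputs must not pick up a quantization bias.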

Accuracy loss

Please refer to Quantizing deep convolutional networks for efficient inference: A whitepaper for different sets of experiments.

Beyond uniform affine quantizer

This topic needs further investigation.