Further Readings
These topics are not covered in depth in this document; the resources below are provided for further study.
Quantization Aware Training
Steps from the TFLite documentation:
1. (Recommended) Fine-tune from a floating-point saved model: start with a floating-point pre-trained model, or alternatively train from scratch.
2. Modify the Estimator to add quantization operations: add fake-quantization operations to the model using the quantization rewriter at tf.contrib.quantize.
3. Train the model: at the end of this process, we have a SavedModel with quantization information (scale, zero-point) for all the quantities of interest (weights and activations).
4. Convert the model: the SavedModel with range information is transformed into a FlatBuffer file using the TensorFlow converter (TOCO) at tf.contrib.lite.toco_convert. This step produces a FlatBuffer file in which the weights are converted to integers and which also contains the information needed for quantized arithmetic on the activations.
5. Execute the model: the converted model with integer weights can now be executed with the TFLite interpreter, which can optionally run the model on custom accelerators through the NNAPI, or on the CPU.
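The scale and zero-point recorded in step 3 define a uniform affine quantizer. The following is a minimal pure-Python sketch of that arithmetic, assuming an asymmetric 8-bit scheme derived from an observed float range [rmin, rmax]; the function names are illustrative and not part of any TensorFlow API.

```python
# Minimal sketch of uniform affine (asymmetric) 8-bit quantization.
# Not a TensorFlow API; illustrative helper names only.

def choose_params(rmin, rmax, num_bits=8):
    """Derive (scale, zero_point) from an observed float range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The range must include 0 so that zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    """Map a real value to an integer in [0, 2**num_bits - 1]."""
    q = int(round(x / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))

def dequantize(q, scale, zero_point):
    """Recover an approximation of the real value."""
    return scale * (q - zero_point)

scale, zp = choose_params(-1.0, 1.0)
q = quantize(0.5, scale, zp)
x = dequantize(q, scale, zp)  # approximately 0.5, off by at most scale/2
```

Fake-quantization during training (step 2) applies exactly this quantize-then-dequantize round trip in the forward pass, so the model learns to tolerate the rounding error that real integer inference will introduce.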
Accuracy loss
Please refer to "Quantizing deep convolutional networks for efficient inference: A whitepaper" for experiments measuring the accuracy loss of different quantization schemes.
Beyond uniform affine quantizer
This topic needs further investigation.