Further Readings

These topics are not covered in depth in this document, but the resources below are there if you want to learn more.

Quantization Aware Training

Steps from TFLite:

  1. (Recommended): Fine-tune from a floating-point saved model: Start with a floating-point pre-trained model, or alternatively train from scratch

  2. Modify Estimator to add quantization operations: Add fake quantization operations to the model using the quantization rewriter at tf.contrib.quantize

  3. Train model: At the end of this process, we have a SavedModel with quantization information (scale, zero point) for all quantities of interest (weights and activations)

  4. Convert model: The SavedModel with range information is transformed into a flatbuffer file using the TensorFlow converter (TOCO) via tf.contrib.lite.toco_convert. This step produces a flatbuffer file in which the weights are converted to integers, along with the information needed for quantized arithmetic on activations

  5. Execute model: The converted model with integer weights can now be executed using the TFLite interpreter, which can optionally run the model on custom accelerators through the NN API. The model can also run on the CPU.
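The scale and zero point recorded in step 3 come from uniform affine quantization. The idea can be sketched in plain Python; this is an illustrative toy, not TFLite's implementation, and the `quantize_params` / `quantize` / `dequantize` helper names are made up for this example:

```python
# Uniform affine quantization sketch: map a float range [rmin, rmax]
# onto an integer range [qmin, qmax] via a scale and a zero point.

def quantize_params(rmin, rmax, qmin=0, qmax=255):
    """Derive scale and zero point so [rmin, rmax] covers [qmin, qmax]."""
    # Make sure the range contains 0.0 so that it is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Float -> integer, clamped to the representable range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Integer -> approximate float."""
    return scale * (q - zero_point)

# Example: a ReLU6-like activation range [0.0, 6.0] mapped to uint8.
scale, zp = quantize_params(0.0, 6.0)
q = quantize(0.9, scale, zp)
x_hat = dequantize(q, scale, zp)   # close to 0.9, within scale/2
```

Note that 0.0 always round-trips exactly (it maps to the zero point), which matters because zero padding and ReLU outputs must not pick up a quantization bias.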

Accuracy loss

Please refer to Quantizing deep convolutional networks for efficient inference: A whitepaper for different sets of experiments.

Beyond uniform affine quantizer

This topic needs further investigation.