Hi, I found some information about the INT8 inference engine in the TensorRT 3 User Guide, which says "After the network has been built, it can be used just like an FP32 network, for example, inputs and outputs remain in 32-bit floating point". However, since the INT8 engine has converted the FP32 weights to INT8 precision, how can I feed FP32 image data directly into the network and get FP32 output back? (A rough sketch of the build/inference flow I am using is at the end of this post.)
1. Does the optimized network first convert the input data from FP32 to INT8 and then process it?
2. When building an optimized INT8 engine, are only the network's weights converted from FP32 to INT8, or are both the weights and the feature maps (activations) converted?
3. Is there any more information about how TensorRT INT8 works? So far I have only found this: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
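For context, here is roughly the build/inference flow I am using with the TensorRT 3 C++ API. This is only a minimal sketch: the calibrator implementation, the model files, the output blob name "prob", the buffer sizes, and the host buffers are placeholders from my own code.

```cpp
#include "NvInfer.h"
#include "NvCaffeParser.h"
#include <cuda_runtime_api.h>
#include <iostream>

using namespace nvinfer1;
using namespace nvcaffeparser1;

// Minimal logger required by the builder.
class Logger : public ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity != Severity::kINFO) std::cout << msg << std::endl;
    }
} gLogger;

// My IInt8EntropyCalibrator implementation is omitted here.
// extern IInt8Calibrator calibrator;

void buildAndRun(IInt8Calibrator& calibrator,
                 const float* hostInputFp32, float* hostOutputFp32,
                 size_t inputSize, size_t outputSize)
{
    // Build phase: parse the Caffe model, enable INT8 mode, attach the calibrator.
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();
    auto blobNameToTensor = parser->parse("deploy.prototxt", "model.caffemodel",
                                          *network, DataType::kFLOAT);
    network->markOutput(*blobNameToTensor->find("prob"));  // placeholder blob name
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 30);
    builder->setInt8Mode(true);
    builder->setInt8Calibrator(&calibrator);
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // Inference phase: bindings are still allocated and filled as FP32,
    // as the user guide describes.
    IExecutionContext* context = engine->createExecutionContext();
    void* buffers[2];
    cudaMalloc(&buffers[0], inputSize * sizeof(float));    // FP32 input image
    cudaMalloc(&buffers[1], outputSize * sizeof(float));   // FP32 output
    cudaMemcpy(buffers[0], hostInputFp32, inputSize * sizeof(float),
               cudaMemcpyHostToDevice);
    context->execute(1, buffers);
    cudaMemcpy(hostOutputFp32, buffers[1], outputSize * sizeof(float),
               cudaMemcpyDeviceToHost);

    // Cleanup omitted for brevity.
}
```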