Data processing with the TensorRT INT8 and FP16 inference engines

Hi, I found some information about the INT8 inference engine in the TensorRT 3 User Guide, which says: “After the network has been built, it can be used just like an FP32 network, for example, inputs and outputs remain in 32-bit floating point”. However, since an INT8 engine has converted the FP32 weights to INT8 precision, how can I feed FP32 image data directly to the network and obtain FP32 output?
1. Does the optimized network first convert the input data from FP32 to INT8 and then process it?
2. After building an optimized INT8 engine, are only the network’s weights converted from FP32 to INT8, or are both the weights and the feature maps converted?
3. Is there any more information about how TensorRT INT8 works? I only found this: [url][/url]



TX2 doesn’t support INT8 operations. INT8 is only available on the sm_61 GPU architecture (compute capability 6.1).

Thanks and sorry for the inconvenience.

I want to test my network with INT8 inference, so I’m using TensorRT on a GTX 1060, which has the sm_61 GPU architecture. I’m curious about how the INT8 inference engine works.


  1. Input data conversion is applied automatically.
    The conversion is implemented with NVIDIA® CUDA® and runs on the GPU.

  2. You can feed the data and weights directly in FP32. TensorRT takes care of the quantization.

  3. The GTC 2017 slides already describe INT8 inference in detail.
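To illustrate point 1: TensorRT’s INT8 mode is based on symmetric linear quantization, where an FP32 tensor is mapped onto the [-127, 127] integer range with a per-tensor scale. The sketch below is only a hand-written illustration of that idea, not TensorRT’s actual code; the `threshold` parameter stands in for the calibrated dynamic range of the tensor.

```python
# Symmetric linear quantization sketch: FP32 -> INT8 and back.
# `threshold` is a stand-in for the calibrated dynamic range.

def quantize(values, threshold):
    """Map FP32 values in [-threshold, threshold] to INT8 in [-127, 127]."""
    scale = 127.0 / threshold
    out = []
    for v in values:
        q = round(v * scale)
        q = max(-127, min(127, q))  # saturate values outside the threshold
        out.append(int(q))
    return out

def dequantize(qvalues, threshold):
    """Approximate recovery of FP32 values from INT8."""
    scale = 127.0 / threshold
    return [q / scale for q in qvalues]

data = [0.5, -1.0, 2.0, 6.3]        # FP32 activations
q = quantize(data, threshold=6.3)
print(q)                            # [10, -20, 40, 127]
print(dequantize(q, threshold=6.3)) # close to the original FP32 values
```

This is why the engine can still expose FP32 inputs: the conversion is a cheap elementwise scale-and-round that the GPU applies before the INT8 layers run.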
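On point 2: both the weights and the activations (feature maps) are quantized, and the inputs and outputs can stay FP32 because the integer result is rescaled at the end. A toy dot product makes the mechanics visible; this is just an illustration under assumed per-tensor thresholds, not TensorRT’s implementation:

```python
def int8_dot(x_fp32, w_fp32, x_threshold, w_threshold):
    """Dot product computed in INT8 arithmetic, returning an FP32 result."""
    sx = 127.0 / x_threshold   # activation scale (assumed calibrated)
    sw = 127.0 / w_threshold   # weight scale (assumed calibrated)
    xq = [max(-127, min(127, round(v * sx))) for v in x_fp32]
    wq = [max(-127, min(127, round(v * sw))) for v in w_fp32]
    acc = sum(a * b for a, b in zip(xq, wq))  # accumulate in 32-bit integer
    return acc / (sx * sw)                    # rescale back to FP32

x = [1.0, -2.0, 0.5]
w = [0.25, 0.5, -1.0]
print(int8_dot(x, w, x_threshold=2.0, w_threshold=1.0))
# close to the exact FP32 result of -1.25
```

The small difference from the exact FP32 answer is the quantization error; calibration exists to keep that error small on real activation distributions.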
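Regarding point 3, the GTC 2017 slides (“8-bit Inference with TensorRT”) describe how the saturation threshold for each activation tensor is chosen by entropy calibration, minimizing the KL divergence between the FP32 and quantized distributions. The toy example below only illustrates *why* saturating at a calibrated threshold can beat scaling to the absolute maximum when the data has rare outliers; it is not the calibration algorithm itself:

```python
def quant_error(values, threshold):
    """Mean absolute error after symmetric INT8 quantize/dequantize at `threshold`."""
    scale = 127.0 / threshold
    total = 0.0
    for v in values:
        q = max(-127, min(127, round(v * scale)))  # saturate out-of-range values
        total += abs(v - q / scale)
    return total / len(values)

# Mostly small activations in [-1, 1], plus one rare large outlier.
acts = [i / 10000.0 for i in range(-10000, 10001)] + [5.0]

no_saturation = quant_error(acts, threshold=5.0)  # scale to the absolute max
calibrated    = quant_error(acts, threshold=1.0)  # clip the outlier instead
print(no_saturation > calibrated)
# True: clipping the rare outlier represents the bulk of the values far better
```

Entropy calibration automates this trade-off per tensor, which is why TensorRT needs a calibration data set when building an INT8 engine.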