Q's on TensorRT

Dear All,

I’ve just started using TensorRT and am trying to get a better understanding of it. I have a few questions below…


  1. How does TensorRT select the fastest convolution algorithm?
  2. Which quantization method (linear, dynamic, etc.) is used to quantize weights to FP16 (half precision)?
  3. Which cuDNN version is supported internally?
  4. Are there any plans to support LSTM in a near-future release (possibly 2.0)?

Thanks in advance, Hak

  1. We look at your network, build a list of all the implementations we have that could be used, then time each of them and pick the fastest.

  2. For FP16, your weights need to fall within the range representable by FP16. We’ve found this is usually the case. If any are out of range, we’ll report an error.

  3. It depends on the version of TensorRT.

  4. Yes! LSTM will be in the next public release.
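
Answer 1 describes an empirical autotuning loop: enumerate every applicable implementation, time each one on the actual network, keep the fastest. A minimal sketch of that idea (the kernel names and timing setup here are illustrative stand-ins, not TensorRT internals):

```python
import time

# Illustrative stand-ins for real convolution kernels; each would compute
# the same result via a different algorithm (im2col+GEMM, Winograd, direct).
def conv_im2col(x):   return sum(v * 2 for v in x)
def conv_winograd(x): return sum(v * 2 for v in x)
def conv_direct(x):   return sum(v * 2 for v in x)

def pick_fastest(candidates, sample_input, repeats=50):
    """Time every applicable implementation on a sample input; keep the fastest."""
    timings = {}
    for impl in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            impl(sample_input)
        timings[impl.__name__] = time.perf_counter() - start
    return min(timings, key=timings.get)

best = pick_fastest([conv_im2col, conv_winograd, conv_direct], list(range(256)))
print("selected:", best)
```

Because the choice is made by measurement rather than by a fixed heuristic, the selected kernel can differ per layer shape and per GPU.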

Hi Chris,
For question #2, does the quantization flow from FP32 to FP16 simply cast each 32-bit value to 16 bits? If so, the reference input dataset would not be needed for calibration, which is mandatory for FP32-to-INT8 quantization.
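
If the FP32-to-FP16 step is just a cast, as answer 2 suggests, no calibration data is needed: the conversion is a plain numeric narrowing with a range check. This can be illustrated with Python's standard `struct` half-precision format (illustrative only, not TensorRT code):

```python
import struct

def to_fp16(x: float) -> bytes:
    """Pack a float as IEEE 754 half precision -- the whole FP16 'cast'."""
    return struct.pack('<e', x)

def from_fp16(b: bytes) -> float:
    return struct.unpack('<e', b)[0]

# An in-range weight survives the round trip (possibly with some rounding).
print(from_fp16(to_fp16(0.5)))        # 0.5

# FP16's largest finite value is 65504; larger weights cannot be represented,
# matching the out-of-bounds error Chris describes above.
try:
    to_fp16(70000.0)
except OverflowError:
    print("weight outside FP16 range")
```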


Hi Chris and csdncannon,

For FP16 and INT8, do we need to provide already-quantized FP16 and INT8 weights, or does TensorRT perform this quantization for us?
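
To illustrate the contrast raised above: unlike the FP16 cast, INT8 has only 256 representable values, so a scale factor must be chosen from representative data, which is what the calibration dataset provides. A hypothetical symmetric linear scheme (illustrative only; TensorRT's actual calibration procedure is more sophisticated):

```python
def calibrate_scale(samples):
    """Choose a scale mapping the observed dynamic range onto [-127, 127]."""
    return max(abs(s) for s in samples) / 127.0

def quantize(x, scale):
    """FP32 -> INT8: divide by the scale and round, clamping to the int8 range."""
    return max(-127, min(127, round(x / scale)))

def dequantize(q, scale):
    """INT8 -> FP32 approximation: multiply back by the scale."""
    return q * scale

calib_data = [-6.35, 0.02, 3.1, 5.4]   # stand-in for the reference input dataset
scale = calibrate_scale(calib_data)
q = quantize(3.1, scale)
print(q, dequantize(q, scale))         # round trip loses at most half a step
```

Without the calibration samples there would be no principled way to pick `scale`, which is why the reference dataset is mandatory for INT8 but not for FP16.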