Does TensorRT 5 automatically enable Tensor Cores for INT8 and FP16 mode?

Hi,

From https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/, it seems Tensor Cores are automatically enabled when the model is configured in INT8 or FP16 mode. The article shows a Python example as well.

May I know if Tensor Cores can also be turned on when I use the C++ API, e.g. via

virtual void setInt8Mode(bool mode) = 0;
and
virtual void setFp16Mode(bool mode) = 0;

Thanks!

Hello,

Yes, you can set both INT8 and FP16 mode if the platform supports them. TensorRT will then choose the most performant kernel to perform inference. For more information, please refer to:

https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#mixed_precision_c
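For reference, here is a minimal sketch of how those builder calls fit together in the TensorRT 5 C++ API. The names `builder` and `calibrator` are placeholders supplied by the caller; network construction and error handling are omitted.

```cpp
// Sketch: enabling reduced precision on a TensorRT 5 IBuilder.
// Assumes TensorRT 5 headers are available; "calibrator" is a
// user-provided IInt8Calibrator (required for INT8 mode).
#include "NvInfer.h"

void configureBuilder(nvinfer1::IBuilder* builder,
                      nvinfer1::IInt8Calibrator* calibrator)
{
    // Only enable reduced precision if the GPU actually supports it;
    // Tensor Cores are used automatically by the selected kernels.
    if (builder->platformHasFastFp16())
        builder->setFp16Mode(true);

    if (builder->platformHasFastInt8())
    {
        builder->setInt8Mode(true);
        builder->setInt8Calibrator(calibrator);  // INT8 needs calibration data
    }
}
```

Setting both modes lets the builder pick, per layer, whichever precision gives the best performance on the target GPU.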

Thanks for the confirmation.

Also, two related questions:

If my model weights are trained in FP32, will TensorRT automatically convert them in FP16 mode?

Also, if my plugins for the model are all implemented in FP32, can FP16 still be used for the non-plugin stages (e.g. convolution)?

Thanks!

Hello,

TRT will convert the FP32 weights automatically if you specify FP16 mode on the builder.

Thanks. What about the plugin implementation: will TensorRT run the default layers in FP16 mode and the customized layers in FP32 mode?

Hi cjluo, have you figured out whether TensorRT runs plugins in FP16 mode? I am trying to run my custom plugin in FP16 mode, but there seem to be some technical issues with the plugin implementation. I am wondering if we need to do the data type conversion manually.

I think the custom layers are still FP32 by default. I haven't spent time on FP16 plugins yet.
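One way a plugin advertises FP32-only support in the TensorRT 5 C++ API is through `IPluginV2::supportsFormat()`. The sketch below is a hypothetical fragment, not a complete plugin; the remaining `IPluginV2` methods (`enqueue`, `serialize`, and so on) are omitted.

```cpp
// Sketch: an IPluginV2 plugin that only accepts FP32 NCHW tensors.
// When the rest of the engine is built in FP16 mode, the builder
// sees this restriction and keeps the plugin's inputs/outputs in FP32.
#include "NvInfer.h"

class MyFp32Plugin : public nvinfer1::IPluginV2
{
public:
    bool supportsFormat(nvinfer1::DataType type,
                        nvinfer1::PluginFormat format) const override
    {
        // Advertise support for FP32 NCHW only.
        return type == nvinfer1::DataType::kFLOAT
            && format == nvinfer1::PluginFormat::kNCHW;
    }

    // ... the other IPluginV2 methods (getNbOutputs, enqueue,
    //     serialize, clone, etc.) are omitted in this sketch ...
};
```

With this restriction in place, the non-plugin layers can still run in FP16 while the plugin stage stays in FP32.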