Need clarification on the weight precision of network layers versus the builder configuration option config.set_flag(trt.BuilderFlag.FP16)

Hi all, I want to get clarification on the following. When I build an engine for FP16 with the configuration option
config.set_flag(trt.BuilderFlag.FP16)

what should the precision of the weights that get assigned to the layers of the network be?

I am taking the example from the sample.py file located at
/usr/src/tensorrt/samples/python/network_api_pytorch_mnist/sample.py

In this file there is a method (API) called
def populate_network(network, weights):
I am pasting a few lines from it:
input_tensor = network.add_input(name=ModelData.INPUT_NAME, dtype=ModelData.DTYPE, shape=ModelData.INPUT_SHAPE)
conv1_w = weights['conv1.weight'].numpy()
conv1_b = weights['conv1.bias'].numpy()
This sample.py is written to work with FP32 weights.
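For reference, this is roughly how those FP32 numpy arrays end up assigned to a layer further down in the same method. The snippet assumes the network, input_tensor and weights objects from the sample, and the layer hyperparameters are only illustrative:

```python
# Weights come out of the PyTorch state_dict as FP32 numpy arrays.
conv1_w = weights['conv1.weight'].numpy()
conv1_b = weights['conv1.bias'].numpy()

# The numpy arrays are passed directly to the layer-creation call;
# TensorRT wraps them as FP32 trt.Weights attached to the layer.
conv1 = network.add_convolution_nd(
    input=input_tensor,
    num_output_maps=20,
    kernel_shape=(5, 5),
    kernel=conv1_w,
    bias=conv1_b,
)
conv1.stride_nd = (1, 1)
```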

Suppose I want to modify it to work with FP16. Apart from using this statement
config.set_flag(trt.BuilderFlag.FP16)

do I also need to convert the weights that get assigned to all the layers in the method (API)
def populate_network()?
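Concretely, this is roughly the build flow I have in mind; populate_network and weights are the ones from sample.py, and the explicit-batch flag may differ between TensorRT versions:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

# Weights are still loaded and assigned as FP32 numpy arrays.
populate_network(network, weights)

config = builder.create_builder_config()
# The only change compared with the FP32 build: allow FP16 kernels.
config.set_flag(trt.BuilderFlag.FP16)

serialized_engine = builder.build_serialized_network(network, config)
```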

If modifying the weights to FP16 is not required, then I have a few additional questions:

  1. How are the weights converted to FP16?
  2. Which TensorRT module does this?
  3. Even though FP16 weights are used, the input test data is still in FP32 format. In this case, how does the convolution happen between two numbers, one in FP32 (test data) and the other in FP16 (model weights)? Will it not impact the inference time?

Please clarify these doubts. I need this information urgently for my research paper.

Thanks and Regards

Nagaraj Trivedi

Hi all, please let me know if there is an update on this query.

Thanks and Regards

Nagaraj Trivedi

Dear @trivedi.nagaraj,
Note that developers only have to set the required precision when building the engine for inference; the TensorRT framework takes care of the needed conversions and selects the corresponding optimized CUDA kernel implementation for each layer.
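For example, the only change on the developer side is the builder flag (and, optionally, per-layer precision constraints). The sketch below assumes the builder and network objects from the sample; OBEY_PRECISION_CONSTRAINTS is the flag name in recent TensorRT releases (older releases used STRICT_TYPES):

```python
import tensorrt as trt

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # let the builder pick FP16 kernels where profitable

# Optional: force particular layers to FP16 instead of letting the builder choose.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type == trt.LayerType.CONVOLUTION:
        layer.precision = trt.float16

engine_bytes = builder.build_serialized_network(network, config)
```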

How are the weights converted to FP16? Which TensorRT module does this?

The conversion is taken care of during the engine-building phase by the TensorRT framework. Please see the Developer Guide :: NVIDIA Deep Learning TensorRT Documentation for details.
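If you want to check what the builder actually did, one option (available since TensorRT 8.2, and showing per-layer detail only if the build config's profiling_verbosity was set to DETAILED) is the engine inspector; this sketch reuses the serialized engine produced by your build step:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)
engine = runtime.deserialize_cuda_engine(serialized_engine)

# Per-layer information, including the precision the builder selected,
# is exposed through the engine inspector.
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```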

Even though FP16 weights are used, the input test data is still in FP32 format. In this case, how does the convolution happen between two numbers, one in FP32 (test data) and the other in FP16 (model weights)? Will it not impact the inference time?

The TensorRT framework inserts a reformat layer whenever a format conversion is needed to process the data with a TensorRT layer.
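In other words, the network's input tensor can stay FP32 and TensorRT handles the cast internally. A rough way to see this, assuming engine is the deserialized ICudaEngine from your build (the tensor-name API below follows TensorRT 8.5+ and may differ in older releases):

```python
import numpy as np

# The I/O tensors normally keep their original FP32 type even though
# the engine was built with the FP16 flag.
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    print(name, engine.get_tensor_dtype(name), engine.get_tensor_mode(name))

# Test data is therefore still prepared as FP32; any FP32 -> FP16
# conversion happens inside the engine through reformat layers.
input_data = np.random.random(size=(1, 1, 28, 28)).astype(np.float32)
```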

Thank you, ShivaramKrishna, for the clarification.

Thanks and Regards

Nagaraj Trivedi

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.