Hi all, I want clarification on the following: when I build an engine for FP16 using the builder configuration option, what should the precision of the weights assigned to the layers of the network be?
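For reference, this is the kind of configuration I mean (a minimal sketch on my side, assuming the standard TensorRT Python builder-config API):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
config = builder.create_builder_config()
# Request FP16 kernels; TensorRT still decides per-layer whether to use them
config.set_flag(trt.BuilderFlag.FP16)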
I am taking the example from the sample.py file, which is present in the directory. In this file there is a method (API) called:
def populate_method(network, weights):
I am pasting a few lines of statements from it:
input_tensor = network.add_input(name=ModelData.INPUT_NAME, dtype=ModelData.DTYPE, shape=ModelData.INPUT_SHAPE)
conv1_w = weights['conv1.weight'].numpy()
conv1_b = weights['conv1.bias'].numpy()
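For context, the sample then assigns these weights to a convolution layer roughly like this (quoted from memory, so the layer parameters may differ from the actual file):

conv1 = network.add_convolution(input=input_tensor, num_output_maps=20, kernel_shape=(5, 5), kernel=conv1_w, bias=conv1_b)
conv1.stride = (1, 1)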
This sample.py is written to work with FP32 weights. Suppose I want to modify it to work with FP16: apart from enabling the FP16 flag in the builder configuration (as above), do I also need to convert the weights assigned to all the layers in this method (API) to FP16?
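Concretely, by "converting the weights" I mean something like the following (a hypothetical sketch, reusing the conv1 names from the snippet above):

import numpy as np

# Cast each PyTorch state-dict tensor to FP16 before assigning it to a layer
conv1_w = weights['conv1.weight'].numpy().astype(np.float16)
conv1_b = weights['conv1.bias'].numpy().astype(np.float16)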
If modifying the weights to FP16 is not required, then I have a few additional questions:
- How are the weights converted to FP16?
- Which TensorRT module performs this conversion?
- Even though FP16 weights are used, the input test data is still in FP32 format. In this case, how does the convolution happen between two operands of different precisions, the FP32 test data and the FP16 model weights? Will it not impact the inference time? (See the sketch after this list for what I imagine the input cast would look like.)
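If the answer is that the input should also be cast to FP16, I imagine it would look something like this (my assumption only; test_data and its shape are hypothetical, and I am using trt.nptype to match the declared input dtype):

import numpy as np
import tensorrt as trt

# Hypothetical MNIST-shaped input batch, originally FP32
test_data = np.random.rand(1, 1, 28, 28).astype(np.float32)
# Cast the host-side data to match the network input dtype (ModelData.DTYPE in the sample)
test_data = test_data.astype(trt.nptype(trt.float16))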
Please clarify these doubts. I need this information urgently for my research paper.
Thanks and Regards