Hi everyone, I’m using NVIDIA’s yolov3_onnx Python sample, and I’ve set all the necessary FP16 parameters:
```python
with trt.Builder(self._TRT_LOGGER) as builder, \
        builder.create_network() as network, \
        trt.OnnxParser(network, self._TRT_LOGGER) as parser:
    builder.max_workspace_size = 1 << 30  # 1 GB
    builder.max_batch_size = 1
    builder.fp16_mode = True
    builder.strict_type_constraints = True
```
I’ve even set each layer to the desired data type:
```python
def _show_network(self, network):
    for index in range(network.num_layers):
        layer = network.get_layer(index)
        layer.precision = trt.float16
        for idx in range(layer.num_outputs):
            layer.set_output_type(idx, trt.float16)
```
I’m getting the intended speed-up on inference, but what I’m curious about is the runtime GPU memory usage. With the FP16 settings left unset, nvidia-smi reports around 785 MB of memory in use. I was surprised to see that even with all the FP16 settings enabled, nvidia-smi still reports 785 MB. Is this what I should be seeing? All along, I thought TensorRT (specifically FP16) would help reduce the network’s GPU memory usage.
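To show the saving I expected, here’s a rough back-of-the-envelope for weight storage alone (the ~62M parameter count for YOLOv3 is an assumption, and this ignores activations, workspace, and the CUDA context, which nvidia-smi also counts):

```python
# Rough weight-storage estimate: FP16 should halve the bytes needed
# for the weights compared with FP32.
num_params = 62_000_000  # approximate YOLOv3 parameter count (assumption)

fp32_mib = num_params * 4 / 2**20  # 4 bytes per FP32 weight
fp16_mib = num_params * 2 / 2**20  # 2 bytes per FP16 weight

print(f"fp32 weights: ~{fp32_mib:.0f} MiB")
print(f"fp16 weights: ~{fp16_mib:.0f} MiB")
```

So I’d naively expect the weights alone to shrink by roughly 100+ MiB, which is why the identical nvidia-smi reading surprised me.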
FYI, here are some specs from my system:
X-server (Ubuntu 18.04)
P100 GPU (Driver: 410.104) 
Let me know if you need more information (although I can’t share the models, since they’re confidential and belong to a client). Thanks in advance!