Does mixed precision reduce runtime memory size?

Hi everyone, I’m using NVIDIA’s yolov3_onnx Python sample and I’ve tried setting all the necessary FP16 parameters:

with trt.Builder(self._TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, self._TRT_LOGGER) as parser:
    builder.max_workspace_size = 1 << 30  # 1 GB
    builder.max_batch_size = 1
    builder.fp16_mode = True
    builder.strict_type_constraints = True
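
In case it helps, the surrounding build step looks roughly like this (a minimal sketch of the TensorRT 5.x flow; build_fp16_engine is just a hypothetical helper name):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_fp16_engine(onnx_path):
    # Build an FP16 engine from an ONNX file (TensorRT 5.x builder-flag API)
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30  # 1 GB
        builder.max_batch_size = 1
        builder.fp16_mode = True
        builder.strict_type_constraints = True
        with open(onnx_path, 'rb') as model:
            if not parser.parse(model.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        return builder.build_cuda_engine(network)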

I’ve even set each layer to the desired data type:

def _show_network(self, network):
    # Force every layer and each of its outputs to FP16
    for index in range(network.num_layers):
        layer = network.get_layer(index)
        layer.precision = trt.float16
        for idx in range(layer.num_outputs):
            layer.set_output_type(idx, trt.float16)
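
To double-check that those per-layer settings actually stick, something like this should print what each layer ends up with (a sketch; _dump_layer_types is just a hypothetical helper using the ILayer precision_is_set / output_type_is_set accessors):

def _dump_layer_types(self, network):
    # Print the precision and output types recorded on each layer
    for index in range(network.num_layers):
        layer = network.get_layer(index)
        prec = layer.precision if layer.precision_is_set else "unset"
        outs = [str(layer.get_output_type(i)) if layer.output_type_is_set(i) else "unset"
                for i in range(layer.num_outputs)]
        print("{}: precision={}, outputs={}".format(layer.name, prec, outs))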

I’m getting the intended inference speed-up, but what I’m curious about is the runtime GPU memory usage. With the FP16 settings unset, nvidia-smi reports around 785 MB. I was surprised to see that even after setting all the FP16 options, nvidia-smi still shows 785 MB. Is this what I should be seeing? All along I thought TensorRT with FP16 would also reduce the network’s GPU memory usage.
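
As a side question: would the engine’s own device_memory_size be a fairer thing to compare than the per-process total from nvidia-smi? Something like this (a sketch, assuming a serialized engine file; the path is whatever the engine was saved to):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def report_engine_memory(engine_path):
    # Deserialize a saved engine and print the device memory TensorRT
    # says it needs for activations at execution time
    with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
        print("device_memory_size: {} bytes".format(engine.device_memory_size))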

FYI, here are some specs from my system:
X-server (Ubuntu 18.04)
P100 GPU (Driver: 410.104)
CUDA 10.0
TensorRT 5.1.2
YOLOv3

Let me know if you need more information (although I can’t share the models, since they’re confidential and belong to a client). Thanks in advance!

@vincentj

Did you find the answer to this?