We’re using the TensorRT 5.0 Python API to run our model and find that it consumes about 2.6 GB of CPU RAM. Breaking this down:
- 1.1 GB is consumed when creating the TRT runtime itself
- an additional 1.5 GB is used after the call to deserialize_cuda_engine.
These numbers do not seem to vary much with the model’s input size or with FP16 vs. FP32. We’ve also tested a C++-only inference engine and get lower, but still very high, memory usage of approximately 1.9 GB. We checked several models whose GPU memory usage ranges from 0.8 to 1.4 GB.
There is a builder setting called max_workspace_size that can affect the amount of GPU memory consumed, but in our case modifying this value did not produce significant differences.
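For reference, this is roughly where max_workspace_size is set when building the engine (a minimal sketch using the TensorRT 5 Python API; the ONNX parser path, the build_engine name and the 256 MB workspace value are illustrative assumptions, not our exact build script):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, workspace_size=256 << 20):
    # workspace_size bounds the GPU scratch memory TensorRT may use per layer;
    # in our case changing it did not make a significant difference.
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = workspace_size
        with open(onnx_path, "rb") as f:
            parser.parse(f.read())
        return builder.build_cuda_engine(network)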
My questions are:
- Are these large values expected, or should the memory usage be significantly lower?
- How can we reduce the RAM usage? We are aiming for less than 0.5 GB.
Thanks,
Ran
Details:
Output produced by a line-by-line memory profiler, showing the memory increase per line:
Line # Mem usage Increment Line Contents
85 368.973 MiB 368.973 MiB @profile
86 def load_engine(trt_filename):
87 pass # logger.info("Reading engine from file {}".format(trt_filename))
88 # with open(trt_filename, "rb") as trt_file, trt.Runtime(get_trt_logger()) as runtime:
89 # return runtime.deserialize_cuda_engine(trt_file.read())
90 368.973 MiB 0.000 MiB trt_file = open(trt_filename, "rb")
91 1477.680 MiB 1108.707 MiB runtime = trt.Runtime(get_trt_logger())
92 1537.953 MiB 60.273 MiB trt_file_contents = trt_file.read()
93 3041.938 MiB 1503.984 MiB engine = runtime.deserialize_cuda_engine(trt_file_contents)
94 3041.938 MiB 0.000 MiB trt_file.close()
95 3041.938 MiB 0.000 MiB return engine
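The output format above matches Python’s memory_profiler package. A minimal way to reproduce the measurement, with the engine path as a placeholder and trt.Logger standing in for our get_trt_logger() helper, could look like this:

import tensorrt as trt
from memory_profiler import profile

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)  # stands in for our get_trt_logger() helper

@profile
def load_engine(trt_filename):
    # Both creating the Runtime and calling deserialize_cuda_engine show
    # large CPU RAM increments, as in the table above.
    with open(trt_filename, "rb") as trt_file, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(trt_file.read())

if __name__ == "__main__":
    engine = load_engine("model.trt")  # "model.trt" is a placeholder path

Running it with "python -m memory_profiler load_engine.py" produces the per-line report shown above.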