Very large CPU RAM Usage in TensorRT

ran.vardimon · August 1, 2019, 11:07am

We’re using the TRT 5.0 Python API to run our model, and the CPU RAM consumption is about 2.6G. We find that:

1.1G is consumed when creating the TRT runtime itself
1.5G additionally used after the call to deserialize_cuda_engine.

This size does not seem to vary much with the model’s input size or with FP16 vs FP32. We’ve also checked with a C++-only inference engine and get lower but still very high memory usage of approx. 1.9G. We checked different models with GPU memory usage between 0.8G and 1.4G.

There is a setting called max_workspace_size, which can affect the amount of consumed GPU memory, but in our case modifying this value did not produce significant differences.
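For reference, a minimal sketch of how this limit is usually applied at build time. It assumes the attribute is named max_workspace_size and takes a value in bytes, on the builder in TensorRT 5 and on the builder config in later releases; set_workspace_limit is a hypothetical helper name, not a TensorRT function.

```python
def set_workspace_limit(target, mib):
    """Cap the scratch memory TensorRT may use while building an engine.

    Assumption: `target` is a trt.Builder (TensorRT 5) or a builder config
    (later releases) exposing a `max_workspace_size` attribute in bytes.
    """
    if mib <= 0:
        raise ValueError("workspace size must be positive")
    target.max_workspace_size = mib << 20  # convert MiB to bytes
    return target.max_workspace_size
```

With TensorRT 5 this would be called as set_workspace_limit(builder, 256) before building the engine. It only bounds the GPU scratch space, so it would not be expected to change the host-side numbers reported here.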

My questions are:

are these large values expected, or is the expected memory usage significantly lower?
how can we reduce the RAM usage? We aim for less than 0.5G RAM

Thanks,
Ran

Details:

output produced by a profiling tool showing the memory increase per line:

Line #    Mem usage    Increment   Line Contents

    85  368.973 MiB  368.973 MiB   @profile
    86                             def load_engine(trt_filename):
    87                                 pass  # logger.info("Reading engine from file {}".format(trt_filename))
    88                                 # with open(trt_filename, "rb") as trt_file, trt.Runtime(get_trt_logger()) as runtime:
    89                                 #     return runtime.deserialize_cuda_engine(trt_file.read())
    90  368.973 MiB    0.000 MiB       trt_file = open(trt_filename, "rb")
    91 1477.680 MiB 1108.707 MiB       runtime = trt.Runtime(get_trt_logger())
    92 1537.953 MiB   60.273 MiB       trt_file_contents = trt_file.read()
    93 3041.938 MiB 1503.984 MiB       engine = runtime.deserialize_cuda_engine(trt_file_contents)
    94 3041.938 MiB    0.000 MiB       trt_file.close()
    95 3041.938 MiB    0.000 MiB       return engine
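For anyone who wants to reproduce per-step numbers like these without a line profiler, here is a minimal sketch. It assumes Linux (it reads VmRSS from /proc/self/status); rss_mib and measure are hypothetical helper names, not part of TensorRT or of the profiling tool used above.

```python
import os


def rss_mib():
    """Return the current resident set size of this process in MiB (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:   123456 kB".
                return int(line.split()[1]) / 1024.0
    raise RuntimeError("VmRSS not found in /proc/self/status")


def measure(steps):
    """Run (label, callable) pairs in order.

    Returns a list of (label, RSS increment in MiB, result) tuples. Results are
    kept in the list so the memory they hold is not freed between steps.
    """
    report = []
    for label, step in steps:
        before = rss_mib()
        result = step()
        report.append((label, rss_mib() - before, result))
    return report


if __name__ == "__main__":
    # Stand-in workload; in practice the callables would wrap trt.Runtime(...)
    # and runtime.deserialize_cuda_engine(...).
    for label, delta, _ in measure([("allocate 64 MiB", lambda: os.urandom(64 << 20))]):
        print("{:<20s} {:+10.3f} MiB".format(label, delta))
```

Wrapping the runtime creation and the deserialize call as separate steps should show the same two jumps as the table above.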

AastaLLL · August 5, 2019, 8:24am

Hi,

The memory is occupied by the TensorRT library.
It takes around 800 MB to load the cuBLAS/cuDNN/TensorRT libraries.

You can check this with the sample shared here:
https://devtalk.nvidia.com/default/topic/1055977/jetson-nano/cuda-memory-release/post/5356147/#5356147

Thanks.

ran.vardimon · August 5, 2019, 8:47am

So there is no way to reduce this constant memory usage?
Does TRT require 0.8G - 1.1G RAM when loading no matter what? Are there any plans to improve this in future versions?

What about the 1.5G RAM used when loading the model? Is there a way to reduce it? The model itself is loaded to the GPU, so why is there a need to hold so much CPU memory?

Thanks

AastaLLL · August 14, 2019, 2:16am

Hi,

Yes. We are planning to extract the inference-essential part of cuDNN into a separate library.
However, this is not ready yet.

On the Jetson platform, physical memory is shared between the CPU and the GPU.
So the 1.5G of occupied memory should also include GPU memory.

There is an argument in TensorRT that controls the maximum memory allocation.
It’s worth checking:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#faq

Q: How do I choose the optimal workspace size?

Thanks.

kalyan.c · February 17, 2020, 2:53pm

Hi AastaLLL,

We are also facing a similar issue when loading a TensorRT engine. We are working on an application where multiple networks have to be loaded into RAM on a Jetson Nano. TensorRT takes a lot of memory even though each of my networks is only 50MB.

Please let us know if inference-essential CUDA libraries are available for the Nano.

Thanks,
kalyan ch

nguyenr20 · October 13, 2020, 4:59pm

Hi,

Does TensorRT 7.2.0 fix this issue?

The TensorRT 7.2.0 release notes say:

TensorRT now uses cuBLASLt internally instead of cuBLAS. This decreases the overall runtime memory footprint. Users can revert to the old behavior by using the new setTacticSources API in IBuilderConfig.
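A rough sketch of what selecting tactic sources might look like from Python. The release note names the C++ setTacticSources API; the Python counterpart is assumed here to be a set_tactic_sources method on the builder config that takes a bitmask with one bit per trt.TacticSource value. tactic_source_mask and restrict_tactic_sources are hypothetical helper names.

```python
def tactic_source_mask(*sources):
    """Build a bitmask with one bit set per tactic-source enum value."""
    mask = 0
    for source in sources:
        mask |= 1 << int(source)
    return mask


def restrict_tactic_sources(config, *sources):
    """Apply the mask to a builder-config-like object and return the mask.

    Assumption: `config` exposes set_tactic_sources(mask), mirroring the
    C++ setTacticSources call named in the release notes.
    """
    mask = tactic_source_mask(*sources)
    config.set_tactic_sources(mask)
    return mask
```

Under those assumptions, restrict_tactic_sources(config, trt.TacticSource.CUBLAS_LT) would keep only cuBLASLt enabled, and adding trt.TacticSource.CUBLAS would bring the old cuBLAS tactics back.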

I haven’t been able to test this on Jetson boards.

ponadto · April 30, 2021, 8:09am

@AastaLLL do you have any update on this? Is it possible to use less memory with the latest version?
