GPU vs CPU deep learning memory usage

Hi all,

I have a simple classification neural network (input size 1x64x128x3, so batch size 1).

Its float16 weights are 1.6 MB. I’m trying to run it on a memory-constrained Jetson device alongside a bunch of more demanding neural networks. When I run it with ONNXRuntime using the CPUExecutionProvider, system memory usage measured before and after loading the network increases by 12 MB. When I use the CUDAExecutionProvider, the increase is 990 MB. Not great! So I tried TensorRT, excluding the tactic sources I do not need:

/usr/src/tensorrt/bin/trtexec --onnx=classifier.onnx --saveEngine=classifier.trt --fp16 --tacticSources=-CUDNN,-CUBLAS,-CUBLAS_LT

When I use this TensorRT engine, the increase in memory usage is 385 MB. Much better than the 990 MB using ONNXRuntime, but still a lot more than the 12 MB used when I run it on CPU. What gives? I’ve seen people on these forums argue it’s because of the CUDA libraries that need to be loaded, but when I first load another neural network in the same Python process + thread, the memory increase remains the same. And I don’t see any reason for the CUDA libraries to be loaded twice.
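For reference, the ONNXRuntime numbers above come from simply swapping the execution provider; a minimal sketch of the measurement setup (input dtype is an assumption here, the model path matches the trtexec command above):

import numpy as np
import onnxruntime as ort

# Same model for both measurements; only the execution provider differs.
session = ort.InferenceSession("classifier.onnx", providers=["CPUExecutionProvider"])
# session = ort.InferenceSession("classifier.onnx", providers=["CUDAExecutionProvider"])

dummy_input = np.zeros((1, 64, 128, 3), dtype=np.float16)
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: dummy_input})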

For TensorRT inference in Python, I use pycuda, based on the NVIDIA samples provided in e.g. /usr/src/tensorrt/samples/python/efficientnet. I’m running JetPack 4.6.3 for this test. Most of the 385 MB memory increase (336 MB) occurs during this call:

with trt.Runtime(self.__class__.TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(data)

whereas the TensorRT engine file being loaded is just 3.8 MB.
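For completeness, the surrounding load code is roughly this (a simplified sketch of what my class does, not the exact code):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Read the serialized engine (only 3.8 MB on disk) ...
with open("classifier.trt", "rb") as f:
    data = f.read()

# ... yet deserializing it accounts for ~336 MB of the memory increase.
with trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(data)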

Can anyone elucidate where the difference in memory usage comes from between CPU and GPU inference, and if there is anything to be done about it?

Btw, here is the verbose trtexec log for building the engine if it helps:
trtexec_verbose.log (525.8 KB)

Dear @frederiki3k63
How did you check the memory consumption? Is it using tegrastats?

I parsed /proc/meminfo and compared MemAvailable before and after loading the network. But I just measured it using tegrastats and I get the same results.
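Concretely, the check is just a diff of MemAvailable around the load; a minimal sketch of what I run:

def mem_available_kb():
    # Return the MemAvailable field of /proc/meminfo, in kB.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

before = mem_available_kb()
# ... load the network here ...
after = mem_available_kb()
print(f"Memory increase: {(before - after) / 1024:.1f} MB")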

If you need me to create and share a standalone end-to-end test script demonstrating this behaviour, including an ONNX file, I can do so and share it with you privately. Let me know. But to me, it seems the behaviour I’m describing always occurs, independently of which small neural network is run.

Just to add: deserializing the same model twice also doubles the (V)RAM usage (770 MB vs 385 MB). According to this NVIDIA employee post, this should not happen, which leads me to believe there is something wrong here.

Aaaah, I finally figured out what I was doing wrong thanks to this post. I was creating a CUDA context per model, whereas I should have one global CUDA context for all models in the same process. I have now made the call to

cfx = cuda.Device(0).make_context()

global, so the overhead is incurred only once instead of per model. I can now run my model with just 12 MB of added memory, very similar to the CPU memory usage.
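In case it helps, here is a sketch of what the fixed setup looks like (names and the second model path are placeholders, not my actual code):

import pycuda.driver as cuda
import tensorrt as trt

cuda.init()
# One global CUDA context shared by every model in this process,
# instead of one context per model (which duplicated the ~385 MB overhead).
GLOBAL_CFX = cuda.Device(0).make_context()

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(path):
    # All engines are deserialized under the same shared context.
    with open(path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

engines = [load_engine("classifier.trt"), load_engine("other_model.trt")]

# Pop the context once when the process is done with the GPU.
GLOBAL_CFX.pop()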

I hope this thread helps someone else who is just as obtuse as me. :D

Edit: or, even better, use cfx = cuda.Device(0).retain_primary_context().
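With the primary-context variant, the only real difference is that the context must be pushed explicitly before use; a sketch:

import pycuda.driver as cuda

cuda.init()
# Reuse the device's primary context (the one other CUDA libraries in this
# process also use) instead of creating an additional context.
cfx = cuda.Device(0).retain_primary_context()
cfx.push()

# ... deserialize engines and run inference as before ...

cfx.pop()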
