Excessive RAM usage

Hello, I have a few questions regarding RAM usage. We are operating on the Jetson Xavier NX with JetPack 4.6.

We have two projects running PyTorch models, one of which produces artifacts that are a dependency for the other. Both are hungry for memory, so we are in the process of working through resource management. What I am observing is that after running inference with either model, the GPU memory allocation remains the same. That is, when I create the model and load the weights it takes approx. 500 MB of GPU memory; when I run inference, it consumes 1.1 GB; and after inference it stays at 1.1 GB until I shut the container down. I had been assuming that I had retained references to tensors somewhere in the code that I needed to dereference, but I wrote a method to check the tensors in memory and delete those that are not model parameters, yet the GPU allocation remains. So I have a couple of questions:

  1. Is there an explicit way to enumerate all tensors in memory and determine whether they are model parameters? I’m currently examining all objects in the garbage collector, determining whether they are tensors, then determining whether they are of type torch.nn.Parameter or torch.device, and deleting them if neither (a sketch of what I’m doing is below the list).
  2. Is there perhaps a setting similar to PYTORCH_NO_CUDA_MEMORY_CACHING that I should be defining? My understanding is that this is bad practice in production due to the additional latency, but in a resource-limited environment where multiple containers need to pass GPU resources back and forth, is there that or some other method of preventing memory allocated at inference from being retained?
  3. When running jtop I see that the memory allocated for each process doesn’t add up to the total memory used. Below is an example of what I am seeing from jtop (I assume ~2 GB are from the OS):
    a. Baseline (nvargus-daemon & symbot_server): 30 MB memory and 300 MB GPU each – total RAM used 2.7 GB
    b. Baseline + (Project 1 model loaded: 0.7 GB CPU, 0.7 GB GPU) – total RAM used 5.1 GB
    c. Baseline + (Project 2 model loaded: 0.7 GB CPU, 0.9 GB GPU) – total RAM used 4.8 GB
    d. Baseline + (Projects 1 & 2 models loaded: 1.1 GB CPU, 1.5 GB GPU) – total RAM used 7 GB
    e. Baseline + (both models loaded and running inference) – RAM overflow
  4. Running docker stats reports different memory usage, often smaller than what jtop lists; jtop agrees with tegrastats.
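For reference, a minimal sketch of the check I’m running is below (the helper names are my own, and the empty_cache() call is there because, as far as I understand, deleting references only returns blocks to PyTorch’s caching allocator):

```python
import gc
import torch

def live_cuda_tensors():
    """Yield CUDA tensors that are still reachable via the garbage collector."""
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                yield obj
        except Exception:
            # Some tracked objects raise on attribute access during inspection.
            continue

def report_and_release():
    """Print the live CUDA tensors, then release unused cached blocks."""
    for t in live_cuda_tensors():
        kind = "Parameter" if isinstance(t, torch.nn.Parameter) else "Tensor"
        print(kind, tuple(t.shape), t.dtype, t.element_size() * t.nelement(), "bytes")
    # Dropping references only hands memory back to PyTorch's caching allocator;
    # empty_cache() is what returns the unused cached blocks to CUDA.
    gc.collect()
    torch.cuda.empty_cache()
```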

Hi,

PyTorch uses CUDA for inference, so most of the memory (~600 MB) is occupied by loading the CUDA-related libraries (especially cuDNN).
This memory won’t be freed by deleting tensors or parameters; it’s used for CUDA initialization.
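As a quick sanity check, you can compare what PyTorch’s allocator itself is holding against the per-process total shown in jtop; the gap is mostly the CUDA context and libraries. A minimal sketch using standard PyTorch calls:

```python
import torch

torch.cuda.init()  # force CUDA context creation so the overhead is visible

# Bytes held by live tensors vs. bytes kept cached by PyTorch's allocator.
# Neither number includes the CUDA context or the cuDNN/cuBLAS libraries,
# which is why the process total in jtop/tegrastats stays high even after
# all tensors have been freed.
allocated = torch.cuda.memory_allocated()
reserved = torch.cuda.memory_reserved()
print(f"allocated={allocated / 2**20:.1f} MiB, reserved={reserved / 2**20:.1f} MiB")
```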

Is TensorRT an option for you?
If you already have a model in ONNX format, it can be run with TensorRT easily.
TensorRT has several mechanisms that can help control memory (as a trade-off against performance).
For example, you can limit the workspace size or restrict the backends (cuDNN, cuBLAS, …) to fit the limited resources.
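A rough sketch of those knobs with the TensorRT Python API is shown below (it uses the TensorRT 8.2-style max_workspace_size attribute, which newer releases replace with set_memory_pool_limit; the 256 MB cap and model path are placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
# Cap the scratch memory TensorRT may use when selecting layer algorithms.
config.max_workspace_size = 256 * 1024 * 1024
# Disable the cuDNN/cuBLAS tactic sources so their workspaces are not loaded
# (this can reduce performance).
config.set_tactic_sources(0)
engine = builder.build_serialized_network(network, config)
```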

A more complete example for TensorRT 8.5 (JetPack 5) is linked below.
It might have some API differences from TensorRT 8.2 (JetPack 4), but it should be very similar:

https://elinux.org/Jetson/L4T/TRT_Customized_Example#OpenCV_with_ONNX_model

Thanks.

Hi,

Thank you for the tips. I got an ONNX version of our node up and running and tested it with both the CPU provider and the TensorRT provider, without altering the memory options.

The CPU provider reduces the memory requirements by 1.5 GB, while TensorRT actually increases them by a significant margin, which confuses me a bit. Why would TensorRT take more memory than the PyTorch model, which loads the entire CUDA library?
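For reference, the memory-related provider options I have not touched yet would look roughly like this when creating the session (option names as documented for the ONNX Runtime TensorRT and CUDA execution providers; the limits and model path are untested placeholders):

```python
import onnxruntime as ort

providers = [
    ("TensorrtExecutionProvider", {
        "trt_max_workspace_size": 256 * 1024 * 1024,  # cap TensorRT scratch memory (bytes)
        "trt_fp16_enable": True,                      # smaller engine and activations
    }),
    ("CUDAExecutionProvider", {
        "gpu_mem_limit": 1 * 1024 * 1024 * 1024,      # cap the CUDA EP arena (bytes)
        "arena_extend_strategy": "kSameAsRequested",  # grow the arena only as requested
    }),
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder path
```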

Hi,

Did you run it with ONNX Runtime or PyTorch?
If so, could you try running it with trtexec instead?

Thanks.
