JetPack 4.6.1 (L4T R32.7.1): PyTorch allocates all the memory + swap!

Hi,

I’m trying to use my Jetson Nano with PyTorch by using this official docker image: nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.10-py3
However, when I move anything to the “GPU” memory, it allocates all the memory + swap, making the board unusable.
To reproduce it, just run python3 -c "import torch; torch.rand(1).cuda();" from inside the container.
According to tegrastats, the memory peaks at RAM 1846/1980MB (lfb 2x512kB) SWAP 580/5086MB.
When I use pycuda directly I can allocate memory, do whatever I want, and no crazy allocations happen. I even tried trtexec to convert a model and ran it using pycuda + tensorrt, and again there was no problem. It’s only PyTorch that misbehaves. I also tried older versions (r32.6.1-pth1.9-py3), but the problem was the same.
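For reference, this is roughly the kind of direct pycuda check I’m doing (a minimal sketch; the array size is arbitrary and just for illustration):

```python
# Plain pycuda allocation: memory usage stays sane, no giant reservation
import numpy as np
import pycuda.autoinit   # creates the CUDA context
import pycuda.driver as cuda

host = np.random.rand(1024, 1024).astype(np.float32)   # ~4 MB of data
dev = cuda.mem_alloc(host.nbytes)                       # plain device allocation
cuda.memcpy_htod(dev, host)                             # copy to the GPU
free, total = cuda.mem_get_info()                       # report free/total device memory
print(f"free {free / 1e6:.0f} MB / total {total / 1e6:.0f} MB")
```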

Any help will be much appreciated, thanks in advance!

Cheers,
Ricardo

Hi,

It looks like you are using a Nano 2GB.
Since PyTorch itself is relatively big, the occupied memory might just be what is needed to load the library.

Thanks.

Hi, thank you very much for your fast reply, but I don’t think this is the case, because everything works as expected when I do NOT use the GPU with PyTorch (no calls to .cuda() or .to(device='cuda')). I tried PyTorch with big models, and as long as I don’t use CUDA anywhere (i.e. everything stays on the CPU), it just works without this crazy memory leakage (allocation) behaviour.
I tried the official docker images down to PyTorch 1.7, but as soon as I send anything to CUDA the memory blows up. I think someone from Nvidia ignored the existence of the Jetson Nano 2GB and it’s allocating 4GB by default when using CUDA.
As I mentioned in my first message, when I run a huge model on the GPU, but WITHOUT using PyTorch, it works perfectly. A model converted from ONNX with trtexec and run with pycuda + tensorrt just works (roughly as in the sketch below). Therefore it’s not my Jetson Nano’s fault :)
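For completeness, here is roughly what that working path looks like on my side (a sketch only: it assumes the TensorRT 8.x Python API from JetPack 4.6.1, an engine already built with trtexec, static input shapes, and that binding 0 is the input and binding 1 the output; "model.engine" is a placeholder):

```python
# Rough sketch of the TensorRT + pycuda inference path that works fine on the Nano 2GB
import numpy as np
import pycuda.autoinit            # creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()

# Allocate one host/device buffer pair per binding
bindings, host_bufs, dev_bufs = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Fill the input (assumed to be binding 0), run inference, read the output back
host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype)
cuda.memcpy_htod(dev_bufs[0], host_bufs[0])
context.execute_v2(bindings)
cuda.memcpy_dtoh(host_bufs[1], dev_bufs[1])
print(host_bufs[1][:10])
```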
It really feels like someone at Nvidia decided to make sure the Jetson Nano 2GB can’t be used anymore, but ecologically and economically that would be really absurd, so I will stick to the idea that it’s a silly bug that Nvidia should fix ASAP.

@ricardo.azambuja what it’s doing is loading the huge amount of CUDA kernel code that PyTorch has compiled (PyTorch only does this the first time you actually use the GPU). It isn’t just allocating blank memory, but alas many of those PyTorch kernels go unused (so they can be paged out if you have sufficient swap). Unfortunately PyTorch doesn’t implement a way to selectively load only the kernels that are needed, and it’s not an NVIDIA bug. For deployment and optimized memory/runtime usage, it’s recommended to export models from PyTorch (typically via ONNX) and run them with TensorRT.
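You can see the lazy loading for yourself with something like the sketch below (reading VmRSS from /proc is just one rough way to watch the process memory; the numbers will vary):

```python
# Show that the memory jump happens at first GPU use, not at import
import torch

def rss_mb():
    # Resident set size of this process in MB, read from /proc
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) // 1024
    return -1

print("after import:        ", rss_mb(), "MB")
x = torch.rand(1)                  # CPU tensor: no CUDA kernels loaded yet
print("after CPU tensor:    ", rss_mb(), "MB")
x = x.cuda()                       # first GPU use: the compiled kernels get loaded
print("after first .cuda(): ", rss_mb(), "MB")
```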

Thanks @dusty_nv, great explanation! I will stick to ONNX + TensorRT + PyCUDA
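In case it helps anyone else, this is roughly the export step I use before building the engine with trtexec (a sketch only: the resnet18 model, file names, and input shape are just placeholders for whatever you are deploying):

```python
# Export a PyTorch model to ONNX on the CPU (no .cuda() needed, so no kernel blow-up)
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
dummy = torch.rand(1, 3, 224, 224)             # example input with the deployment shape
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)
# Then build the engine on the Jetson, e.g. (trtexec usually lives here on JetPack):
#   /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.engine
```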
