Problem with loading models into cuda device (Jetson Nano)

Hi. I have a rather weird problem when I try to load a PyTorch model onto the GPU.

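import torch
import torch.nn as nn
from torchsummary import summary
from torchvision.models import resnet18

model = resnet18(True)  # pretrained weights
modules = list(model.children())[:4]  # keep only conv1, bn1, relu, maxpool
del model
model2 = nn.Sequential(*modules)
img = torch.rand(size = (1, 3, 448, 448))
img = img.to('cuda:0')
model2 = model2.to('cuda:0')
a = model2(img)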

You can see that my model is absolutely tiny. But every time I try to load it onto the CUDA device with either to('cuda:0') or cuda(), the whole memory gets filled regardless of the model size, and the system becomes extremely slow and unresponsive.

I am currently running JetPack 4.6:

Package: nvidia-jetpack
Version: 4.6-b199
Architecture: arm64
Maintainer: NVIDIA Corporation
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0

and the code runs inside a container whose base is

I have added the user to the video group and checked that torch.cuda.is_available() returns True.

I really couldn't figure out what the problem is. Any input would be much appreciated.

img = torch.rand(size = (1, 3, 448, 448))
img = img.to('cuda:0')

I tried loading only a single image onto the GPU, and this is the memory usage (collected by htop).


This looks like normal behavior of the framework.

For better performance, the framework tends to load its underlying libraries and allocate workspace memory up front, the first time the GPU is used.
In TensorFlow, there is a config option to control the memory allocation fraction.
You can check whether PyTorch has a similar flag.
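
On the PyTorch side, here is a minimal sketch of how one might check this. It assumes PyTorch 1.8 or later (where `torch.cuda.set_per_process_memory_fraction` is available) and a Linux system that exposes /proc/meminfo. The helper names (`parse_mem_available_mib`, `read_mem_available_mib`, `measure_cuda_startup`) and the 0.5 fraction are made up for illustration. Note that the fraction only caps PyTorch's caching allocator; as far as I know it does not limit the memory taken by the CUDA context and the libraries loaded on first use, so it may not reduce the startup cost described above. Since the Jetson Nano shares one physical memory between CPU and GPU, the drop in MemAvailable gives a rough picture of the total cost.

```python
import importlib.util


def parse_mem_available_mib(meminfo_text):
    """Return the MemAvailable value from /proc/meminfo-style text, in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024  # the kernel reports kB
    raise ValueError("MemAvailable not found")


def read_mem_available_mib():
    """Read the current MemAvailable value of this machine, in MiB."""
    with open("/proc/meminfo") as f:
        return parse_mem_available_mib(f.read())


def measure_cuda_startup(fraction=0.5, device=0):
    """Compare system memory before and after the first CUDA allocation.

    Returns None when torch or a CUDA device is not available.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    # Measured after "import torch", so the drop below reflects CUDA
    # initialization plus the tensor, not the import itself.
    before = read_mem_available_mib()

    # Caps only the memory handed out by PyTorch's caching allocator.
    torch.cuda.set_per_process_memory_fraction(fraction, device)

    # The first CUDA allocation triggers context creation.
    img = torch.rand(1, 3, 448, 448, device=f"cuda:{device}")
    torch.cuda.synchronize(device)
    after = read_mem_available_mib()

    mib = 1024 ** 2
    return {
        "system_drop_mib": before - after,
        "tensor_allocated_mib": torch.cuda.memory_allocated(device) / mib,
        "allocator_reserved_mib": torch.cuda.memory_reserved(device) / mib,
    }


if __name__ == "__main__":
    print(measure_cuda_startup())
```

If `system_drop_mib` turns out to be far larger than `allocator_reserved_mib`, that would suggest most of the memory goes to the CUDA context and library loading rather than to the model or the image.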

In general, TensorRT is the recommended choice for inference on Jetson.
Its memory usage is better suited to an embedded system.

