Hi. I have a rather weird problem when I try to load a PyTorch model onto the GPU.
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(pretrained=True)  # pretrained ResNet-18 from torchvision
modules = list(model.children())[:4]  # keep only the stem: conv1, bn1, relu, maxpool
del model
model2 = nn.Sequential(*modules)

print(torch.cuda.is_available())

img = torch.rand(size=(1, 3, 448, 448))
img = img.to('cuda:0')
model2.to('cuda:0')
a = model2(img)
print(a.shape)
You can see that my model is absolutely tiny. But every time I tried to move it onto the CUDA device, with either to('cuda:0') or cuda(), the whole memory got filled regardless of the model size, and the board became extremely slow and unresponsive.
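Since the Jetson's GPU has no dedicated VRAM and shares physical RAM with the CPU, I'm wondering how much of this is the CUDA context itself rather than my tensors. A check along these lines could separate the two; note that memory_allocated/memory_reserved only track PyTorch's caching allocator, so context overhead would show up in tegrastats but not in these counters (I'm assuming these counters behave the same inside the l4t-pytorch container):

import torch

# Force CUDA context creation, then report what PyTorch's allocator holds.
torch.cuda.init()
print('allocated:', torch.cuda.memory_allocated(0) / 1e6, 'MB')  # live tensors
print('reserved: ', torch.cuda.memory_reserved(0) / 1e6, 'MB')   # allocator pool

x = torch.rand(1, 3, 448, 448, device='cuda:0')
print('after one input tensor:')
print('allocated:', torch.cuda.memory_allocated(0) / 1e6, 'MB')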
I am currently running JetPack 4.6:
Package: nvidia-jetpack
Version: 4.6-b199
Architecture: arm64
Maintainer: NVIDIA Corporation
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0
The code runs inside a container whose base image is nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3.
I have added the user to the video group and checked that torch.cuda.is_available() returned True.
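For reference, this is the kind of sanity check I mean; the extra device-property calls are just my addition to show what the container sees (on Jetson, total_memory should report the shared system RAM):

import torch

print(torch.cuda.is_available())       # True in my setup
print(torch.cuda.get_device_name(0))   # the Jetson's integrated GPU
props = torch.cuda.get_device_properties(0)
print(props.total_memory / 1e9, 'GB')  # shared CPU/GPU memory on Jetson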
I really couldn't figure out what the problem was. Any input would be much appreciated.
Thanks.