Good evening.
I’m using the Docker container nvcr.io/nvidia/pytorch:22.10-py3 (PyTorch | NVIDIA NGC) on my freshly flashed Jetson Orin. I start the container with the following arguments:
docker run -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --runtime nvidia nvcr.io/nvidia/pytorch:22.10-py3 bash
The container initializes correctly and recognizes the NVIDIA Tegra driver. Inside it, I open a python3 console, import torch and torchvision, and try to load a pretrained model and run a sample forward pass with the following simple script:
import torch
from torchvision.models import resnet50, ResNet50_Weights
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).to(device)
my_input = torch.rand(1, 3, 224, 224).to(device)
model(my_input)
I can confirm that CUDA is available (torch.cuda.is_available() returns True), but the last line of the script fails with the following error:
>>> model(my_input)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchvision/models/resnet.py", line 285, in forward
    return self._forward_impl(x)
  File "/opt/conda/lib/python3.8/site-packages/torchvision/models/resnet.py", line 268, in _forward_impl
    x = self.conv1(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
I know that CUDNN_STATUS_INTERNAL_ERROR is often caused by running out of memory. However, torch.cuda.get_device_properties(0).total_memory reports the full 32 GB of my Orin’s DRAM.
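For reference, the memory check can be extended into a small diagnostic along these lines. This is only a sketch: `to_gib` and `report_gpu` are hypothetical helper names I made up for illustration, and it assumes `torch.cuda.mem_get_info` is available in the installed PyTorch build. It prints the machine architecture, the PyTorch, CUDA and cuDNN versions, and the free and total GPU memory, which may help to tell a real memory shortage apart from a version or platform mismatch.

```python
import platform


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


def report_gpu() -> dict:
    """Collect platform, version and GPU memory details (hypothetical helper)."""
    # 'aarch64' is what I would expect to see on a Jetson.
    info = {"machine": platform.machine()}
    try:
        import torch
    except ImportError:
        # Keep the helper usable on machines without PyTorch installed.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda          # CUDA version PyTorch was built with
    info["cudnn"] = torch.backends.cudnn.version()   # None if cuDNN is unavailable

    if torch.cuda.is_available():
        # Assumption: mem_get_info exists in this build; returns (free, total) in bytes.
        free_bytes, total_bytes = torch.cuda.mem_get_info(0)
        info["device"] = torch.cuda.get_device_name(0)
        info["free_gib"] = round(to_gib(free_bytes), 2)
        info["total_gib"] = round(to_gib(total_bytes), 2)
    return info


if __name__ == "__main__":
    for key, value in report_gpu().items():
        print(f"{key}: {value}")
```

Run inside the container just before the forward pass, the free memory figure shows how much of the total is actually available at that moment, which total_memory alone does not.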
Am I doing anything wrong? Do I need to pass any other flag to docker run so that the container can access all of the resources? Could the error have some other cause?
Thank you very much
Regards,
Daniel