Unable to perform inference with PyTorch through docker container nvcr.io/nvidia/pytorch:22.10-py3 on Jetson Orin

Good evening.
I’m using the Docker container nvcr.io/nvidia/pytorch:22.10-py3 (PyTorch | NVIDIA NGC) on my freshly flashed Jetson Orin. To run the container I use the following command:

docker run -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --runtime nvidia nvcr.io/nvidia/pytorch:22.10-py3 bash

The container initializes correctly and recognizes the NVIDIA Tegra driver. Inside it I open a python3 console, import torch and torchvision, load a pretrained model, and try a sample forward pass with the following simple script:

import torch
from torchvision.models import resnet50, ResNet50_Weights

device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).to(device)
my_input = torch.rand(1, 3, 224, 224).to(device)
model(my_input)

I can confirm that CUDA is available (torch.cuda.is_available() returns True), but the last line of the script produces the following error:

>>> model(my_input)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchvision/models/resnet.py", line 285, in forward
    return self._forward_impl(x)
  File "/opt/conda/lib/python3.8/site-packages/torchvision/models/resnet.py", line 268, in _forward_impl
    x = self.conv1(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

I know that CUDNN_STATUS_INTERNAL_ERROR is often caused by running out of memory, but if I run torch.cuda.get_device_properties(0).total_memory it reports the full 32 GB of my Orin’s DRAM.
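For reference, these are the two checks I ran inside the container:

import torch
torch.cuda.is_available()                          # returns True
torch.cuda.get_device_properties(0).total_memory   # reports ~32 GB of shared DRAM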
Is there anything I’m doing wrong? Do I have to add any other flag to the docker run command so the container can access all the resources? Could the error have some other cause?
Thank you very much

Regards,
Daniel

Hi @dgandiaga92, that container isn’t built for Jetson’s integrated GPU; please use the l4t-pytorch container for Jetson instead.
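For example, something along these lines should work (adjust the tag to match your JetPack / L4T version):

docker run -it --runtime nvidia nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.13-py3 bash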

Hi dusty_nv, thank you for your answer.
The thing is that I’m missing some libraries in l4t-pytorch:r35.1.0-pth1.13-py3 that were present in pytorch:22.10-py3, specifically Torch-TensorRT (Torch-TensorRT — Torch-TensorRT master documentation), which I’m having trouble compiling from source myself inside an l4t-pytorch container. Is there any other alternative?

The Dockerfiles and build scripts for the l4t-pytorch containers can be found here if you want to modify them and rebuild: https://github.com/dusty-nv/jetson-containers

You might find torch2trt similar and easier to install: https://github.com/NVIDIA-AI-IOT/torch2trt
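The basic conversion flow with torch2trt is roughly the following (a minimal sketch assuming the torch2trt package is installed from that repo; I haven’t run it against your exact model):

import torch
from torch2trt import torch2trt
from torchvision.models import resnet50, ResNet50_Weights

# load the pretrained model in eval mode on the GPU
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval().cuda()

# an example input is used to trace the network and build the TensorRT engine
x = torch.rand(1, 3, 224, 224).cuda()

# the converted module can then be called just like the original one
model_trt = torch2trt(model, [x])
y_trt = model_trt(x)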

Thanks, but what I’d need is the Dockerfile and build script of pytorch:22.10-py3 so I can see how it installs torch_tensorrt and replicate that on an L4T-based container. Regarding torch2trt, I’ve already tried it and the output of the compiled model was complete noise, unlike with torch_tensorrt, where the output was exactly the same as with the original model.
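For reference, the Torch-TensorRT workflow I was using in pytorch:22.10-py3 looks roughly like this (a sketch; the exact compile settings may differ from what I ran):

import torch
import torch_tensorrt
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval().cuda()

# compile the model into a TensorRT-backed TorchScript module
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.float},
)
y = trt_model(torch.rand(1, 3, 224, 224).cuda())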

There are other NVIDIA Dockerfiles here (https://gitlab.com/nvidia/container-images), but unfortunately those don’t seem to include the non-L4T PyTorch container. I’m not familiar with installing Torch-TensorRT myself, so what I’d recommend is posting the errors you are getting while building it to the Torch-TensorRT GitHub repo here: https://github.com/pytorch/TensorRT/issues

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.