"torch.cuda" is not available inside the docker container which is running on Jetson Orin

I tried with FROM dustynv/pytorch:1.13-r35.3.1, using torchvision==0.13.1 in my requirements file. I am facing the same runtime error:

File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/data/models/torch/hub/ultralytics_yolov5_master/models/common.py", line 721, in forward
y = non_max_suppression(y if self.dmb else y[0],
File "/data/models/torch/hub/ultralytics_yolov5_master/utils/general.py", line 959, in non_max_suppression
i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
File "/usr/local/lib/python3.8/dist-packages/torchvision/ops/boxes.py", line 40, in nms
_assert_has_ops()
File "/usr/local/lib/python3.8/dist-packages/torchvision/extension.py", line 33, in _assert_has_ops
raise RuntimeError(
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

Did you just try installing torchvision with pip? That will install the torchvision wheel from PyPI, which doesn't have CUDA enabled (and also uninstalls your previous PyTorch wheel, as you found).

Instead, try building the torchvision wheel from source with CUDA enabled, like here:

Not able to clone torchvision:

Step 7/11 : RUN git clone --branch ${TORCHVISION_VERSION} --recursive --depth=1 https://github.com/pytorch/vision torchvision && cd torchvision && git checkout ${TORCHVISION_VERSION} && python3 setup.py bdist_wheel && cp dist/torchvision*.whl /opt && pip3 install --no-cache-dir --verbose /opt/torchvision*.whl && cd ../ && rm -rf torchvision

---> Running in 3e61886a6a3e
Cloning into 'torchvision'...
warning: Could not find remote branch 0.15 to clone.
fatal: Remote branch 0.15 not found in upstream origin

@rakesh.thykkoottathil.jay look at the tags available on https://github.com/pytorch/vision repo - substitute v0.15.1 for ${TORCHVISION_VERSION}
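
For example (a sketch with the tag substituted; pick the tag that matches your PyTorch build):

# Sketch: clone a tagged release instead of the non-existent "0.15" branch
git clone --branch v0.15.1 --recursive --depth=1 https://github.com/pytorch/vision torchvision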

@dusty_nv I installed torchvision 0.15 with CUDA as you mentioned above, but I am still facing runtime issues:

File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/models/torch/hub/ultralytics_yolov5_master/models/common.py", line 721, in forward
y = non_max_suppression(y if self.dmb else y[0],
File "/data/models/torch/hub/ultralytics_yolov5_master/utils/general.py", line 959, in non_max_suppression
i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
File "/usr/local/lib/python3.8/dist-packages/torchvision/ops/boxes.py", line 41, in nms
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 502, in __call__
return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

CPU: registered at /app/torchvision/torchvision/csrc/ops/cpu/nms_kernel.cpp:112 [kernel]
QuantizedCPU: registered at /app/torchvision/torchvision/csrc/ops/quantized/cpu/qnms_kernel.cpp:124 [kernel]
BackendSelect: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /opt/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /opt/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:491 [backend fallback]
Functionalize: registered at /opt/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:280 [backend fallback]
Named: registered at /opt/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /opt/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at /opt/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at /opt/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:63 [backend fallback]
AutogradOther: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:30 [backend fallback]
AutogradCPU: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:34 [backend fallback]
AutogradCUDA: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:42 [backend fallback]
AutogradXLA: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:46 [backend fallback]
AutogradMPS: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:54 [backend fallback]
AutogradXPU: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:38 [backend fallback]
AutogradHPU: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:67 [backend fallback]
AutogradLazy: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:50 [backend fallback]
AutogradMeta: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:58 [backend fallback]
Tracer: registered at /opt/pytorch/pytorch/torch/csrc/autograd/TraceTypeManual.cpp:294 [backend fallback]
AutocastCPU: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:487 [backend fallback]
AutocastCUDA: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:354 [backend fallback]
FuncTorchBatched: registered at /opt/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:815 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /opt/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1073 [backend fallback]
VmapMode: fallthrough registered at /opt/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /opt/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:210 [backend fallback]
PythonTLSSnapshot: registered at /opt/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:152 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /opt/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:487 [backend fallback]
PythonDispatcher: registered at /opt/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]

@dusty_nv

I used TORCHVISION_VERSION="v0.15.1" for cloning the git repo.
The above build has these torch and torchvision versions:
Torch version: 2.0.0+nv23.05
Torchvision version: 0.15.1a0+42759b1

Hmm, it would appear it was not built with CUDA. You should see messages about that during the build if it's enabled.

Could you please guide me on this? I don't know how to proceed. The torchvision version shows as 0.15.1a0+42759b1.
Does this mean torchvision is CUDA enabled?

I used 0.15.1 as you mentioned.

Build messages from the console start with this:

Building wheel torchvision-0.15.1a0+42759b1
Compiling extensions with following flags:
FORCE_CUDA: False
DEBUG: False
TORCHVISION_USE_PNG: True
TORCHVISION_USE_JPEG: True
TORCHVISION_USE_NVJPEG: True
TORCHVISION_USE_FFMPEG: True
TORCHVISION_USE_VIDEO_CODEC: True
NVCC_FLAGS:

One more build message that I found:

setup.py:10: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import DistributionNotFound, get_distribution, parse_version
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Emitting ninja build file /app/torchvision/build/temp.linux-aarch64-cpython-38/build.ninja...
Compiling objects...

Sorry, I'm not exactly sure why this happens. Do you see nvcc actually being invoked to compile torchvision's .cu files during the build? You can also try newer/older tags of torchvision.
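
One quick runtime check (a sketch, assuming the container is started with GPU access) is to call NMS on CUDA tensors directly; if the wheel was built CPU-only it fails with the same NotImplementedError:

# Sketch: verify torchvision's compiled CUDA ops from inside the container
import torch
import torchvision

print(torch.__version__, torchvision.__version__)
print(torch.cuda.is_available())

boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.]], device="cuda")
scores = torch.tensor([0.9, 0.8], device="cuda")
print(torchvision.ops.nms(boxes, scores, 0.5))  # raises if CUDA ops are missing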

@dusty_nv I think I am missing the CUDA toolkit in the container. There is no message showing nvcc being invoked for the compilation.

@rakesh.thykkoottathil.jay do you see it in the container under /usr/local/cuda?

Also, try setting your default docker runtime to nvidia:
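
On Jetson this is usually done in /etc/docker/daemon.json (a sketch of the common config; restart the docker daemon afterwards):

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}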

Unless you explicitly set TORCH_CUDA_ARCH_LIST, it may try to use the CUDA runtime during the build to detect your CUDA SM version, and setting your default docker runtime to nvidia will enable CUDA to be run during builds (i.e. actually using the GPU, as opposed to just compiling with nvcc).
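
Alternatively, a sketch of pinning the architecture in the Dockerfile, which avoids the build-time GPU probe entirely (8.7 is Orin's compute capability; adjust for other devices):

# Sketch: target Orin's SM version explicitly instead of probing a GPU at build time
ENV TORCH_CUDA_ARCH_LIST="8.7"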

@dusty_nv I cannot set the docker runtime to nvidia. I am actually building the image with the Azure Container Registry build command:

az acr build --registry registry_name --file Dockerfile --platform linux/arm64/v8 --image imagename:tag .

I set the runtime to nvidia (in the config) when I push the container to the Azure IoT Edge runtime (on the Jetson) through Azure IoT Hub.

During this build I don't have the flexibility to set that.

Under /usr/local:
usr/local# ls
bin cuda cuda-11 cuda-11.4 etc games include lib man sbin share src
root:/usr/local# cd cuda-11.4
root:/usr/local/cuda-11.4# ls
DOCS EULA.txt README bin compute-sanitizer extras include lib64 nvml nvvm samples share targets tools version.json
root@:/usr/local/cuda-11.4# cd bin/
root@:/usr/local/cuda-11.4/bin# cd nvcc
bash: cd: nvcc: Not a directory
root@:/usr/local/cuda-11.4/bin# ls
bin2c compute-sanitizer crt cu++filt cuda-gdb cuda-gdbserver cuda-install-samples-11.4.sh cudafe++ cuobjdump fatbinary nvcc nvcc.profile nvdisasm nvlink nvprune ptxas
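
Since nvcc is present, a quick sanity check (sketch) is:

/usr/local/cuda/bin/nvcc --version   # should print the CUDA 11.4 toolkit version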

Oh, some packages actually need GPU access during the build. I would try it first on actual Jetson hardware to get the container working.

I used the ENV variable FORCE_CUDA=1 to build it with nvcc.
Now I can run inference.
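
For reference, a sketch of the Dockerfile lines for this working build (the tag v0.15.1 is from earlier in the thread; TORCH_CUDA_ARCH_LIST=8.7 is an assumption matching Orin's SM version, added so the build doesn't need to probe a GPU):

# Sketch: build torchvision with CUDA even when no GPU is visible at build time
ENV FORCE_CUDA=1
ENV TORCH_CUDA_ARCH_LIST="8.7"
RUN git clone --branch v0.15.1 --recursive --depth=1 https://github.com/pytorch/vision torchvision && \
    cd torchvision && python3 setup.py bdist_wheel && \
    pip3 install --no-cache-dir dist/torchvision*.whl && \
    cd ../ && rm -rf torchvision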
