Overview: I have a minimal code snippet that runs fine directly on the OS but fails inside Docker. Running the snippet below inside the Docker container raises CUDNN_STATUS_NOT_INITIALIZED; the same code runs without errors on the host OS.
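For clarity, this is the kind of quick diagnostic I mean when I say cuDNN does not initialize. It only queries PyTorch's own cuDNN bindings and is separate from the actual failing snippet further down:

# Diagnostic sketch: ask PyTorch whether CUDA and cuDNN are usable at all
import torch
import torch.backends.cudnn

print("CUDA available: ", torch.cuda.is_available())
print("cuDNN enabled:  ", torch.backends.cudnn.enabled)
print("cuDNN available:", torch.backends.cudnn.is_available())
print("cuDNN version:  ", torch.backends.cudnn.version())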
What I have already tried (I have spent 3-4 days on this so far):
- JP 6.1 and JP 6.0
- Compiling torch 2.3.1 and 2.5.0 on the Jetson myself (using different base images from nvidia/cuda on Docker Hub)
- I tried on an Orin NX as well
- Trying different numpy and numba versions
Here’s “dpkg -l | grep cuda” from inside the broken environment:
ii cuda-cccl-12-2 12.2.140-1 arm64 CUDA CCCL
ii cuda-command-line-tools-12-2 12.2.2-1 arm64 CUDA command-line tools
ii cuda-compiler-12-2 12.2.2-1 arm64 CUDA compiler
ii cuda-crt-12-2 12.2.140-1 arm64 CUDA crt
ii cuda-cudart-12-2 12.2.140-1 arm64 CUDA Runtime native Libraries
ii cuda-cudart-dev-12-2 12.2.140-1 arm64 CUDA Runtime native dev links, headers
ii cuda-cuobjdump-12-2 12.2.140-1 arm64 CUDA cuobjdump
ii cuda-cupti-12-2 12.2.142-1 arm64 CUDA profiling tools runtime libs.
ii cuda-cupti-dev-12-2 12.2.142-1 arm64 CUDA profiling tools interface.
ii cuda-cuxxfilt-12-2 12.2.140-1 arm64 CUDA cuxxfilt
ii cuda-driver-dev-12-2 12.2.140-1 arm64 CUDA Driver native dev stub library
ii cuda-gdb-12-2 12.2.140-1 arm64 CUDA-GDB
ii cuda-keyring 1.0-1 all GPG keyring for the CUDA repository
ii cuda-libraries-12-2 12.2.2-1 arm64 CUDA Libraries 12.2 meta-package
ii cuda-libraries-dev-12-2 12.2.2-1 arm64 CUDA Libraries 12.2 development meta-package
ii cuda-minimal-build-12-2 12.2.2-1 arm64 Minimal CUDA 12.2 toolkit build packages.
ii cuda-nsight-compute-12-2 12.2.2-1 arm64 NVIDIA Nsight Compute
ii cuda-nvcc-12-2 12.2.140-1 arm64 CUDA nvcc
ii cuda-nvdisasm-12-2 12.2.140-1 arm64 CUDA disassembler
ii cuda-nvml-dev-12-2 12.2.140-1 arm64 NVML native dev links, headers
ii cuda-nvprune-12-2 12.2.140-1 arm64 CUDA nvprune
ii cuda-nvrtc-12-2 12.2.140-1 arm64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-12-2 12.2.140-1 arm64 NVRTC native dev links, headers
ii cuda-nvtx-12-2 12.2.140-1 arm64 NVIDIA Tools Extension
ii cuda-nvvm-12-2 12.2.140-1 arm64 CUDA nvvm
ii cuda-profiler-api-12-2 12.2.140-1 arm64 CUDA Profiler API
ii cuda-sanitizer-12-2 12.2.140-1 arm64 CUDA Sanitizer
ii cuda-toolkit-12-2-config-common 12.2.140-1 all Common config package for CUDA Toolkit 12.2.
ii cuda-toolkit-12-config-common 12.2.140-1 all Common config package for CUDA Toolkit 12.
ii cuda-toolkit-config-common 12.2.140-1 all Common config package for CUDA Toolkit.
hi libcudnn8 8.9.6.50-1+cuda12.2 arm64 cuDNN runtime libraries
ii libcudnn8-dev 8.9.6.50-1+cuda12.2 arm64 cuDNN development libraries and headers
hi libnccl-dev 2.19.3-1+cuda12.2 arm64 NVIDIA Collective Communication Library (NCCL) Development Files
hi libnccl2 2.19.3-1+cuda12.2 arm64 NVIDIA Collective Communication Library (NCCL) Runtime
Hardware: AGX Orin 64GB
JP: 6.0 rev 2
OS: Ubuntu 22.04
Minimal example code snippet:
import numpy as np
import torch as th
import torch.fft
import torch.nn.functional as F
from scipy.ndimage._filters import _gaussian_kernel1d

def cuda_downsample(th_img, factor=2):
    gaussian_kernel = _gaussian_kernel1d(sigma=factor * 0.5, order=0, radius=int(4 * factor * 0.5 + 0.5))[::-1].copy()
    th_gaussian_kernel = th.as_tensor(gaussian_kernel, dtype=th.float32, device="cuda")
    temp = F.conv2d(th_img, th_gaussian_kernel[None, None, :, None])  # convolve y
    th_filteredImage = F.conv2d(temp, th_gaussian_kernel[None, None, None, :])  # convolve x
    h2, w2 = np.floor(np.array(th_filteredImage.shape[2:]) / float(factor)).astype(int)
    return th_filteredImage[:, :, :h2 * factor:factor, :w2 * factor:factor]

def main():
    img = np.zeros(3024 * 4032, dtype=np.uint32)
    img = np.reshape(img, (1, 1, 3024, 4032))
    torch_img_grey = th.as_tensor(img, dtype=th.float32, device="cuda")
    torch_img_grey = cuda_downsample(torch_img_grey)

if __name__ == "__main__":
    main()
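As far as I can tell, the F.conv2d calls are the operations that dispatch to cuDNN, so an even smaller sketch along these lines should exercise the same code path (untested, included only to narrow things down):

import torch as th
import torch.nn.functional as F

# conv2d on CUDA float tensors goes through cuDNN by default
x = th.zeros((1, 1, 16, 16), dtype=th.float32, device="cuda")
k = th.ones((1, 1, 3, 3), dtype=th.float32, device="cuda")
print(F.conv2d(x, k).shape)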
Additional notes:
- deviceQuery would not compile. I did not investigate this further.
- The Python packages for torch, torchvision, numpy, numba, and scipy are the same versions in both environments.
- The Jetson is freshly flashed and I have not changed anything on it aside from installing some basic packages via apt.
- These devices are air-gapped and do not have access to the internet.
Env 1 (code runs as expected):
Running directly on the OS
Env 2 (code does not run):
Dockerfile:
The base image is from the following command:
docker pull --platform=linux/arm64 nvidia/cuda:12.2.2-cudnn8-devel-ubuntu22.04
The torch and torchvision wheels are from this post and are the matching versions for JP 6.0 / CUDA 12.2.
FROM custom/reg/cuda:12.2.2-cudnn8-devel-ubuntu22.04
RUN apt update && \
apt install -y build-essential libopenblas-base libopenmpi-dev libomp-dev python3 vim python3-pip
RUN pip3 install "numba==0.60.0" "numpy==1.23.4" "scipy==1.10.0"
RUN pip3 install -i custom/reg/simple torch torchvision
COPY . /test
WORKDIR /test
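For context, the GPU is normally exposed to containers on JetPack via the NVIDIA container runtime; a typical invocation looks like the line below (a generic example with placeholder names, the exact command I used is not reproduced here):

docker run -it --rm --runtime nvidia <image> python3 /test/<script>.py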
Env 1 collect_env:
/usr/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: 12.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35
Python version: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.136-tegra-aarch64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Orin (nvgpu)
Nvidia driver version: N/A
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.9.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-7
Off-line CPU(s) list: 8-11
Vendor ID: ARM
Model name: Cortex-A78AE
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 2
Stepping: r0p1
CPU max MHz: 2201.6001
CPU min MHz: 115.2000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm paca pacg
L1d cache: 512 KiB (8 instances)
L1i cache: 512 KiB (8 instances)
L2 cache: 2 MiB (8 instances)
L3 cache: 4 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, but not BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.23.4
[pip3] onnx-graphsurgeon==0.3.12
[pip3] torch==2.3.0
[pip3] torchvision==0.18.0a0+6043bc2
[conda] Could not collect
Env 2 collect_env:
/usr/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: 12.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35
Python version: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.136-tegra-aarch64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Orin (nvgpu)
Nvidia driver version: N/A
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.9.6
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.9.6
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.9.6
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.9.6
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.9.6
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.9.6
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.9.6
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-7
Off-line CPU(s) list: 8-11
Vendor ID: ARM
Model name: Cortex-A78AE
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 2
Stepping: r0p1
CPU max MHz: 2201.6001
CPU min MHz: 115.2000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm paca pacg
L1d cache: 512 KiB (8 instances)
L1i cache: 512 KiB (8 instances)
L2 cache: 2 MiB (8 instances)
L3 cache: 4 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, but not BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.23.4
[pip3] torch==2.3.0
[pip3] torchvision==0.18.0a0+6043bc2
[conda] Could not collect
SDK Manager about: (screenshot not reproduced here)