System info:
Device: Jetson Orin NX 16GB
Jetpack: 6.0-dp (freshly flashed, clean system)
CUDA: 12.2
torch: https://developer.download.nvidia.com/compute/redist/jp/v60dp/pytorch/torch-2.2.0a0+81ea7a4.nv24.01-cp310-cp310-linux_aarch64.whl (I tried the nv24.02 build as well)
MarkupSafe: MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
l4t-base: nvcr.io/nvidia/l4t-base:r36.2.0
Additional info:
- Using Hatch, python3 -c "import torch; print(torch.cuda.is_available())" prints True. (Hatch is basically just virtualenv in this case.)
- The regular CPU arm64 build of torch installs fine, but torch.cuda.is_available() is False there, as expected.
- The output below came from tmux with funky formatting. I tried to clean it up, but some artifacts may remain.
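For reference, this is roughly how I confirmed CUDA is present on the host itself (paths assume the default JetPack layout under /usr/local/cuda*; adjust if yours differs):

```shell
# On the Jetson host: confirm which CUDA runtime is actually installed.
ls -l /usr/local/cuda*/lib64/libcudart.so* 2>/dev/null || echo "no libcudart under /usr/local/cuda*"

# And whether the dynamic linker knows about it.
ldconfig -p | grep libcudart || echo "libcudart not in the linker cache"
```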
jet@ubuntu:~$ sudo docker run -it --runtime nvidia reg.companyname.com/nvidia/l4t-base:r36.2.0 bash → success
root@hostname:/# nvidia-smi → success
root@hostname:/# apt update && apt install -y python3-pip libopenblas-dev libopenmpi3 → success
root@hostname:/# pip3 install -i https://username:password@local.registry.with.torch.whl.com/simple torch → success
root@hostname:/# python3 -c "import torch; print(torch.cuda.is_available())" → failure
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 175, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.12: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 235, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 196, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 161, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libcublas.so.*[0-9] not found in the system path ['', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.10/dist-packages']
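For what it's worth, this is the kind of in-container check I've been running to tell "wrong CUDA version" apart from "no CUDA libraries visible at all". A rough sketch; the paths are the usual JetPack defaults and I haven't verified them against l4t-base r36.2:

```shell
# Inside the container: is libcudart.so anywhere on disk?
find /usr/local/cuda* /usr/lib/aarch64-linux-gnu -name 'libcudart.so*' 2>/dev/null \
  || echo "libcudart.so not found on disk"

# Is it registered with the dynamic linker?
ldconfig -p | grep -E 'libcudart|libcublas' || echo "CUDA libs not in the linker cache"

# If the libraries exist but sit off the search path, exporting it may be enough:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
```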
Questions (really I'm just looking for general advice on how to proceed from here):
- This libcudart.so dlopen error looks like a CUDA version mismatch. Is it a mismatch between the CUDA version PyTorch was compiled against and what's on my JP6.0-dp system? JP6.0-dp ships CUDA 12.2.
- Is installing PyTorch manually on top of l4t-base even feasible, or is l4t-pytorch for JP6.0 really the only option? Is that coming out soon?
- Should we give up on Docker and port to a system-level deployment? That's possible, but a lot of effort and technical debt :(
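On the mismatch question, a quick way to separate the two failure modes is to try dlopening each CUDA runtime major version directly and, if torch imports at all, print the CUDA version the wheel was built against. A minimal sketch (the library names are the standard CUDA soname convention, nothing Jetson-specific):

```python
import ctypes

# Try to dlopen each CUDA runtime major version directly.
# "NOT loadable" for both means no CUDA runtime is visible at all,
# which points at missing container mounts rather than a version mismatch.
for major in (11, 12):
    name = f"libcudart.so.{major}"
    try:
        ctypes.CDLL(name)
        print(f"{name}: loadable")
    except OSError:
        print(f"{name}: NOT loadable")

# If torch imports, report the CUDA version the wheel was built against.
try:
    import torch
    print("torch built for CUDA:", torch.version.cuda)
except Exception as exc:  # torch may fail to import for the same dlopen reason
    print("torch import failed:", exc)
```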