Using different pieces of code here and there, I sometimes get this error message on my Jetson Xavier AGX (I use the same code on a Jetson Nano but never get this error): RuntimeError: CUDA error: no kernel image is available for execution on the device
This leads me to think that there is a problem with the torch installation.
I tried PyTorch 1.7.0 and 1.8.0 with no success (meaning they install correctly according to the verification steps, but still give me this error), so I thought I would try to build it from source.
I have L4T 32.5.1, so I'm wondering: should I apply one of the patches you provide before attempting to build torch from source? (For compatibility with the code I'm trying to use, my goal is to build PyTorch 1.7.)
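In the meantime, one way to narrow down whether the installed wheel actually contains kernels for this GPU is a quick check like the following (a sketch; `torch.cuda.get_arch_list()` only exists in newer PyTorch builds, hence the guard):

```python
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Xavier AGX is compute capability 7.2, so the wheel must ship sm_72 kernels
    print("device capability:", torch.cuda.get_device_capability(0))
    if hasattr(torch.cuda, "get_arch_list"):
        print("built for:", torch.cuda.get_arch_list())
```

If the device capability does not appear in the arch list, the "no kernel image" error is expected.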
(venv) jetson-nano@jetsonnano-desktop:~$ pip3 install torch-1.6.0-cp36-cp36m-linux_aarch64.whl
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: torch-1.6.0-cp36-cp36m-linux_aarch64.whl is not a supported wheel on this platform.
So, what should I do to solve this problem?
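A "not a supported wheel on this platform" error usually means the wheel's tags do not match the local Python or architecture. A quick way to check (the `pip3 debug` subcommand only exists in newer pip releases):

```shell
# The wheel's tag "cp36-cp36m-linux_aarch64" must match this machine:
python3 --version                                        # needs CPython 3.6 for cp36
python3 -c "import platform; print(platform.machine())"  # should print aarch64
# Newer pip can list every tag it accepts on this platform:
pip3 debug --verbose 2>/dev/null | grep -m5 cp36 || true
```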
Hi,@dusty_nv
On my Xavier AGX, I installed libTorch with the “xxx.whl” from Box.
But both configuring a project with CMake and compiling it with QT Creator 4.5.2 run into problems:
- My device only has CUDA 10.2, but it links against the CUDA 10.0 libraries. I guess the problem is in the libTorch CMake files.
- The QT build reports a header file problem.
Thank you in advance!
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/__init__.py", line 135, in <module>
_load_global_deps()
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/__init__.py", line 93, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory
Hi @2570868576, it looks like you have installed a PyTorch wheel built against a newer version of JetPack. Can you try one of these wheels built for JetPack 4.3?
Also, before you do that, run pip3 uninstall torch to uninstall the previous wheel you installed.
Hi @yyjqr789, which version of JetPack are you running on your AGX Xavier? The wheel you downloaded is for JetPack 4.2 or 4.3.
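If you are unsure which release you have, you can check it on the device (these commands assume a standard JetPack/L4T install):

```shell
# Both report the installed L4T release, which maps to a JetPack version
cat /etc/nv_tegra_release
dpkg-query --show nvidia-l4t-core
```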
Regarding the code error you got, unfortunately I’m not familiar with using libtorch directly and haven’t seen that error before. Does the code you are trying to compile expect a different version of PyTorch?
Hi @l.weingart, are you sure that it is PyTorch which is throwing this error? These wheels were built on Xavier with TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2" (meaning they include CUDA kernels for Nano/TX1/TX2/Xavier). You can set the same variable when you compile torchvision too.
If you compile PyTorch yourself for Jetson, yes you should apply the patch (i.e. the 1.7 patch for PyTorch 1.7) and remember to set the environment variables too.
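For reference, a build sketch along those lines (the patch filename here is a placeholder for whichever patch file you downloaded; treat the whole sequence as a sketch rather than the exact official steps):

```shell
git clone --recursive --branch v1.7.0 https://github.com/pytorch/pytorch
cd pytorch
patch -p1 < ../pytorch-1.7-jetson.patch   # placeholder name for the 1.7 patch

# Kernels for Nano/TX1 (5.3), TX2 (6.2), and Xavier (7.2)
export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"
export PYTORCH_BUILD_VERSION=1.7.0
export PYTORCH_BUILD_NUMBER=1

pip3 install -r requirements.txt
python3 setup.py bdist_wheel   # the wheel lands in dist/
```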
Thank you for your reply.
Actually, I went ahead and built it with the patch.
I have to admit that now that the build is completed, I’m a bit at a loss as to how to install it.
Could you please help with the next step: how do I install it now that it has compiled successfully?
Sure thing - the built wheel should be under the pytorch/dist directory. Uninstall your previous PyTorch install with pip3 uninstall torch and then install this wheel instead.
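Something like this (the exact wheel filename is an example and depends on your Python version and build settings; check what actually ended up in dist/):

```shell
pip3 uninstall -y torch
# Example filename; substitute the wheel you find under pytorch/dist
pip3 install pytorch/dist/torch-1.7.0-cp36-cp36m-linux_aarch64.whl
```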
No, I’m not sure, and to be frank, after having compiled it on the Xavier, I don’t think so.
Here is my problem: I bought a Jetson Nano in December and I successfully installed tools to detect human posture in videos and everything works well on it.
Then in January I bought a Jetson Xavier and tried to install the same tool suite, but every time I try to use it, it ends in a segmentation fault.
The tool suite is mmpose, from open-mmlab.
From the issue I opened on their github, it was hinted that the problem was coming from torch, but not for certain either.
Also, when I search for RuntimeError: CUDA error: no kernel image is available for execution on the device on Google, results are often in relation to torch (even though it doesn’t mean much, I agree :-p ).
I reinstalled everything on the Xavier from scratch (reflashed the system, the JetPack, etc.), but it was impossible to make it work as it does on the Nano.
I’m at a loss for ideas.
I’m pretty sure this is not the right thread to discuss this, but I wanted to reply to your question.
It looks like mmcv compiles CUDA kernels. I’m not familiar with these projects, but try setting MMCV_CUDA_ARGS='-gencode=arch=compute_72,code=sm_72' before you install mmcv.
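For example (a sketch; mmcv-full is the package variant that includes the CUDA ops, and the exact pip invocation may differ for your mmcv version):

```shell
# sm_72 = Xavier's compute capability 7.2
export MMCV_CUDA_ARGS='-gencode=arch=compute_72,code=sm_72'
pip3 uninstall -y mmcv mmcv-full
pip3 install mmcv-full --no-cache-dir   # forces a local build of the CUDA ops
```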
I used 1.8.0 on my Xavier NX; it installed OK and runs fine now. Thank you!
But in QT I still encounter some errors, the same as mentioned before. I will check.
I rebuilt mmcv using MMCV_CUDA_ARGS='-gencode=arch=compute_72,code=sm_72' and the runtime error RuntimeError: CUDA error: no kernel image is available for execution on the device disappeared.
Thank you very much!
However, using mmpose still ends in a segmentation fault.
I’ll keep working with them to try and isolate the error.
I’m puzzled because it works just fine on the Nano… :-p
Hi, I’m trying to put together a L4T Base container to simulate a TX2 using JetPack 4.3 and PyTorch 1.1 with CUDA 10.0.
To do so, I’ve tried to start from a nvcr.io/nvidia/l4t-base:r32.3.1 image and I’ve run into several issues. Here’s my Dockerfile:
deb https://repo.download.nvidia.com/jetson/common r32 main
deb https://repo.download.nvidia.com/jetson/t186 r32 main
Trying to import PyTorch in the container results in an error:
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 79, in <module>
from torch._C import *
ImportError: libnvToolsExt.so.1: cannot open shared object file: No such file or directory
I’ve checked out nvidia:l4t-pytorch containers, but they require JetPack 4.4 or newer.
Do you have any idea how to create such a container?
Thanks !
Hi @robin.blanchard00, l4t-base already includes the CUDA/cuDNN libraries (these are mounted from the host at runtime when --runtime nvidia is used during docker run). So I would skip installing all of the CUDA packages into the container and see if that helps, and just run it with --runtime nvidia.
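That is, something along these lines (using the image tag from your Dockerfile):

```shell
# The NVIDIA runtime mounts the host's CUDA/cuDNN libraries into the container
sudo docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.3.1
```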
Yep that’s what I was doing before. Here’s the Dockerfile then:
FROM nvcr.io/nvidia/l4t-base:r32.3.1
COPY nvidia-l4t-apt-source.list /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
RUN apt-key adv --fetch-key https://repo.download.nvidia.com/jetson/jetson-ota-public.asc
# Update, upgrade and install basics
RUN apt-get update -y
RUN apt-get install -y apt-utils git curl ca-certificates bzip2 cmake tree htop bmon iotop g++ \
&& apt-get install -y libglib2.0-0 libsm6 libxext6 libxrender-dev nano wget python3-pip pkg-config ffmpeg
RUN python3 -m pip install --upgrade pip
ENV NVIDIA_VISIBLE_DEVICES=all
# Install PyTorch and TorchVision
# Taken from https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-8-0-now-available/72048
RUN wget https://nvidia.box.com/shared/static/mmu3xb3sp4o8qg9tji90kkxl1eijjfc6.whl -O torch-1.1.0-cp36-cp36m-linux_aarch64.whl \
&& apt-get -y install python3-pip libopenblas-base libopenmpi-dev \
&& python3 -m pip install Cython \
&& python3 -m pip install numpy torch-1.1.0-cp36-cp36m-linux_aarch64.whl
Running with --runtime nvidia and importing torch results in:
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 79, in <module>
from torch._C import *
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory