CUDA error: no kernel image is available for execution on the device

I got another error on AGX Orin with L4T r35.1. I'm running inside the default container, l4t-pytorch:r35.1.0-pth1.11-py3. I also tried l4t-pytorch:r35.1.0-pth1.12-py3, with the same result.

What's wrong? Any tips?

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape... 
Traceback (most recent call last):
  File "test_yolo.py", line 10, in <module>
    results = model(img)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1129, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/torch/hub/ultralytics_yolov5_master/models/common.py", line 642, in forward
    y = non_max_suppression(y if self.dmb else y[0],
  File "/root/.cache/torch/hub/ultralytics_yolov5_master/utils/general.py", line 885, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
  File "/usr/local/lib/python3.8/dist-packages/torchvision-0.13.0a0+da3794e-py3.8-linux-aarch64.egg/torchvision/ops/boxes.py", line 41, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
  File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 142, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

My env:
-----------------------
python3
Python 3.8.10 (default, Jun 22 2022, 20:18:18)  [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import cv2
>>> cv2.__version__
'4.5.0'
>>> import torch
>>> print("PyTorch: "+torch.__version__)
PyTorch: 1.12.0a0+8a1a93a9.nv22.5
>>> import torchvision
>>> print("PyTorch: "+torchvision.__version__)
PyTorch: 0.13.0a0+da3794e



/usr/src/app/yolov5# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_May__4_00:02:26_PDT_2022
Cuda compilation tools, release 11.4, V11.4.239
Build cuda_11.4.r11.4/compiler.31294910_0/


pip3 list
Package                 Version                 
----------------------- ------------------------
absl-py                 1.2.0                   
appdirs                 1.4.4                   
asttokens               2.0.8                   
backcall                0.2.0                   
cachetools              5.2.0                   
certifi                 2022.6.15               
cffi                    1.15.1                  
chardet                 3.0.4                   
charset-normalizer      2.1.0                   
cmake                   3.22.3                  
cycler                  0.11.0                  
Cython                  0.29.32                 
dbus-python             1.2.16                  
decorator               5.1.1                   
distro                  1.7.0                   
executing               1.0.0                   
fonttools               4.37.1                  
google-auth             2.11.0                  
google-auth-oauthlib    0.4.6                   
graphsurgeon            0.4.6                   
grpcio                  1.48.1                  
idna                    3.3                     
importlib-metadata      4.12.0                  
ipython                 8.4.0                   
jedi                    0.18.1                  
kiwisolver              1.4.4                   
Mako                    1.2.1                   
Markdown                3.4.1                   
MarkupSafe              2.1.1                   
matplotlib              3.5.3                   
matplotlib-inline       0.1.6                   
ninja                   1.10.2.3                
numpy                   1.23.2                  
oauthlib                3.2.0                   
packaging               21.3                    
pandas                  1.4.4                   
parso                   0.8.3                   
pexpect                 4.8.0                   
pickleshare             0.7.5                   
Pillow                  9.2.0                   
pip                     20.0.2                  
platformdirs            2.5.2                   
prompt-toolkit          3.0.31                  
protobuf                3.19.4                  
psutil                  5.9.1                   
ptyprocess              0.7.0                   
pure-eval               0.2.2                   
pyasn1                  0.4.8                   
pyasn1-modules          0.2.8                   
pycparser               2.21                    
pycuda                  2022.1                  
Pygments                2.13.0                  
PyGObject               3.36.0                  
pyparsing               3.0.9                   
PySoundFile             0.9.0.post1             
python-apt              2.0.0+ubuntu0.20.4.7    
python-dateutil         2.8.2                   
pytools                 2022.1.12               
pytz                    2022.2.1                
PyYAML                  6.0                     
requests                2.28.1                  
requests-oauthlib       1.3.1                   
requests-unixsocket     0.2.0                   
rsa                     4.9                     
scikit-build            0.15.0                  
scipy                   1.9.1                   
seaborn                 0.11.2                  
setuptools              45.2.0                  
six                     1.14.0                  
stack-data              0.5.0                   
tensorboard             2.10.0                  
tensorboard-data-server 0.6.1                   
tensorboard-plugin-wit  1.8.1                   
tensorrt                8.4.1.5                 
thop                    0.1.1.post2207130030    
torch                   1.12.0a0+8a1a93a9.nv22.5
torchaudio              0.12.0+2e13884          
torchvision             0.13.0a0+da3794e        
tqdm                    4.64.0                  
traitlets               5.3.0                   
typing-extensions       4.3.0                   
uff                     0.6.9                   
urllib3                 1.26.11                 
wcwidth                 0.2.5                   
Werkzeug                2.2.2                   
wheel                   0.34.2                  
zipp                    3.8.1   

Hi @tnferreira, sorry about that, I believe this patch in the container build script should fix it: https://github.com/dusty-nv/jetson-containers/commit/cb5cb791a6e020d3f9ad09854500698f074e52d2

You can try rebuilding the container locally by setting your default docker runtime to nvidia, and then running the following:

git clone https://github.com/dusty-nv/jetson-containers
cd jetson-containers
scripts/docker_build_ml.sh pytorch

If you only want PyTorch v1.11, v1.12, etc., you can comment out the other containers here: https://github.com/dusty-nv/jetson-containers/blob/39496f8eba51ababb0cfb625fc70410163e4fe43/scripts/docker_build_ml.sh#L139
The build will then finish faster.
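
For the "default docker runtime" step, here is a minimal Python sketch of the change, assuming the Docker daemon config lives at /etc/docker/daemon.json and that the nvidia runtime entry uses the commonly documented values below. The helper name with_nvidia_default_runtime is made up for illustration. It only prints the merged config and does not touch anything on disk.

```python
import copy
import json

# Runtime entry as commonly documented for NVIDIA's container runtime on Jetson.
# Treat the exact values as an assumption and compare with your own daemon.json.
NVIDIA_RUNTIME = {"path": "nvidia-container-runtime", "runtimeArgs": []}


def with_nvidia_default_runtime(config):
    """Return a copy of a Docker daemon config with nvidia as the default runtime."""
    merged = copy.deepcopy(config)
    runtimes = merged.setdefault("runtimes", {})
    # Keep an existing nvidia entry if there is one; otherwise add the assumed one.
    runtimes.setdefault("nvidia", copy.deepcopy(NVIDIA_RUNTIME))
    merged["default-runtime"] = "nvidia"
    return merged


if __name__ == "__main__":
    # Example input: a config that already has the runtime but no default set.
    existing = {"runtimes": {"nvidia": copy.deepcopy(NVIDIA_RUNTIME)}}
    print(json.dumps(with_nvidia_default_runtime(existing), indent=4))
```

After editing the real file, the Docker daemon has to be restarted for the change to take effect.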


@dusty_nv I get the same error when trying to run YOLOv5 inside Docker. I have an Orin on 5.02, and it works outside of Docker with:

torch.__version__
'1.13.0a0+08820cb0.nv22.07'
 torchvision.__version__
'0.13.0'

But trying any of these base images:

ARG BASE_IMAGE=nvcr.io/nvidia/l4t-base:r35.1.0
ARG BASE_IMAGE=nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.13-py3
ARG BASE_IMAGE=l4t-pytorch:r35.1.0-pth1.13-py3

produces the same error:

File "/usr/local/lib/python3.8/dist-packages/torchvision-0.13.0a0+da3794e-py3.8-linux-aarch64.egg/torchvision/ops/boxes.py", line 41, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
  File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Hi @i_love_nvidia, I’ve rebuilt the l4t-pytorch:r35.1.0 containers with the torchvision fix for Orin, and put them on DockerHub here:

dustynv/l4t-pytorch:r35.1.0-pth1.11-py3
dustynv/l4t-pytorch:r35.1.0-pth1.12-py3
dustynv/l4t-pytorch:r35.1.0-pth1.13-py3

Can you try using one of those instead?
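
A quick way to check a container without loading YOLOv5 is to call the failing op directly. The sketch below is only an illustration (the function name nms_smoke_test is made up): it runs torchvision.ops.nms on two overlapping boxes on the GPU, which is the call that raises in the tracebacks above, and sets CUDA_LAUNCH_BLOCKING=1 as the error message suggests.

```python
import os

# Must be set before CUDA is initialized so kernel errors surface at the failing call.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def nms_smoke_test():
    """Run torchvision.ops.nms on the GPU and report what happened as a string."""
    try:
        import torch
        import torchvision
    except ImportError as exc:
        return f"skipped: {exc}"
    if not torch.cuda.is_available():
        return "skipped: CUDA is not available"

    # Two heavily overlapping boxes in (x1, y1, x2, y2) format, with their scores.
    boxes = torch.tensor(
        [[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]], device="cuda"
    )
    scores = torch.tensor([0.9, 0.8], device="cuda")
    try:
        keep = torchvision.ops.nms(boxes, scores, 0.5)
        torch.cuda.synchronize()  # force any pending kernel error to show up here
    except RuntimeError as exc:
        return f"failed: {exc}"
    return f"ok: kept indices {keep.tolist()}"


if __name__ == "__main__":
    print(nms_smoke_test())
```

If this prints a "failed: ... no kernel image is available" line, the torchvision build in that container most likely lacks kernels for the device.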


I tried

dustynv/l4t-pytorch:r35.1.0-pth1.13-py3

and it works! Thanks!

What did you change?

This was the commit: "fixed CUDA sm_87 for L4T R35" (dusty-nv/jetson-containers@cb5cb79 on GitHub)
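
For context, "no kernel image is available" generally means the library was built without kernels for the GPU's architecture, which here is sm_87. The Python sketch below compares the device's compute capability with the architectures PyTorch reports it was built for. The helper has_native_kernels is hypothetical and deliberately conservative: it looks for an exact sm_XY match and ignores PTX and minor-version compatibility. Also note that torch.cuda.get_arch_list() describes the PyTorch build only. The failure in this thread came from torchvision's own NMS kernel, so this check can pass while torchvision still fails.

```python
def has_native_kernels(capability, arch_list):
    """Return True if arch_list has an exact sm_XY entry for the (major, minor) capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; nothing to check.")
    else:
        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()  # architectures PyTorch was compiled for
            print("device capability:", capability)
            print("PyTorch arch list:", archs)
            print("exact match:", has_native_kernels(capability, archs))
        else:
            print("CUDA is not available.")
```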
