RuntimeError: CUDA error: no kernel image is available for execution on the device when doing != operation on Jetson AGX Orin

I am using JetPack 6.2 with CUDA 12.4 on a Jetson AGX Orin developer kit. I can move tensors to the device, but I get an error when I try the != operation.

I am using Python 3.11, and I installed torch with the following command:

pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu124

If I run nvcc --version, I get the following:

eisenbnt@ubuntu:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:18:46_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

Here is some Python that reproduces the error:

>>> import torch
>>> x = torch.tensor([0, 1, 1]).to(0)
>>> print(x.device)
cuda:0
>>>
>>>
>>> x != 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
>>>
>>>
>>> x != torch.tensor(0).to(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
>>>
>>>
>>> x != torch.tensor([0, 0, 0]).to(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
>>>
>>> torch.version.cuda
'12.4'
>>>
>>> torch.cuda.is_available()
True


Hi,

The default PyTorch package is usually not built with Jetson support.

Please use CUDA 12.6 instead (the version that ships with JetPack 6.2),
and install the PyTorch package from the link below:

https://pypi.jetson-ai-lab.dev/jp6/cu126

Thanks.
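
One way to see whether an installed wheel was built for the Orin GPU is to compare the device's compute capability with the architectures the wheel was compiled for. The sketch below is only a rough check, not an official diagnostic: the helper names `arch_tag` and `has_native_kernels` are made up for illustration, and it looks only for an exact `sm_XY` match (Orin reports compute capability 8.7), ignoring any PTX entries that the driver might JIT-compile.

```python
def arch_tag(capability):
    """Turn a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def has_native_kernels(capability, arch_list):
    """Return True if the build lists kernels for exactly this architecture.

    Simplification: only an exact 'sm_XY' match counts as supported.
    """
    return arch_tag(capability) in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Capability of GPU 0, e.g. (8, 7) on an Orin module.
        capability = torch.cuda.get_device_capability(0)
        # Architectures this PyTorch build was compiled for.
        archs = torch.cuda.get_arch_list()
        print("device:", arch_tag(capability))
        print("wheel built for:", archs)
        print("native kernels present:", has_native_kernels(capability, archs))
    else:
        print("PyTorch with CUDA is not available in this interpreter")
```

If the device tag is missing from the list, operations that need a compiled kernel can fail with "no kernel image is available" even though `torch.cuda.is_available()` returns True.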

I just followed your suggestion. Now when I import torch I get the following warning message.

(base) eisenbnt@ubuntu:~$ python
Python 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/__init__.py", line 2196, in <module>
    from torch import quantization as quantization  # usort: skip
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/quantization/__init__.py", line 2, in <module>
    from .fake_quantize import *  # noqa: F403
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/quantization/fake_quantize.py", line 10, in <module>
    from torch.ao.quantization.fake_quantize import (
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/ao/quantization/__init__.py", line 12, in <module>
    from .pt2e._numeric_debugger import (  # noqa: F401
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/ao/quantization/pt2e/_numeric_debugger.py", line 9, in <module>
    from torch.export import ExportedProgram
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/export/__init__.py", line 68, in <module>
    from .decomp_utils import CustomDecompTable
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/export/decomp_utils.py", line 5, in <module>
    from torch._export.utils import (
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/_export/__init__.py", line 48, in <module>
    from .wrappers import _wrap_submodules
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/_export/wrappers.py", line 7, in <module>
    from torch._higher_order_ops.strict_mode import strict_mode
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/_higher_order_ops/__init__.py", line 1, in <module>
    from torch._higher_order_ops.cond import cond
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/_higher_order_ops/cond.py", line 9, in <module>
    import torch._subclasses.functional_tensor
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py", line 45, in <module>
    class FunctionalTensor(torch.Tensor):
  File "/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py", line 275, in FunctionalTensor
    cpu = _conversion_method_template(device=torch.device("cpu"))
/home/eisenbnt/.venv310/base/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:275: UserWarning: Failed to initialize NumPy: 
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

 (Triggered internally at /opt/pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
>>> 
>>> 
>>> x = torch.tensor([0, 1]).to(0)
>>> x != 0
tensor([False,  True], device='cuda:0')

However, I am now able to use != on tensors on the GPU.

Are there wheels for Python 3.11?
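
The NumPy warning in the transcript above suggests downgrading to 'numpy<2' when a module was compiled against NumPy 1.x. As a minimal sketch of checking for that situation before importing torch, the snippet below reads the installed NumPy version from package metadata. The helper name `needs_numpy_downgrade` is hypothetical, and it assumes the torch wheel in use was built against NumPy 1.x, as the warning indicates.

```python
from importlib import metadata


def needs_numpy_downgrade(version):
    """Return True if the NumPy version string has major version 2 or higher."""
    major = int(version.split(".")[0])
    return major >= 2


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("NumPy is not installed in this environment")
    else:
        if needs_numpy_downgrade(installed):
            # The warning text itself recommends pinning to numpy<2.
            print(f"NumPy {installed} found; consider: pip install 'numpy<2'")
        else:
            print(f"NumPy {installed} should match a NumPy 1.x build")
```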


Hi,

For PyTorch, we currently provide only the Python 3.10 package for the JetPack 6.2 environment.

Thanks.

Python 3.10 is almost four years old; there is no reason not to upload 3.11 wheels at the very least. Furthermore, in a different answer, you stated that “For recent PyTorch (2.x), it can be built without a custom patch. Orin GPU architecture is already added in the building config so you don’t do that manually.”

If that’s the case, why are the default aarch64 PyTorch wheels not working?

Hi,

You don’t need the custom “patch”.
But you will still need to build PyTorch from source if a custom Python version is required.

We provide the package built with our default JetPack environment.
That’s why the package is built for Python 3.10: the latest JetPack, 6.2, is based on Ubuntu 22.04, which ships with Python 3.10.

Thanks.
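
To confirm that the active interpreter matches the Python version the JetPack 6.2 package is built for, a small check like the following can be run inside the virtual environment before installing. This is a sketch: `cpython_tag` and `matches_wheel` are illustrative helper names, and the default of (3, 10) simply reflects the version stated in this thread.

```python
import sys


def cpython_tag(major, minor):
    """Build the CPython wheel tag, e.g. 'cp310' for Python 3.10."""
    return f"cp{major}{minor}"


def matches_wheel(version_info, wheel_python=(3, 10)):
    """Return True if the interpreter's major.minor equals the wheel's."""
    return tuple(version_info[:2]) == tuple(wheel_python)


if __name__ == "__main__":
    current = sys.version_info
    print("interpreter tag:", cpython_tag(current[0], current[1]))
    if matches_wheel(current):
        print("This interpreter matches the Python 3.10 package.")
    else:
        print("Different Python version; a source build would be needed.")
```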
