CUDA on AGX Orin - no device found?

Hello,

I am trying to figure out why CUDA is not working on an AGX Orin Dev Kit. I’ve tried numerous programs from the Samples dir of the cuda-samples repository, and every CUDA-enabled program I try to run reports that no CUDA devices were found. I wonder if something isn’t configured or installed correctly after my previous debacle [0]?

Example:
cuda-samples/Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 100
→ no CUDA-capable device is detected
Result = FAIL
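
For readers unfamiliar with the return value: 100 is cudaErrorNoDevice in the CUDA runtime's cudaError_t enum. Below is a minimal sketch of the same check deviceQuery makes, done from Python via ctypes. The library paths in `candidates` are assumptions about where libcudart might live, and the helper names (`describe_cuda_error`, `query_device_count`) are hypothetical, not part of any NVIDIA tool.

```python
import ctypes

# Small subset of cudaError_t values from the CUDA runtime API.
CUDA_ERRORS = {
    0: "cudaSuccess",
    3: "cudaErrorInitializationError",
    35: "cudaErrorInsufficientDriver",
    100: "cudaErrorNoDevice",
}


def describe_cuda_error(code):
    """Map a numeric cudaError_t to its enum name (hypothetical helper)."""
    return CUDA_ERRORS.get(code, f"unrecognized cudaError_t {code}")


def query_device_count(candidates=("libcudart.so",
                                   "/usr/local/cuda/lib64/libcudart.so")):
    """Call cudaGetDeviceCount through ctypes.

    Returns (error_code, device_count), or None if none of the candidate
    library paths (assumed locations) can be loaded.
    """
    for path in candidates:
        try:
            cudart = ctypes.CDLL(path)
        except OSError:
            continue  # library not found at this path, try the next one
        count = ctypes.c_int(0)
        cudart.cudaGetDeviceCount.restype = ctypes.c_int
        cudart.cudaGetDeviceCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
        code = cudart.cudaGetDeviceCount(ctypes.byref(count))
        return code, count.value
    return None


if __name__ == "__main__":
    result = query_device_count()
    if result is None:
        print("could not load libcudart from the assumed paths")
    else:
        code, count = result
        print(f"cudaGetDeviceCount returned {code} "
              f"({describe_cuda_error(code)}), devices: {count}")
```

A result of None only means the runtime library was not found at the guessed paths; it says nothing about the GPU itself.
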

dmesg -T | grep NVRM
[Sun Jun 19 14:50:46 2022] NVRM: loading NVIDIA UNIX Kernel Module for aarch64 34.1.1 Release Build (buildbrain@mobile-u64-5414-d7000) Mon May 16 21:12:24 PDT 2022
[Sun Jun 19 14:51:00 2022] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[Sun Jun 19 14:51:00 2022] NVRM nvAssertFailedNoLog: Assertion failed: NV_FALSE @ gpu_mgr.c:295
[Sun Jun 19 14:51:00 2022] nvRmApiAlloc+0x30/0x40 [nvidia_modeset]
[Sun Jun 19 14:51:00 2022] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[Sun Jun 19 14:51:00 2022] NVRM nvAssertFailedNoLog: Assertion failed: NV_FALSE @ gpu_mgr.c:295
[Sun Jun 19 14:51:01 2022] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call result 0x56:
[Sun Jun 19 14:51:01 2022] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call result 0x56:
[Sun Jun 19 14:51:01 2022] NVRM rmapiAllocWithSecInfo: allocation failed; status: Given class-id not valid [NV_ERR_INVALID_CLASS] (0x00000022)
[Sun Jun 19 14:51:01 2022] NVRM rmapiAllocWithSecInfo: client:0xc1d00001 parent:0xcaf00001 object:0x0 class:0x402c
[Sun Jun 19 14:51:01 2022] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call result 0x56:
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:01 2022] nvRmApiAlloc+0x30/0x40 [nvidia_modeset]
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:01 2022] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call result 0xffff:
[Sun Jun 19 14:51:01 2022] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call result 0x56:
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1175
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:01 2022] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call result 0xffff:
[Sun Jun 19 14:51:02 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1175
[Sun Jun 19 14:51:05 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1175
[Sun Jun 19 14:51:05 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1175
[Sun Jun 19 14:52:14 2022] NVRM: failed to register with the ACPI subsystem!
[Sun Jun 19 14:52:14 2022] NVRM: failed to unregister from the ACPI subsystem!
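
Since many of the NVRM lines above repeat, a tally can make the log easier to read. The following is a rough sketch that strips the bracketed timestamp and counts identical messages. The function name `tally_messages` and the sample lines (copied from the log above) are for illustration only; in practice the lines would come from a saved copy of the kernel log.

```python
import re
from collections import Counter

# Matches a leading bracketed timestamp such as "[Sun Jun 19 14:51:01 2022] ".
TIMESTAMP = re.compile(r"^\[[^\]]*\]\s*")


def tally_messages(lines):
    """Count identical log messages after removing the timestamp prefix."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.strip())
        if message:
            counts[message] += 1
    return counts


# A few lines copied from the log above, used here as sample input.
SAMPLE = """\
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:01 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1190
[Sun Jun 19 14:51:02 2022] NVRM nvAssertFailed: Assertion failed: 0 @ g_mem_mgr_nvoc.h:1175
""".splitlines()

if __name__ == "__main__":
    # Print the most frequent messages first.
    for message, count in tally_messages(SAMPLE).most_common():
        print(f"{count:3d}  {message}")
```
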

which nvcc
/usr/local/cuda/bin/nvcc

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_11_23:44:05_PST_2021
Cuda compilation tools, release 11.4, V11.4.166
Build cuda_11.4.r11.4/compiler.30645359_0
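
If it helps to compare the toolkit version against what other tools report, the release can be pulled out of the nvcc output with a small parser. This is only a sketch: `parse_nvcc_version` is a hypothetical helper, and the regular expression assumes the "release X.Y, VX.Y.Z" wording shown in the output above.

```python
import re

# Assumes the "release 11.4, V11.4.166" wording seen in the nvcc output above.
RELEASE = re.compile(r"release (\d+)\.(\d+), V(\d+\.\d+\.\d+)")


def parse_nvcc_version(output):
    """Return (major, minor, full_version) or None if no release line is found."""
    match = RELEASE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


# The nvcc --version output quoted above, used as sample input.
SAMPLE_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_11_23:44:05_PST_2021
Cuda compilation tools, release 11.4, V11.4.166
Build cuda_11.4.r11.4/compiler.30645359_0
"""

if __name__ == "__main__":
    print(parse_nvcc_version(SAMPLE_OUTPUT))
```
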

I have a working GUI and everything else (as far as I can tell) is okay. Does anyone have a suggestion? Thank you!

[0] Did I kill it? - #3 by canutethegreat

Hello,

So I tried to install a newer version of CUDA from the repo at “/compute/cuda/repos/ubuntu2004/sbsa”, but it fails with:
Building initial module for 5.10.65-tegra
apt remove cuda-dkms-515
ERROR (dkms apport): kernel package linux-headers-5.10.65-tegra is not supported
Error! Bad return status for module build on kernel: 5.10.65-tegra (arm64)
Consult /var/lib/dkms/nvidia/515.48.07/build/make.log for more information.

That makes me think the version that is installed (via “https://repo.download.nvidia.com/jetson/common r34.1/main”) as a JetPack dependency must be a custom build. Going back to the JetPack version, every sample I try results in the same “no CUDA-capable device is detected” error. Maybe I’m just doing something wrong?!

Edit to add:
jetson_release

  • NVIDIA Jetson UNKNOWN
    • Jetpack UNKNOWN [L4T 34.1.1]
    • NV Power Mode: MAXN - Type: 0
    • jetson_stats.service: active
  • Libraries:
    • CUDA: NOT_INSTALLED
    • cuDNN: 8.3.2.49
    • TensorRT: 8.4.0.11
    • Visionworks: NOT_INSTALLED
    • OpenCV: 4.5.4 compiled CUDA: NO
    • VPI: ii libnvvpi2 2.0.14 arm64 NVIDIA Vision Programming Interface library
    • Vulkan: NOT_INSTALLED

Since your Jetson is not working correctly, one possible solution would be to reflash the device using SDK Manager.

sbsa is not the correct repository to use with Jetson.

You may wish to ask Jetson AGX Orin questions on the Jetson AGX Orin forum.

Okay, so rather than go through another reflash, I decided to try something else: I used /opt/nvidia/l4t-gputools/bin/nvgpuswitch.py to switch to “dGPU” and then back to “iGPU”, which fixed whatever was wrong. Now all of the CUDA samples are working correctly.