GPU Device Count Error on Orin When Running Sample Hello World

Software Version
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)

Target Operating System

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)

SDK Manager Version
other -

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers

Note: this is similar to the issue “An error occurs about GPU device count when I try to run the sample”, but that topic was never resolved.

Hi, we generated the cross-compiled sample binaries by following the “Samples Cross-Compilation From Source” section of DriveWorks SDK Reference: Getting Started Using the NVIDIA SDK Manager.
We moved the resulting bin files over to the Orin device and were initially able to run sample_hello_world.
However, we are now getting the following error when running sample_hello_world:

nvidia-cam@tegra-ubuntu:/usr/local/driveworks/samples/bin$ ./sample_hello_world


Welcome to Driveworks SDK

[13-07-2023 22:44:11] Platform: Detected Drive Orin P3710

[13-07-2023 22:44:11] TimeSource: monotonic epoch time offset is 1689285677602097

[13-07-2023 22:44:11] TimeSourceVibranteLinux: detect valid PTP interface mgbe2_0

[13-07-2023 22:44:11] TimeSource: Could not detect valid PTP time source at nvpps. Fallback to mgbe2_0

[13-07-2023 22:44:11] PTP Time is available from Eth Driver

[13-07-2023 22:44:11] Adding variable DW_Base:DW_Version

[13-07-2023 22:44:11] Added variable DW_Base:DW_Version

NvRmGpuLibOpen failed, error=14

[13-07-2023 22:44:11] Driveworks exception thrown: Platform: cannot retrieve GPU device count.. Error cudaErrorUnknown: unknown error

Cannot init SDK

Rebooting did not resolve the issue; the only fix we found was to re-flash the Orin device. The sample worked after re-flashing, but the error later came back. Do you know what could be causing this? We are also trying to get sample_camera running and are hitting errors there as well. Thank you.
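
Since the failure comes and goes, it may help to check the sample's output automatically each time it is run. The following is only a rough sketch: it searches captured output for the two failure lines shown in the log above and extracts the numeric code from the NvRmGpuLibOpen line. The function name and the embedded sample text are illustrative, and the script does not attempt to interpret what the error code means.

```python
import re

# Signature strings copied from the log in this post.
NVRM_PATTERN = re.compile(r"NvRmGpuLibOpen failed, error=(\d+)")
DW_MESSAGE = "cannot retrieve GPU device count"

# Two lines taken from the log above, used here only as demo input.
SAMPLE_LOG = (
    "NvRmGpuLibOpen failed, error=14\n"
    "[13-07-2023 22:44:11] Driveworks exception thrown: Platform: "
    "cannot retrieve GPU device count.. Error cudaErrorUnknown: unknown error\n"
)


def find_gpu_init_failure(log_text):
    """Return a dict describing the failure, or None if neither signature is present."""
    nvrm = NVRM_PATTERN.search(log_text)
    dw_seen = DW_MESSAGE in log_text
    if nvrm is None and not dw_seen:
        return None
    return {
        "nvrm_error": int(nvrm.group(1)) if nvrm else None,
        "driveworks_exception": dw_seen,
    }


print(find_gpu_init_failure(SAMPLE_LOG))
```

In practice the text passed to find_gpu_init_failure would be the combined stdout and stderr of ./sample_hello_world saved to a file, rather than the embedded sample.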

Can you provide more information about the steps you have taken between the time when sample_hello_world was working and when it stopped working? This will help us better understand the issue and assist you with troubleshooting.

Dear @npeura,
Just checking: did you run sudo reboot now on the target at any point after flashing?
Could you also share the contents of /sys/devices/gpu.0/railgate_enable and the dmesg log output?
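
One possible way to gather both items in a single step on the target is sketched below. It assumes Python 3.7 or newer is available on the target and that the current user is allowed to read the sysfs node and run dmesg (on some systems dmesg needs sudo). The helper names are made up for this example; only the sysfs path and the dmesg command come from the request above.

```python
import subprocess
from pathlib import Path

# Path quoted in the request above.
RAILGATE_PATH = Path("/sys/devices/gpu.0/railgate_enable")


def read_sysfs_value(path):
    """Return the stripped contents of a sysfs file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def capture_dmesg():
    """Return the kernel log as text, or None if dmesg is missing or exits with an error."""
    try:
        proc = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    except OSError:
        return None
    return proc.stdout if proc.returncode == 0 else None


def collect_report(railgate_path=RAILGATE_PATH):
    """Gather both requested items into one dict for pasting into a reply."""
    return {
        "railgate_enable": read_sysfs_value(railgate_path),
        "dmesg": capture_dmesg(),
    }


if __name__ == "__main__":
    report = collect_report()
    print("railgate_enable:", report["railgate_enable"])
    if report["dmesg"] is None:
        print("dmesg: unavailable (try running with sudo)")
    else:
        print(report["dmesg"])
```

A value of None for either field means the item could not be read, which is itself worth mentioning when replying.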

I re-flashed and the flash failed (the log made it look like a permission issue; I can send the log if you’d like), but the contents of the device were not wiped, and sample_hello_world is now working.
Could we keep this issue open for a few days so that I can post an update if the problem comes up again?

Does this issue still need support? Are there any results you can share? Thanks.

Hi, I have not seen this issue come up again after attempting to re-flash the machine. Thanks!

