Pegasus Xavier not detecting GPUs consistently

Software Version
DRIVE OS Linux 5.2.6
DRIVE OS Linux 5.2.6 and DriveWorks 4.0
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
[ X] other DRIVE OS version

Target Operating System

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)

SDK Manager Version

Host Machine Version
native Ubuntu 18.04

We are facing an issue with the Pegasus Hyperion 7.1 system. Sometimes Xavier A does not detect the GPU properly, and roadrunner then performs poorly, resulting in full radar queues and low FPS (dropping from 24 to 8).

We are trying to understand what could be happening and what we can do to check the system itself for hardware anomalies.

From the software side, when the problem happens, the PCI address corresponding to the GPU is not present.
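
For reference, this is roughly how the missing PCI device can be checked from the running system. It is a minimal Python sketch, not an NVIDIA-provided tool: the function name `find_nvidia_gpus` is hypothetical, and it assumes the standard Linux sysfs layout under /sys/bus/pci/devices, NVIDIA's PCI vendor ID (0x10de), and a display-controller class code (0x03xxxx). To my understanding the integrated GPU is not enumerated on PCI, so only the discrete GPU would be expected in the result.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def find_nvidia_gpus(sysfs_root="/sys/bus/pci/devices"):
    """Return sorted PCI addresses of NVIDIA display-class devices.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory; on the target it is the standard sysfs path.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    found = []
    for dev in root.iterdir():
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
            pci_class = (dev / "class").read_text().strip().lower()
        except OSError:
            continue  # attributes missing or unreadable, skip this entry
        # PCI base class 0x03 means "display controller" (VGA, 3D, ...)
        if vendor == NVIDIA_VENDOR_ID and pci_class.startswith("0x03"):
            found.append(dev.name)
    return sorted(found)


if __name__ == "__main__":
    gpus = find_nvidia_gpus()
    print(f"NVIDIA GPU PCI devices: {gpus if gpus else 'none found'}")
```

An empty result on a bad boot and one address on a good boot would match what we observe.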

Open to suggestions for further troubleshooting.

Dmesg/journal shows the following during a good and a bad boot sequence, respectively:

[26-09-2023 16:47:33] Platform: number of GPU devices detected 2
[26-09-2023 16:47:33] Platform: currently selected GPU device discrete ID 0

[26-09-2023 15:05:10] Platform: number of GPU devices detected 1
[26-09-2023 15:05:10] Platform: currently selected GPU device integrated ID 0
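
To track how often the bad case occurs across reboots, the two log lines above could be parsed automatically. This is a minimal Python sketch that assumes the log format stays exactly as shown; `classify_boot` and the default of 2 expected GPUs are illustrative choices of mine, not part of DriveWorks.

```python
import re

# Matches e.g. "Platform: number of GPU devices detected 2"
GPU_COUNT_RE = re.compile(r"Platform: number of GPU devices detected (\d+)")
# Matches e.g. "Platform: currently selected GPU device discrete ID 0"
GPU_SELECTED_RE = re.compile(
    r"Platform: currently selected GPU device (\w+) ID (\d+)"
)


def classify_boot(log_text, expected_gpus=2):
    """Summarize GPU detection from one boot's log text.

    Returns a dict with the detected GPU count, the selected GPU type
    ("discrete" or "integrated" in the logs above), and an "ok" flag
    that is True only if at least expected_gpus devices were reported.
    """
    count = None
    selected = None
    for line in log_text.splitlines():
        m = GPU_COUNT_RE.search(line)
        if m:
            count = int(m.group(1))
        m = GPU_SELECTED_RE.search(line)
        if m:
            selected = m.group(1)
    ok = count is not None and count >= expected_gpus
    return {"gpu_count": count, "selected": selected, "ok": ok}
```

Running this over the log of each boot gives a simple pass/fail record that can be attached to a report.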

Dear @jpvans,
Does a reboot fix the issue?
How frequently does this issue occur?
Could you check whether flashing fixes the issue?


We have about a 50% chance of the issue being fixed after a reboot, which is taking precious time from our testing schedule. The issue happens very often; for example, we once spent an hour rebooting the system without successfully detecting the GPU.

We also flashed the system a couple of weeks ago, but we experienced the issue both before and after flashing.

Can you provide further details on the GPU/Power sequence?

Dear @jpvans,
May I know the DRIVE release version? Is it not DRIVE SW 10.0? I ask because you mention roadrunner. Is the board used in a car? To quickly check whether it is a SW issue, please test with the latest DRIVE OS 5.6.0 + DW 4.0 release. If the issue persists, it could be a HW issue.

