Pegasus Xavier not detecting GPUs consistently

Please provide the following info (check/uncheck the boxes after creating this topic):
Software Version
DRIVE OS Linux 5.2.6
DRIVE OS Linux 5.2.6 and DriveWorks 4.0
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
[X] other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.9.1.10844
other

Host Machine Version
native Ubuntu 18.04
other

We are facing an issue with the Pegasus Hyperion 7.1 system. Sometimes Xavier A does not detect the GPU properly and roadrunner performs poorly, resulting in full radar queues and low FPS (dropping from 24 to 8).

We are trying to understand what could be happening and what can be done to check the system itself for hardware anomalies.

From the software side, when the problem happens the PCI address corresponding to the GPU is not present.
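For anyone who wants to automate that check, here is a minimal sketch that scans the standard Linux sysfs PCI tree for NVIDIA display-class devices. It assumes the discrete GPU shows up with NVIDIA's PCI vendor ID (0x10de) and a display-controller class code (0x03xxxx). The function name `find_nvidia_gpus` is hypothetical and not part of any DRIVE tooling, and the exact PCI address on a given Pegasus board is not something this thread states.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"    # PCI vendor ID assigned to NVIDIA
DISPLAY_CLASS_PREFIX = "0x03"  # PCI base class 0x03 = display controller


def find_nvidia_gpus(sysfs_root="/sys/bus/pci/devices"):
    """Return the sorted PCI addresses of NVIDIA display-class devices.

    Hypothetical helper: it only reads the 'vendor' and 'class' files
    that sysfs exposes for every PCI device.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    found = []
    for dev in root.iterdir():
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
            pci_class = (dev / "class").read_text().strip().lower()
        except OSError:
            continue  # entry is unreadable or incomplete; skip it
        if vendor == NVIDIA_VENDOR_ID and pci_class.startswith(DISPLAY_CLASS_PREFIX):
            found.append(dev.name)
    return sorted(found)


if __name__ == "__main__":
    gpus = find_nvidia_gpus()
    print(f"NVIDIA display-class PCI devices: {gpus or 'none'}")
```

Running this right after each boot and logging the result would give a simple record of how often the discrete GPU fails to enumerate.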

Open to suggestions for further troubleshooting.

Dmesg/Journal show the following during good and bad boot sequences:

Good:
[26-09-2023 16:47:33] Platform: number of GPU devices detected 2
[26-09-2023 16:47:33] Platform: currently selected GPU device discrete ID 0

Bad:
[26-09-2023 15:05:10] Platform: number of GPU devices detected 1
[26-09-2023 15:05:10] Platform: currently selected GPU device integrated ID 0
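The two log patterns above could be checked automatically after each boot. The following is a rough sketch that parses only the two message formats quoted above. The helper name `summarize_gpu_log` and the returned dictionary keys are hypothetical, and treating "discrete" as the healthy state is an assumption based on the good sample.

```python
import re

# Patterns copied from the two log messages quoted in this post.
COUNT_RE = re.compile(r"Platform: number of GPU devices detected (\d+)")
SELECTED_RE = re.compile(r"Platform: currently selected GPU device (\w+) ID (\d+)")


def summarize_gpu_log(lines):
    """Extract the detected GPU count and the selected GPU type from log lines.

    Hypothetical helper: 'dgpu_ok' is True only when the log reports that
    the discrete GPU was selected (assumed to be the healthy case).
    """
    count = None
    selected = None
    for line in lines:
        m = COUNT_RE.search(line)
        if m:
            count = int(m.group(1))
        m = SELECTED_RE.search(line)
        if m:
            selected = m.group(1)
    return {"gpu_count": count, "selected": selected, "dgpu_ok": selected == "discrete"}


if __name__ == "__main__":
    bad_boot = [
        "[26-09-2023 15:05:10] Platform: number of GPU devices detected 1",
        "[26-09-2023 15:05:10] Platform: currently selected GPU device integrated ID 0",
    ]
    print(summarize_gpu_log(bad_boot))
```

A check like this could gate the start of a test run, so that a bad boot is caught before roadrunner is launched.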

Dear @jpvans,
Does a reboot fix the issue?
What is the frequency of this issue?
Could you check if flashing fixes the issue?

@SivaRamaKrishnaNV

We have roughly a 50% chance of a reboot fixing this issue, which is taking precious time from our testing schedule. The issue happens very often; for example, we have spent an hour rebooting the system without successfully detecting the GPU.

We also flashed the system a couple of weeks ago, but we experienced the issue both before and after flashing.

Can you provide further details on the GPU/Power sequence?

Dear @jpvans,
May I know the DRIVE release version? Is it not DRIVE SW 10.0, since I see you mention roadrunner? Is the board used in a car? To quickly check whether it is a SW issue, please test with the latest DRIVE OS 5.6.0 + DW 4.0 release. If the issue persists, it could be a HW issue.
