Intermittent device node creation for eGPU

Hello,

My platform is an ASUS laptop (Zephyrus M16 2023) with a mobile RTX 4090, running Ubuntu 22.04 and NVIDIA driver 550.54.14. I also have an RTX 4500 Ada Generation installed in a Razer Core X eGPU enclosure. My target application is AI/ML, not graphics.

With the 4500 connected at boot, I always see /dev/nvidia0 created for the 4090. Intermittently (on roughly 50% of boots), /dev/nvidia1 for the 4500 is not created. Consequently, nvidia-smi and nvtop report only the 4090. If nvidia-modprobe is installed, running nvidia-smi or nvtop automatically creates /dev/nvidia1, and both GPUs are then reported. Swapping the Thunderbolt cable does not change the behavior.

In the missing-4500 scenario, nvidia-bug-report.sh also creates /dev/nvidia1 during its execution.
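
The two cases can be told apart right after boot, before running any tool that might create the node, with a small check like the one below. This is only a sketch: the function name count_nvidia_nodes is my own, and it assumes the GPU nodes follow the /dev/nvidiaN naming described above.

```shell
#!/bin/sh
# Count GPU device nodes (nvidia0, nvidia1, ...) in a directory.
# The directory argument is optional and defaults to /dev.
# Control nodes such as nvidiactl or nvidia-uvm are not counted,
# because the pattern requires a digit right after "nvidia".
count_nvidia_nodes() {
    dir="${1:-/dev}"
    count=0
    for node in "$dir"/nvidia[0-9]*; do
        # If nothing matches, the pattern stays literal and -e fails.
        [ -e "$node" ] && count=$((count + 1))
    done
    echo "$count"
}
```

Calling count_nvidia_nodes with no argument inspects /dev. On this machine a result of 2 corresponds to the good boot and a result of 1 to the missing-4500 boot.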

How can I achieve reliable automatic device node creation on boot? I have attached bug reports from a boot where the 4500 was present (-good) and from one where it was missing (-bad).

Thank you
nvidia-bug-report-bad.log.gz (780.2 KB)
nvidia-bug-report-good.log.gz (808.1 KB)