MSI RTX3090 eGPU Ubuntu 22.04.4 Issues

I have been having a difficult time getting an RTX 3090 eGPU in a Razer Core X working on a Razer Blade 15 Advanced with a 3080 Ti. I mainly want to use the 3090 to offload machine learning training. I have scoured these forums for solutions, and none of them have worked.

After adding pcie_aspm=off to the kernel command line in the grub file, the eGPU is no longer recognized by “nvidia-smi.” Before that change, it would be recognized if I plugged the eGPU in during the boot sequence. I also have “nvidia.NVreg_OpenRmEnableUnsupportedGpus=1” set.
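
For anyone checking the same setup, here is a minimal sketch that confirms whether the parameters actually reached the running kernel. It reads /proc/cmdline and reports anything missing. The helper name `missing_params` is made up for illustration, and the expected list just mirrors the two parameters mentioned above. On Ubuntu, edits to the grub file normally only take effect after running `sudo update-grub` and rebooting, so a missing entry here may simply mean that step was skipped.

```python
from pathlib import Path

# Parameters named in the post; adjust to match your own grub file.
EXPECTED = ["pcie_aspm=off", "nvidia.NVreg_OpenRmEnableUnsupportedGpus=1"]


def missing_params(cmdline, expected=EXPECTED):
    """Return the expected parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in expected if param not in present]


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line the running kernel was booted with.
        cmdline = Path("/proc/cmdline").read_text()
    except OSError as err:
        print(f"could not read /proc/cmdline: {err}")
    else:
        missing = missing_params(cmdline)
        if missing:
            print("not on the running kernel command line:", ", ".join(missing))
        else:
            print("all expected parameters are active")
```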

In the xorg.conf file, I also added the eGPU as a Device section with its BusID specified and the option “AllowExternalGpus” set to “True”.

When I run “lspci” I do see that the RTX 3090 is attached to the PCI bus as a VGA compatible controller.
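
One thing worth double-checking when copying the address from “lspci” into xorg.conf: lspci prints the bus, device, and function in hexadecimal, while the BusID string in xorg.conf expects decimal. A rough sketch of the conversion follows. The function name and the example address `3e:00.0` are hypothetical, and the sketch assumes the GPU sits in PCI domain 0.

```python
import re

# Matches lspci addresses such as "3e:00.0" or "0000:3e:00.0" (all hexadecimal).
_ADDR_RE = re.compile(
    r"(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
)


def lspci_to_xorg_busid(address):
    """Convert an lspci address to an Xorg BusID string like 'PCI:62:0:0'."""
    match = _ADDR_RE.fullmatch(address.strip())
    if match is None:
        raise ValueError(f"unrecognized lspci address: {address!r}")
    domain, bus, device, function = match.groups()
    # Assumption: only domain 0 is handled in this sketch.
    if domain is not None and int(domain, 16) != 0:
        raise ValueError("non-zero PCI domains are not handled here")
    return f"PCI:{int(bus, 16)}:{int(device, 16)}:{int(function, 16)}"


if __name__ == "__main__":
    # Hypothetical address; substitute the one lspci prints for the RTX 3090.
    print(lspci_to_xorg_busid("3e:00.0"))
```

If the BusID in xorg.conf was copied straight from lspci without converting, Xorg may be pointed at a different slot than intended.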

I am using “nvidia-driver-535-open” since it provides the open kernel modules.

The bug report I am including below is from a session when the eGPU was recognized by the system and I was doing a training run for a CV ML model. The error code is Xid 79, which points to a handful of possible causes: temperature, power, firmware, etc. While training the model, I was running “nvidia-smi” on a 1-second loop using the command “watch -n 1 nvidia-smi.” I also took a screencast of this so I could see the error and have a record of the power consumption and temperature readings.
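
To line the Xid events up against the screencast timestamps, the kernel log can be scanned for NVRM Xid lines. The following is a sketch only: the regular expression assumes the log line looks like the example in the comment, the function names are made up, and `journalctl -k` may need the user to be in a group that can read the journal.

```python
import re
import subprocess

# Assumed shape of an NVRM Xid line in the kernel log, for example:
#   NVRM: Xid (PCI:0000:3e:00): 79, pid=1234, GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+)")


def find_xids(lines):
    """Return (pci_device, xid_code, full_line) for every Xid line found."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2)), line.strip()))
    return events


def read_kernel_log():
    """Read kernel messages from the journal; return [] if that is not possible."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "--no-pager"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    return result.stdout.splitlines()


if __name__ == "__main__":
    for device, code, text in find_xids(read_kernel_log()):
        print(f"Xid {code} on {device}: {text}")
```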

If there was an instantaneous power spike, I don’t believe this method would have caught the spike that removed the GPU from the bus. The interesting thing is that even though the GPU was no longer recognized by the NVIDIA drivers (?), it was still connected to the PCI bus and recognized by “lspci.”
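
A finer-grained log than the 1-second “watch” loop can be captured with nvidia-smi's own query mode, writing each sample to disk immediately so the last readings before the GPU drops are preserved. This is a sketch under assumptions: the 100 ms interval, the output file name, and the function names are arbitrary choices, and even at this rate a very short transient may still go unrecorded.

```python
import subprocess
import sys

# Fields requested from nvidia-smi; index distinguishes the 3080 Ti from the eGPU.
QUERY = "timestamp,index,power.draw,temperature.gpu"


def _to_float(text):
    """Return a float, or None for values such as '[N/A]'."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sample(line):
    """Split one CSV line into (timestamp, gpu_index, watts, celsius)."""
    timestamp, index, power, temp = [field.strip() for field in line.split(",")]
    return timestamp, int(index), _to_float(power), _to_float(temp)


def log_samples(path, interval_ms=100):
    """Stream nvidia-smi samples to a file, flushing after every line."""
    cmd = [
        "nvidia-smi",
        f"--query-gpu={QUERY}",
        "--format=csv,noheader,nounits",
        f"--loop-ms={interval_ms}",
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(path, "a") as out:
        for line in proc.stdout:
            out.write(line)
            out.flush()  # keep the most recent sample on disk if the GPU drops


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python gpu_log.py OUTPUT.csv")
    else:
        log_samples(sys.argv[1])
```

Run it in a second terminal during training and stop it with Ctrl+C; `parse_sample` can then be used to read the file back and look at the last few rows before the Xid timestamp.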

With all that said, there are quite a few different issues floating around on here, but I think the underlying issue is that there is no straightforward way to connect an eGPU to a system running Ubuntu. I have read that eGPUs seem to be more compatible with Windows, though for the nature of my work, Linux is my OS of choice.

Can you help me find a solution? Thanks.

nvidia-bug-report.log.gz (432.8 KB)

Did you solve this problem? I have the same problem, with an RTX 4070 eGPU and the nvidia-driver-555 driver on Ubuntu 22.04. When running an ML workload on the eGPU, it suddenly disappears.