8x 3090 on an Ubuntu 18.04 server: Unable to determine the device handle for GPU 0000:02:00.0: Unknown Error

I have a server with eight 3090 cards running Ubuntu 18.04, and the device shows this error.

Last time I asked a similar question, but I couldn't find the error you mentioned in the bug report, so I'm not sure whether that solution applies here.

I now have two servers with this error. Please take another look at the bug reports. Thank you for your help.
nvidia-bug-reportA.log.gz (1.7 MB)
nvidia-bug-reportB.log.gz (4.4 MB)

In report A, the BIOS detects only 7 GPUs, all of which work fine. If 8 are really plugged in, then either one card is broken or incorrectly seated, or the mainboard/slot has issues.
In report B, one GPU reports issues in dmesg:
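
To confirm how many cards the system actually enumerates, one option is to count the NVIDIA display functions in `lspci -nn` output and compare that with the number of cards installed. The following is only a sketch: the function names are made up for illustration, and it assumes the usual `lspci -nn` line layout with the NVIDIA vendor ID `10de` and PCI class `0300` (VGA controller) or `0302` (3D controller).

```python
import re
import subprocess

# Assumed `lspci -nn` line layout:
#   02:00.0 VGA compatible controller [0300]: NVIDIA Corporation ... [10de:xxxx] (rev a1)
# 10de is the NVIDIA PCI vendor ID; 0300/0302 are the display classes,
# which excludes the HDMI audio function that each card also exposes.
_GPU_LINE = re.compile(
    r"^(\S+)\s.*\[(0300|0302)\]:.*\[10de:[0-9a-f]{4}\]", re.IGNORECASE
)


def nvidia_gpu_addresses(lspci_text):
    """Return the PCI addresses of NVIDIA GPUs listed in `lspci -nn` output."""
    addresses = []
    for line in lspci_text.splitlines():
        match = _GPU_LINE.match(line)
        if match:
            addresses.append(match.group(1))
    return addresses


def missing_gpu_count(lspci_text, expected):
    """Return how many of the expected GPUs did not show up on the PCI bus."""
    return max(0, expected - len(nvidia_gpu_addresses(lspci_text)))


def read_lspci():
    """Run `lspci -nn` on the local machine and return its text output."""
    return subprocess.check_output(["lspci", "-nn"], universal_newlines=True)
```

On the affected server, `missing_gpu_count(read_lspci(), 8)` returning a nonzero value would mean a card is not visible on the PCI bus at all, which points at seating, the slot, or the card itself rather than the driver.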
GPU 0000:3f:00.0: RmInitAdapter failed! (0x31:0x40:2476)
If this occurs right after boot, the card is likely broken; please check whether it works in another system.
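
With several cards in one box, it can help to pull every such failure out of the kernel log and group them by PCI address, so the suspect card can be located and moved to another system. This is a rough sketch rather than an official tool: the function names are hypothetical, and the pattern is based only on the message format quoted above.

```python
import re
import subprocess

# Pattern modeled on the message quoted in this thread:
#   GPU 0000:3f:00.0: RmInitAdapter failed! (0x31:0x40:2476)
_RM_INIT_FAILED = re.compile(
    r"GPU ([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]): "
    r"RmInitAdapter failed! \(([^)]*)\)"
)


def failed_adapters(dmesg_text):
    """Map each PCI address to the list of RmInitAdapter failure codes seen."""
    failures = {}
    for line in dmesg_text.splitlines():
        match = _RM_INIT_FAILED.search(line)
        if match:
            address, code = match.groups()
            failures.setdefault(address, []).append(code)
    return failures


def read_dmesg():
    """Return the kernel log text; on many systems this needs root."""
    return subprocess.check_output(["dmesg"], universal_newlines=True)
```

For example, `failed_adapters(read_dmesg())` would list the affected addresses on the local machine. A single address that fails repeatedly, starting right after boot, matches the "likely broken" case described above.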
