Hi,
I have a Dell R7525 server running RHEL 9.5 with a PCIe A100 GPU.
After enabling MIG on the GPU, the system won’t boot anymore.
- NVIDIA driver version: 570.86.15, CUDA version: 12.8
- BIOS firmware: 2.18.1 (latest)
How can I resolve this?
Updates:
- Can't get into rescue mode.
- As suggested in a different thread, I tried booting with each of the kernel parameters below, and I still get an oops in [nvidia_modeset]:
  - pci=realloc=off realloc=off
  - pci=realloc=on realloc=on
- Managed to turn MIG off, so I can at least boot into Linux again.
(I added the device to pci-stub, logged into Linux, unbound the device, and ran modprobe nvidia, which didn’t cause any error(!?). Then I disabled MIG and rebooted, and everything is working again.)
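For anyone hitting the same problem, the workaround above can be sketched roughly as the script below. It is a minimal sketch under stated assumptions, not a log of the exact commands that were run: the PCI address `0000:00:00.0` is a placeholder (take the real one from `lspci`), GPU index 0 is assumed because there is a single GPU, and the names `GPU_BDF`, `GPU_INDEX`, `DRY_RUN`, `run` and `recover_mig` are made up for illustration. It also assumes the GPU was already claimed by pci-stub at boot (for example via a `pci-stub.ids=<vendor>:<device>` kernel parameter), so the nvidia driver never touched it. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the MIG recovery sequence. All values below are placeholders.

GPU_BDF="${GPU_BDF:-0000:00:00.0}"   # hypothetical PCI address of the A100
GPU_INDEX="${GPU_INDEX:-0}"          # assumed nvidia-smi index (single GPU)
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands only, 0 = execute (as root)
STUB_UNBIND=/sys/bus/pci/drivers/pci-stub/unbind

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_mig() {
    # 1. Release the GPU from pci-stub so another driver can claim it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "echo $GPU_BDF > $STUB_UNBIND"
    else
        echo "$GPU_BDF" > "$STUB_UNBIND" || return 1
    fi
    # 2. Load the NVIDIA kernel module.
    run modprobe nvidia || return 1
    # 3. Disable MIG mode on the GPU.
    run nvidia-smi -i "$GPU_INDEX" -mig 0 || return 1
}

recover_mig
```

To actually execute the steps, run the script as root with `DRY_RUN=0` and the real `GPU_BDF`, then remove the GPU from pci-stub again and reboot.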
I would still like to resolve the issue and enable MIG. Any help would be appreciated.