Hello, our team has been trying to get two RTX 2080 Tis running in a single rack-mounted Dell 7920, to no avail.
Each time, the installer ends with this error:
ERROR: Unable to load the ‘nvidia-drm’ kernel module.
I’ve been through the install process several times in case I missed a step. I’ll try to include everything I’ve done.
Along with the CUDA guide, I’ve followed the steps outlined here:
I am running RHEL Workstation 7.7 on the x86_64 architecture.
The Linux kernel is 3.10.0-1062.18.1.el7.x86_64
I believe I have blacklisted nouveau and set up the persistence daemon.
I am attempting to install via the .run file: NVIDIA-Linux-x86_64-440.64.run
I added pci=realloc to the kernel boot parameters through GRUB, as suggested in another post.
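In case it helps anyone retracing these steps, here is a minimal sketch for confirming the two items above on a running system: that nouveau is not loaded, and that pci=realloc actually reached the kernel command line. The function names are my own, and it assumes the standard /proc/modules and /proc/cmdline files. It checks state only and changes nothing.

```python
from pathlib import Path


def nouveau_loaded(proc_modules_text):
    """Return True if a module named 'nouveau' appears in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first field of each /proc/modules line is the module name.
        if fields and fields[0] == "nouveau":
            return True
    return False


def has_boot_param(cmdline_text, param):
    """Return True if `param` is one of the space-separated kernel parameters."""
    return param in cmdline_text.split()


if __name__ == "__main__":
    modules = Path("/proc/modules")
    cmdline = Path("/proc/cmdline")
    # Only read the files where they exist (i.e. on a Linux host).
    if modules.exists() and cmdline.exists():
        print("nouveau loaded:", nouveau_loaded(modules.read_text()))
        print("pci=realloc set:", has_boot_param(cmdline.read_text(), "pci=realloc"))
```

If nouveau still shows as loaded after a reboot, the blacklist has probably not taken effect in the initramfs, which would be worth ruling out before looking elsewhere.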
I also found this message on a Dell forum. I don’t believe we’ve had a chance to update the BIOS yet. The current machine is not having a problem moving past POST (though we do have another one that does get stuck on boot).
At this point I’m really stuck and have no idea what to try next. Any help would be greatly appreciated. Thanks!
Ok, great. Thanks for the quick response. I’ll have to check with the sysadmin team and see about that process. Will report back once we have an update.
Hi Generix! Our tech team is a bit worried about disabling Secure Boot. The option is definitely on the table, but I was wondering whether there is a workaround that would let us leave Secure Boot enabled. Thanks again.
Addendum: once you have created and enrolled the certs, you can use them directly with the .run installer. Run it with the -A option to display the module-signing options.
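As a rough illustration of what that ends up looking like, the sketch below only assembles the installer command line and prints it, without running anything. The two `--module-signing-*` options are the ones listed in the installer's advanced options. The key and certificate paths are hypothetical placeholders, so substitute wherever your enrolled key pair actually lives.

```python
import shlex


def signed_install_command(run_file, secret_key, public_key):
    """Build (but do not execute) the argument list for a signed driver install."""
    return [
        "sh",
        run_file,
        f"--module-signing-secret-key={secret_key}",
        f"--module-signing-public-key={public_key}",
    ]


# Hypothetical key locations, for illustration only.
cmd = signed_install_command(
    "NVIDIA-Linux-x86_64-440.64.run",
    "/root/mok/MOK.priv",
    "/root/mok/MOK.der",
)

# Print a shell-safe version of the command for review before running it as root.
print(" ".join(shlex.quote(arg) for arg in cmd))
```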
Hi, quick update. The admin team was able to reconfigure the machine with Secure Boot disabled. I was then able to install the base driver, along with CUDA and cuDNN, and get some sample scripts working that place data on both GPUs. Thanks for your help!