I am trying to get my K40c running on a Red Hat Enterprise Linux system.
However, I’m having trouble getting nvidia-smi to recognize the GPU; I get the “No devices were found” error when I type “nvidia-smi -a”.
I installed the CUDA 7.0 toolkit, then upgraded the driver to 346.59, and then rebooted the system.
Here is some information:
ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Jun 22 14:22 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jun 22 14:22 /dev/nvidiactl
I installed the toolkit and the driver by downloading and running cuda_7.0.28_linux.run and NVIDIA-Linux-x86_64-346.59.run (in that order).
sudo nvidia-smi -a gives me the same error message, “No devices were found.”
dmesg | grep NVRM
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 346.59 Tue Mar 31 14:10:31 PDT 2015
NVRM: failed to copy vbios to system memory.
NVRM: RmInitAdapter failed! (0x30:0xffff:800)
NVRM: rm_init_adapter failed for device bearing minor number 0
NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
NVRM: failed to copy vbios to system memory.
NVRM: RmInitAdapter failed! (0x30:0xffff:800)
NVRM: rm_init_adapter failed for device bearing minor number 0
NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
NVRM: failed to copy vbios to system memory.
NVRM: RmInitAdapter failed! (0x30:0xffff:800)
NVRM: rm_init_adapter failed for device bearing minor number 0
NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
NVRM: failed to copy vbios to system memory.
NVRM: RmInitAdapter failed! (0x30:0xffff:800)
NVRM: rm_init_adapter failed for device bearing minor number 0
NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
NVRM: failed to copy vbios to system memory.
NVRM: RmInitAdapter failed! (0x30:0xffff:800)
NVRM: rm_init_adapter failed for device bearing minor number 0
NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
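Since the same few messages repeat in the log above, it can help to collapse them into distinct messages with counts. This is only a minimal sketch: the function name `summarize_nvrm` is made up for illustration, and it just filters text read from standard input.

```shell
# summarize_nvrm: hypothetical helper (not part of any NVIDIA tool).
# Reads kernel log text on stdin, keeps only NVRM lines, strips anything
# before the "NVRM:" tag (for example dmesg timestamps), and prints each
# distinct message once with its count, most frequent first.
summarize_nvrm() {
    grep 'NVRM:' | sed 's/^.*NVRM: *//' | sort | uniq -c | sort -rn
}
```

It would typically be fed from the kernel log, for example `dmesg | summarize_nvrm`.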
I’m fairly certain nouveau’s not an issue, as I uninstalled it using yum (remove xorg-x11-drv-nouveau.x86_64). Also, if it were loaded, it would probably show up in lsmod.
(Also, I ran dracut --force.)
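The lsmod check mentioned above can be made explicit with a small filter. A minimal sketch, assuming standard lsmod output with the module name in the first column; the helper name `module_loaded` is hypothetical.

```shell
# module_loaded: hypothetical helper for illustration.
# Reads lsmod-style output on stdin and exits 0 only if the module named
# in $1 appears in the first column (an exact match, not a substring).
module_loaded() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}
```

For example, `lsmod | module_loaded nouveau && echo "nouveau is loaded"`. Any blacklist entries can be listed separately with `grep -rs nouveau /etc/modprobe.d/`.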
I’ll try what’s on the link, but I don’t see how the BIOS could be an issue, as it was working fine just a few weeks ago (I haven’t touched it since the last time I used it).
The only thing I can think of is that Red Hat installed something automatically (maybe a system update?) that’s causing this issue.
I’ll update this post after I’ve tried your suggestion.
You stated previously that you had removed nouveau in its entirety, yet the output from lspci above shows a nouveau kernel module. That appears to be a contradiction?
I was under the impression that I had removed the nouveau driver. This is the output from yum:
sudo yum remove xorg-x11-drv-nouveau.x86_64
Loaded plugins: refresh-packagekit, rhnplugin, security
This system is receiving updates from RHN Classic or RHN Satellite.
Setting up Remove Process
No Match for argument: xorg-x11-drv-nouveau.x86_64
Package(s) xorg-x11-drv-nouveau.x86_64 available, but not installed.
No Packages marked for removal
So the line “Kernel modules: nvidia, nouveau, nvidiafb” means that the nouveau driver is currently loaded?
txbob’s suggestion was to remove the nouveau driver. Your comment seems to imply that the kernel module is identical to the driver. Is it? Should I remove the kernel module as well?
The information that this was all working a few weeks ago is new information that I was not aware of at the beginning of the thread.
Red Hat can install kernel updates that will break a driver installed via the runfile installation method.
This is usually rectified by re-running the driver installer. Now I’m not sure if you recently installed the driver, or simply did that a few weeks ago and are now discovering that it’s not working.
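One way to check whether a kernel update has left the driver behind is to compare the running kernel release with the kernel the nvidia module was built for. A minimal sketch, assuming `modinfo` is available and that the first word of the module's vermagic string is the kernel release; the helper name `built_for_kernel` is hypothetical.

```shell
# built_for_kernel: hypothetical helper for illustration.
# $1 = kernel release (as printed by `uname -r`)
# $2 = a module's vermagic string (as printed by `modinfo -F vermagic`)
# Succeeds only when the first word of the vermagic equals the release.
built_for_kernel() {
    release=$1
    vermagic=$2
    [ "${vermagic%% *}" = "$release" ]
}
```

A possible use: `built_for_kernel "$(uname -r)" "$(modinfo -F vermagic nvidia)" || echo "nvidia module does not match the running kernel"`. If modinfo cannot find the module for the running kernel, the vermagic is empty and the check also fails, which would point to the same remedy of re-running the driver installer.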
Also, is this message something that might be causing this problem? (from dmesg)
NVRM: RmInitAdapter failed! (0x30:0xffff:800)
NVRM: rm_init_adapter failed for device bearing minor number 0
NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
NVRM: failed to copy vbios to system memory.
I’m pretty much out of ideas. If, by chance, the previous driver install method was not by runfile but instead by the repo method, then that could be an issue.
Somebody having installed CUDA through yum is a possibility, as I’m not the only one with sudo access to the system. (I always use the runfile method.)
The output of sudo yum list nvidia-* is:
Loaded plugins: refresh-packagekit, rhnplugin, security
This system is receiving updates from RHN Classic or RHN Satellite.
Available Packages
nvidia-kmod.x86_64 1:346.46-2.el6 cuda
nvidia-modprobe.x86_64 319.37-1.el6 cuda
nvidia-settings.x86_64 319.37-30.el6 cuda
nvidia-uvm-kmod.x86_64 1:346.46-3.el6 cuda
nvidia-xconfig.x86_64 319.37-27.el6 cuda
Yes, it’s a problem. It did not escape my attention earlier, but for me it merely confirms what we already know: the driver is not running correctly.
The yum list nvidia-* output doesn’t indicate any nvidia modules installed, so it does not appear to me that there is any issue with a previous yum/repo installation.
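The reasoning here rests on the listing containing only an “Available Packages” section. A minimal sketch of that check, assuming the usual yum list layout of section headers followed by one package per line (long names that yum wraps across lines are not handled); the helper name `installed_from_yum_list` is hypothetical.

```shell
# installed_from_yum_list: hypothetical helper for illustration.
# Reads `yum list` output on stdin and prints the names of packages that
# appear under an "Installed Packages" header. Prints nothing when the
# listing contains only an "Available Packages" section.
installed_from_yum_list() {
    awk '/^Installed Packages/ { on = 1; next }
         /^Available Packages/ { on = 0 }
         on && NF >= 3 { print $1 }'
}
```

For example, `yum list 'nvidia-*' | installed_from_yum_list` should print nothing for the output posted above.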
I would ordinarily assume that if you did a driver install via runfile, the driver install completed successfully. There would usually be a message to that effect. If it did not complete successfully, there may be useful information in the driver installer log file. That would usually be deposited in:
/var/log/nvidia-installer.log
That file is difficult to parse, but if, for example, there were messages in there about “unable to locate kernel headers”, then that would be indicative of a problem that could have been triggered by a Red Hat update (although it should also have given a clear error message when you ran the driver installer).
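Because the log is long, a quick filter for suspicious lines may be a reasonable first pass. A minimal sketch: the helper name `installer_problems` is hypothetical, and the keyword list is a guess at what failure lines might contain, not a list of strings the installer is known to emit.

```shell
# installer_problems: hypothetical helper for illustration.
# Prints, with line numbers, every line of the given log that mentions an
# error-like keyword (case-insensitive). The keywords are assumptions.
# $1 = log path; defaults to the location mentioned above.
installer_problems() {
    grep -n -i -E 'error|unable to|failed' "${1:-/var/log/nvidia-installer.log}"
}
```

Running `installer_problems` with no argument reads the default path; reading it may require root.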
For completeness, I don’t believe the reference to nouveau like this:
Kernel driver in use: nvidia
Kernel modules: nvidia, nouveau, nvidiafb
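In `lspci -k` output, “Kernel driver in use” names the driver currently bound to the device, while “Kernel modules” lists modules capable of handling it, whether or not they are loaded. A minimal sketch for pulling out just the bound driver; the helper name `driver_in_use` is hypothetical.

```shell
# driver_in_use: hypothetical helper for illustration.
# Reads `lspci -k` style text on stdin and prints the value of every
# "Kernel driver in use:" line, ignoring the "Kernel modules:" lines.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}
```

For example, `lspci -k -d 10de: | driver_in_use` limits the listing to NVIDIA devices (PCI vendor ID 10de); for the output quoted above it would print `nvidia`.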
Hi, I realize this thread is three years old now, but I have the exact same problem. For what it is worth, my system was running just fine when it suddenly crashed, and since then it has been giving me the same problems (the RmInitAdapter failure, and the GPU not being detected by nvidia-smi).
Did anyone find a solution?
In my opinion, it’s not a kernel issue or a driver issue.
I dual-boot Linux and Windows. I was hit by a BSOD with a GPU-related error (I don’t remember the exact error), and since then Windows hasn’t recognized my GPU, although I can still see it in Device Manager. I tried everything from installing new drivers to the factory drivers, and I also did a fresh install of Windows. Still no luck. Then I switched to Linux, and I still can’t switch to my NVIDIA GPU. I tried everything from the LTS kernel to the ZEN kernel, and of course the latest kernel too. Still no luck; I get the same results that lemonherb got from the commands he mentioned.