Problem with cuda 7 toolkit on centos 6.6

dougtucker1 · June 10, 2015, 8:13pm

We purchased a Tesla K80 and installed it in a server running CentOS 6.6.

lspci | grep -i nvidia
44:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K80] (rev a1)
45:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K80] (rev a1)

uname -m && cat /etc/*release
x86_64
CentOS release 6.6 (Final)

I installed the driver using the wizard from here:

http://www.nvidia.com/Download/Find.aspx?lang=en-us

This installed driver version 346.59:

modinfo /lib/modules/2.6.32-504.16.2.el6.x86_64/kernel/drivers/video/nvidia.ko
filename: /lib/modules/2.6.32-504.16.2.el6.x86_64/kernel/drivers/video/nvidia.ko
alias: char-major-195-*
version: 346.59
supported: external
license: NVIDIA
alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias: pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: i2c-core
vermagic: 2.6.32-504.16.2.el6.x86_64 SMP mod_unload modversions
parm: NVreg_Mobile:int
parm: NVreg_ResmanDebugLevel:int
parm: NVreg_RmLogonRC:int
parm: NVreg_ModifyDeviceFiles:int
parm: NVreg_DeviceFileUID:int
parm: NVreg_DeviceFileGID:int
parm: NVreg_DeviceFileMode:int
parm: NVreg_RemapLimit:int
parm: NVreg_UpdateMemoryTypes:int
parm: NVreg_InitializeSystemMemoryAllocations:int
parm: NVreg_UsePageAttributeTable:int
parm: NVreg_MapRegistersEarly:int
parm: NVreg_RegisterForACPIEvents:int
parm: NVreg_CheckPCIConfigSpace:int
parm: NVreg_EnablePCIeGen3:int
parm: NVreg_EnableMSI:int
parm: NVreg_MemoryPoolSize:int
parm: NVreg_RegistryDwords:charp
parm: NVreg_RmMsg:charp
parm: NVreg_AssignGpus:charp

I then installed the cuda toolkit using instructions from:

http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#axzz3ca8v4mMB

I chose the centos package release as recommended. This installed:

[root@genuse32 yum.repos.d]# rpm -qa | grep cuda
cuda-cusparse-7-0-7.0-28.x86_64
cuda-samples-7-0-7.0-28.x86_64
cuda-driver-dev-7-0-7.0-28.x86_64
cuda-npp-dev-7-0-7.0-28.x86_64
cuda-cufft-dev-7-0-7.0-28.x86_64
cuda-documentation-7-0-7.0-28.x86_64
cuda-7.0-28.x86_64
cuda-misc-headers-7-0-7.0-28.x86_64
cuda-curand-7-0-7.0-28.x86_64
cuda-cudart-7-0-7.0-28.x86_64
cuda-toolkit-7-0-7.0-28.x86_64
cuda-repo-rhel6-7-0-local-7.0-28.x86_64
cuda-cusolver-dev-7-0-7.0-28.x86_64
cuda-cublas-dev-7-0-7.0-28.x86_64
cuda-runtime-7-0-7.0-28.x86_64
cuda-license-7-0-7.0-28.x86_64
cuda-npp-7-0-7.0-28.x86_64
cuda-cufft-7-0-7.0-28.x86_64
cuda-visual-tools-7-0-7.0-28.x86_64
cuda-7-0-7.0-28.x86_64
cuda-cusparse-dev-7-0-7.0-28.x86_64
cuda-nvrtc-dev-7-0-7.0-28.x86_64
cuda-command-line-tools-7-0-7.0-28.x86_64
cuda-cusolver-7-0-7.0-28.x86_64
cuda-cublas-7-0-7.0-28.x86_64
cuda-drivers-346.46-0.x86_64
cuda-core-7-0-7.0-28.x86_64
cuda-curand-dev-7-0-7.0-28.x86_64
cuda-cudart-dev-7-0-7.0-28.x86_64
cuda-nvrtc-7-0-7.0-28.x86_64

I compiled all the sample scripts successfully as well. However, trying to run deviceQuery resulted in:

/usr/local/cuda/samples/NVIDIA_CUDA-7.0_Samples/bin/x86_64/linux/release/deviceQuery
/usr/local/cuda/samples/NVIDIA_CUDA-7.0_Samples/bin/x86_64/linux/release/deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
→ no CUDA-capable device is detected
Result = FAIL

Running nvidia-smi results in:

nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system

Searching Google, the only thing of real value I found was an Ubuntu thread from 2013 indicating that the installed kernel module was too new for the installed toolkit. It did not offer a solution. Any help would be greatly appreciated!

Robert_Crovella · June 10, 2015, 8:55pm

It was probably unwise to install the driver using a runfile installer and then switch to the package manager method for other components. It’s possible that as you pulled in those other components, they pulled in driver components:

cuda-drivers-346.46-0.x86_64

that are incompatible with the 346.59 driver you installed.

If you’re going to use a runfile installer, I’d suggest starting over, and just using the cuda toolkit runfile installer. It will install a suitable driver along with the cuda toolkit.

You may need to reload the OS first, or do a good job of purging old nvidia components.

Recap: either use only the package manager method, or use only the runfile installer method.

Mixing the two can be troublesome. This is referenced in the doc you indicated:

http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#handle-uninstallation
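
One way to spot this kind of mix is to compare the version of the nvidia kernel module with the version of the packaged driver. The following is only a sketch: `compare_driver_versions` is a made-up helper name, not part of any NVIDIA tool, and the two version strings are the ones posted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA tool): compare two driver
# version strings and report whether they agree.
compare_driver_versions() {
    module_ver="$1"   # e.g. from: modinfo -F version nvidia
    package_ver="$2"  # e.g. from: rpm -q --queryformat '%{VERSION}' cuda-drivers
    if [ -z "$module_ver" ] || [ -z "$package_ver" ]; then
        echo "unknown"
    elif [ "$module_ver" = "$package_ver" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Values taken from the modinfo and rpm output posted above.
compare_driver_versions "346.59" "346.46"
```

With the values from this thread it prints `mismatch`. On a live system the two arguments would come from the commands named in the comments, assuming the nvidia module and a `cuda-drivers` package are both actually installed.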

dougtucker1 · June 11, 2015, 4:05pm

Thanks txbob. I didn’t use the runfile installer; I installed the GPU driver from the site. I thought I had to do this so the card was addressable from the OS, and I did not realize the CUDA toolkit installed a driver of its own. This was my mistake. The bizarre thing, as I worked through it yesterday, was that deleting the nvidia.ko from the kernel /lib/* and rebooting didn’t do the trick. It was still unaddressable. I accidentally resolved it by running an upgrade on the machine, and got lucky that there just happened to be a new kernel available. After rebooting into the new kernel everything just magically worked, and it only had the .46 driver that I needed. Thanks!

Robert_Crovella · June 11, 2015, 4:40pm

The GPU driver also has a runfile installer. When you access the driver download site that you linked, the only things available there are runfile installers (for Linux). So if you installed the driver from that site, you used a runfile installer “method”.

The package manager method is accomplished without using that driver download site, and instead uses package commands such as yum or apt-get appropriate for whatever linux distro you have.

And apart from all that, the CUDA toolkit comes in both runfile installer formats and package manager methods/formats.

If you use a CUDA toolkit runfile installer, buried inside that CUDA toolkit runfile installer is a runfile installer for the driver (that happens to be bundled with the CUDA toolkit).

So I believe a clash occurred between the runfile installation of the GPU driver and the package manager method you used to install the toolkit (which also brought driver components with it).
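
To check which method(s) a machine has been touched by, something along these lines may help. It is a rough sketch under two assumptions: that the driver runfile left its uninstaller at the default location `/usr/bin/nvidia-uninstall`, and that the packaged driver is named `cuda-drivers` as in the rpm listing earlier in the thread. `classify_driver_install` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: classify how the NVIDIA driver appears to be installed.
# $1 is "yes" if a driver package (assumed name: cuda-drivers) is installed.
# $2 is "yes" if the runfile uninstaller is present.
classify_driver_install() {
    case "$1:$2" in
        yes:yes) echo "mixed" ;;
        yes:no)  echo "package" ;;
        no:yes)  echo "runfile" ;;
        *)       echo "none" ;;
    esac
}

# Package manager side: is the driver package known to rpm?
has_package=no
if rpm -q cuda-drivers >/dev/null 2>&1; then
    has_package=yes
fi

# Runfile side: assumed default location of the runfile's uninstaller.
has_runfile=no
if [ -x /usr/bin/nvidia-uninstall ]; then
    has_runfile=yes
fi

classify_driver_install "$has_package" "$has_runfile"
```

A result of `mixed` would point at the situation described here: driver pieces from both a runfile and the package manager on the same system.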
