[solved] Titan X Pascal with Cuda 8.0 on Ubuntu 16.04

Hello,

I tried to install the configuration given in the subject on a new machine, but it didn’t work.

I used the package installation method:

sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda

I ran the post-installation actions: setting PATH and LD_LIBRARY_PATH, and copying the samples to my home directory.

The driver verification step gives me:
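For reference, the environment part of the post-installation step usually looks like the sketch below. It assumes the default install prefix /usr/local/cuda-8.0 (an assumption, the actual prefix is not shown in this thread) and uses a hypothetical CUDA_PREFIX variable for readability.

```shell
# Assumed default install prefix for the CUDA 8.0 packages; adjust if yours differs.
CUDA_PREFIX=/usr/local/cuda-8.0

# Prepend the toolkit binaries and libraries, keeping any existing values.
export PATH="${CUDA_PREFIX}/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="${CUDA_PREFIX}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```

Putting these lines in ~/.bashrc or a similar shell startup file makes them persist across logins.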

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.39  Tue Jan 31 20:47:00 PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)

When I run the deviceQuery program, I get an error:

./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 10
-> invalid device ordinal

Result = FAIL

The nvidia-smi program fails as well:

nvidia-smi
Unable to determine the device handle for GPU 0000:00:08.0: Unknown Error

I see this in syslog:

Feb 16 10:56:27 wstest-gpu3 systemd[1]: Starting NVIDIA Persistence Daemon...
Feb 16 10:56:27 wstest-gpu3 systemd[1]: Started NVIDIA Persistence Daemon.
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: Verbose syslog connection opened
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: Now running with user ID 116 and group ID 126
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: Started (6115)
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: device 0000:00:08.0 - registered
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: Local RPC service initialized
Feb 16 10:56:27 wstest-gpu3 systemd[1]: Stopping NVIDIA Persistence Daemon...
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: Received signal 15
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: Socket closed.
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: PID file unlocked.
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: PID file closed.
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
Feb 16 10:56:27 wstest-gpu3 nvidia-persistenced: Shutdown (6115)
Feb 16 10:56:27 wstest-gpu3 systemd[1]: Stopped NVIDIA Persistence Daemon.
Feb 16 10:56:27 wstest-gpu3 systemd[1]: Starting NVIDIA Persistence Daemon...
Feb 16 10:56:27 wstest-gpu3 systemd[1]: Stopped NVIDIA Persistence Daemon.

Can you help me resolve this problem?

I did the same thing on another machine with a Tesla M40 without any problem.

Thanks

After installing CUDA, did you reboot at any point?

What is the output of:

dmesg | grep NVRM

on the failing system?

[    1.691281] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.39  Tue Jan 31 20:47:00 PST 2017 (using threaded interrupts)
[    2.404879] NVRM: RmInitAdapter failed! (0x23:0x56:458)
[    2.405044] NVRM: rm_init_adapter failed for device bearing minor number 0
[    2.412007] NVRM: RmInitAdapter failed! (0x23:0x56:458)
[    2.412233] NVRM: rm_init_adapter failed for device bearing minor number 0
[   53.492868] NVRM: RmInitAdapter failed! (0x23:0x56:458)
[   53.493045] NVRM: rm_init_adapter failed for device bearing minor number 0
[   53.497267] NVRM: RmInitAdapter failed! (0x23:0x56:458)
[   53.497446] NVRM: rm_init_adapter failed for device bearing minor number 0
[  382.126455] NVRM: RmInitAdapter failed! (0x23:0x56:458)
[  382.126701] NVRM: rm_init_adapter failed for device bearing minor number 0
[  382.133812] NVRM: RmInitAdapter failed! (0x23:0x56:458)
[  382.134081] NVRM: rm_init_adapter failed for device bearing minor number 0
[66494.568646] NVRM: RmInitAdapter failed! (0x23:0x56:458)
[66494.568876] NVRM: rm_init_adapter failed for device bearing minor number 0
[66574.108217] NVRM: RmInitAdapter failed! (0x23:0x56:458)
[66574.108448] NVRM: rm_init_adapter failed for device bearing minor number 0
[66574.112471] NVRM: RmInitAdapter failed! (0x23:0x56:458)
[66574.112675] NVRM: rm_init_adapter failed for device bearing minor number 0
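The repeated RmInitAdapter lines above indicate that the kernel module loaded but could not initialize the GPU. A minimal sketch for counting such failures in a kernel log follows; count_rm_init_failures is a hypothetical helper name, not part of any NVIDIA tool, and it only matches the exact message text seen in this thread.

```shell
# Hypothetical helper: count "RmInitAdapter failed" lines read from stdin.
# grep -c prints the number of matching lines; "|| true" keeps the exit
# status at 0 when there are no matches (grep itself would return 1).
count_rm_init_failures() {
    grep -c 'NVRM: RmInitAdapter failed' || true
}
```

Typical use would be `dmesg | count_rm_init_failures`; a non-zero count after a reboot suggests the driver still cannot bring up the device.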

Hello,

I changed one parameter in the KVM declaration




and it seems to be OK now.

nvidia-smi
Fri Feb 17 08:06:18 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 0000:00:08.0     Off |                  N/A |
|  0%   27C    P0    51W / 250W |      0MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Many thanks