nvprof incompatible CUDA drivers

Greetings,

We are having a problem with nvprof on our new K20 under CentOS 6.6 (fully updated). We installed the full cuda-6.5.14 package using the run-file installer, which installed everything from the drivers to the tool sets. As far as I can tell from everything we have tested, nearly everything is working. However, nvprof will not run. When we try, we get this error:
“Unable to profile application. Unknown nvprof return value 19”

We also get the message in the log file:
“======== Error: incompatible CUDA driver version.”

A few threads report resolving this issue by installing the RPMs instead. I tried the repo, but yum would only install the devel driver, which made all the tools complain about an incompatible driver. (It also would not let me install the RPMs into our software directory, preferring to install into /, which causes other issues for us, but that is another matter.) So I wiped everything and ran the run-file installer again. Everything works once more except nvprof, which still fails with the same error. In short, the RPMs made a mess for me, while the run file gets everything right except nvprof.

I found another thread saying that nvprof won’t work if the driver and runtime versions disagree. That thread suggested running deviceQuery. When I build and run it, it reports:
CUDA Driver Version / Runtime Version 6.5 / 6.5

That tells me the driver and runtime versions match, so a version mismatch doesn’t appear to be the issue.
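For anyone who wants to script the same check, here is a minimal sketch that pulls the two numbers out of the deviceQuery summary line and compares them. The function name versions_match is made up for illustration, and the parsing assumes the line has the layout quoted above (driver version, a slash, then the runtime version at the end of the line).

```shell
#!/bin/sh
# Hypothetical helper (not part of the CUDA toolkit): compare the driver and
# runtime versions found in a deviceQuery summary line passed as $1.
versions_match() {
    printf '%s\n' "$1" | awk '
        BEGIN { rc = 0 }
        /CUDA Driver Version/ {
            seen = 1
            driver = $(NF - 2)   # third field from the end
            runtime = $NF        # last field
            # Appending "" forces a string comparison rather than a numeric one.
            if ((driver "") == (runtime "")) {
                print "match " driver
            } else {
                print "mismatch driver=" driver " runtime=" runtime
                rc = 1
            }
        }
        END {
            if (!seen) { print "no version line found"; rc = 2 }
            exit rc
        }'
}

versions_match "CUDA Driver Version / Runtime Version 6.5 / 6.5"
```

With the line from this post it prints "match 6.5". To feed it live output, something like versions_match "$(./deviceQuery | grep 'CUDA Driver Version')" should work, assuming deviceQuery was built in the current directory.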

Is anyone else able to suggest something to try?

Thank you!

Did you have previous CUDA versions installed?

what does

which nvprof

indicate?

how about:

nvidia-smi

and:

echo $PATH

The run-file installer also said I needed to set LD_LIBRARY_PATH, so I have included that as well.

$ which nvprof
/software/NVidia/cuda-6.5.14/bin/nvprof


$ nvidia-smi 
Thu Mar  5 09:14:10 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 340.29     Driver Version: 340.29         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20c          Off  | 0000:03:00.0     Off |                    0 |
| 30%   30C    P0    56W / 225W |     11MiB /  4799MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+

$ echo $PATH
/software/NVidia/cuda-6.5.14/bin/:/software/act/sge/bin/linux-x64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/software/stattransfer/12/:/home/kc/j1crsm02/bin

$ echo $LD_LIBRARY_PATH
/software/NVIDIA/6.5.14/lib64

Thanks!
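One thing that may be worth checking in the output above: PATH points at /software/NVidia/cuda-6.5.14 while LD_LIBRARY_PATH points at /software/NVIDIA/6.5.14, and it is not clear from the post whether both directories exist. The sketch below lists any entries of a colon-separated path variable that are not existing directories. The function name missing_dirs is hypothetical, and this only checks that the directories exist, not that they hold the right libraries.

```shell
#!/bin/sh
# Hypothetical helper: print every entry of a colon-separated path list ($1)
# that is not an existing directory.
missing_dirs() {
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        if [ -n "$dir" ] && [ ! -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
    IFS=$old_ifs
}

echo "nvprof resolves to: $(command -v nvprof || echo 'not found')"
echo "LD_LIBRARY_PATH entries that do not exist:"
missing_dirs "${LD_LIBRARY_PATH:-}"
```

If nothing is printed under the last heading, every LD_LIBRARY_PATH entry exists on disk. The same function can be pointed at "$PATH".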

could you give the exact command line you are using to launch nvprof?

also, which log file are you looking in?

have you previously used any other command-line profiler or do you have any of the following environment variables set:

http://docs.nvidia.com/cuda/profiler-users-guide/index.html#command-line-profiler-control
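A quick way to look for those variables is to filter the environment in the same shell that will launch the profiler or the IDE, since a program started from a shell inherits that shell's environment. This is only a sketch: the COMPUTE_PROFILE prefix is my assumption about how the variables in the linked guide are named, and profiler_vars is a made-up function name.

```shell
#!/bin/sh
# List any legacy command-line profiler variables present in the environment.
# The COMPUTE_PROFILE prefix is an assumption based on the linked guide.
profiler_vars() {
    env | grep '^COMPUTE_PROFILE' || echo "no COMPUTE_PROFILE* variables set"
}

profiler_vars
```

If any are listed, unset them in that shell before launching and try again.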

I launch it via
$ nsight

Then I load a project and do Run->Profile As->local C/C++ project.

I get a pop up that says “Profiling Failed. Unable to profile application. Unknown nvprof return value:19”

In the console window I see (first three lines are from my project) :
"9401
computing utility matrices
XA: 376040

nvprof log: /home/stack/cuda-workspace/.metadata/.plugins/com.nvidia.viper/launch/0/nvprof.log"

That log file has:
“======== Error: incompatible CUDA driver version.”

I am not sure how to check those variables from inside of nsight.

Thanks for helping me with this!
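To separate an Nsight problem from a driver problem, it may help to run nvprof on the same binary straight from a terminal. The wrapper below is a sketch only: profile_directly is a hypothetical name and ./my_app stands in for whatever executable the project builds. If nvprof works here but fails inside Nsight, the IDE's launch settings or environment would be the more likely suspect.

```shell
#!/bin/sh
# Hypothetical wrapper: run nvprof on a command outside of Nsight.
# Returns 127 if nvprof is not on PATH, otherwise nvprof's own exit status.
profile_directly() {
    if ! command -v nvprof >/dev/null 2>&1; then
        echo "nvprof not found on PATH" >&2
        return 127
    fi
    nvprof "$@"
}

# Usage (./my_app is a placeholder for the project binary):
#   profile_directly ./my_app
```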

OK, super weird. We closed the code window and played around with the interface a bit before trying again, and now it works!

We reopened the code window and tried to reset the window as best we could, and it still works!

I don’t know what is going on. Possibly an Nsight variable got reset while we were messing with it?

Thanks for the help!