nvvp is stuck in "obtaining list of devices"

Hi, when I try to use nvvp to profile a CUDA application on a remote Linux system, it always gets stuck at “Obtaining list of devices”. How can I debug this?

Hi,

Can you please provide more information, such as:

OS/Platform:
GPU:
Driver version:
CUDA toolkit version:

Make sure that you have the same CUDA toolkit version installed on your host (local) and target (remote) machines.
You can check the toolkit version by running the command below on both the host and the target machine.

$ /usr/local/cuda-10.2/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Tue_Dec__3_21:12:07_PST_2019
Cuda compilation tools, release 10.2, V10.2.108
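
To compare the two machines quickly, the release number can be pulled out of the nvcc output. This is only a minimal sketch, assuming a POSIX shell with sed available; the function name cuda_release is made up for illustration, and user@target and the /usr/local/cuda path are placeholders for your own setup.

```shell
# cuda_release: read `nvcc --version` output on stdin and print only the
# release number (for example 10.2). Hypothetical helper, not part of the
# CUDA toolkit.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage (path and host name are assumptions; adjust to your installation):
#   /usr/local/cuda/bin/nvcc --version | cuda_release
#   ssh user@target /usr/local/cuda/bin/nvcc --version | cuda_release
```

If the two commands print different numbers, the host and target toolkits do not match.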

On host machine:
OS: CentOS 8
GPU: N/A
Driver version: N/A
CUDA toolkit version: Cuda compilation tools, release 10.2, V10.2.89

On target machine:
OS: Ubuntu 18.04
GPU: GTX2080*2
Driver version: Unknown
CUDA toolkit version: Cuda compilation tools, release 10.1, V10.1.168

I only want to run nvvp on the host machine, which is actually a virtual machine on my Mac. The versions do not match, but is it required to have exactly the same version? BTW, after a long wait, nvvp reports the following problem:
Failed to execute "/bin/sh -c "export LD_LIBRARY_PATH="/usr/local/cuda/lib64":\${LD_LIBRARY_PATH};export NVPROF_TMPDIR="/tmp";"/usr/local/cuda/bin/nvprof" --query-cuda-info; export LD_LIBRARY_PATH="/usr/local/cuda/lib64":\${LD_LIBRARY_PATH};export NVPROF_TMPDIR="/tmp";"/usr/local/cuda/bin/nvprof" --version""

Hi,

Yes, you need to have the same CUDA toolkit version on both host and target. Please install 10.2 on your target, or vice versa, and let us know if that works.


Thanks,
Ramesh

I am having the same problem.
I am using CUDA 10.1.105 on both sides.

Annoyingly, this has worked previously on the exact same two machines, using the same installed software. However, we have swapped cards around since then, and may also have reinstalled the driver at that time.

Changing the login shell from /bin/csh to /bin/bash has done the trick. Remote profiling is working again for me.
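
In case it helps anyone else: the command nvvp sends to the target (see the error message earlier in the thread) uses `export`, which is Bourne-shell syntax that csh does not have, so my guess is that a csh login shell chokes on it. That is only my reading, not something confirmed. A rough sketch for checking the login shell on the target, assuming getent, awk and chsh are available there; login_shell_of is a made-up helper name.

```shell
# login_shell_of: read passwd(5)-format lines on stdin and print the login
# shell (7th colon-separated field) of the given user.
# Hypothetical helper for illustration only.
login_shell_of() {
    awk -F: -v user="$1" '$1 == user { print $7 }'
}

# Usage on the target machine:
#   getent passwd "$USER" | login_shell_of "$USER"
# If that prints /bin/csh (or /bin/tcsh), switch to bash with:
#   chsh -s /bin/bash
```

After changing the shell, log out and back in on the target before retrying the remote connection from nvvp.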