I am trying to use the python tools to examine slow kernels executing on a Tesla T4 on a remote system (CUDA 10.2 on both the local and remote machines). Profiling in non-interactive mode on the remote system works, but as far as I can tell I must be in interactive mode to use the tools.
However, when I start up the profiler in interactive mode, the CUDA program loads, but Nsight Compute only repeats: “Trying to connect to process on host: xxx.xxx.xxx.xxx.” The CUDA program is suspended, apparently waiting for Nsight Compute to attach to it.
I am running CentOS 7 locally and remotely.
I read an earlier post from a user running CUDA 10.0 experiencing a similar problem, which discussed closed ports starting at 4500, but the only port setting in the tool that I can see is the port for ssh, which is 22 as usual. I also read that the port problem would be fixed in a later release, and I am running CUDA 10.2.
Hopefully someone can help me get past this problem so that I can find the coding bottlenecks in my kernels.
Thanks!
Answering my own question: it turns out that the firewall was blocking the ports that Nsight Compute uses to attach to the process once it is running. I made a few changes to the firewall and can now use interactive mode.
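For anyone hitting the same symptom, here is a minimal sketch of how one might check from the local machine whether the attach ports on the remote host are reachable at all. The function name `probe_ports` and the host name in the usage comment are made up for illustration. The base port 49152 and the count of 64 are assumptions taken from the defaults I believe recent Nsight Compute versions document; please verify them against the documentation for your installed version.

```python
import socket


def probe_ports(host, base_port, count, timeout=1.0):
    """Try a TCP connect to each port in [base_port, base_port + count).

    Returns a dict mapping port -> one of:
      "open"        a process accepted the connection
      "refused"     host reachable, but nothing is listening on the port
      "no-response" connect timed out (often a firewall silently dropping)
      "other"       any other OS-level error (e.g. host unreachable)
    """
    results = {}
    for port in range(base_port, base_port + count):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect((host, port))
                results[port] = "open"
            except ConnectionRefusedError:
                results[port] = "refused"
            except socket.timeout:
                results[port] = "no-response"
            except OSError:
                results[port] = "other"
    return results


# Hypothetical usage (host name and port range are assumptions):
# for port, state in probe_ports("remote-host", 49152, 64).items():
#     print(port, state)
```

Run it while the CUDA program is suspended and waiting on the remote side. If the ports come back as "no-response" or "other" rather than "open", a firewall between the two machines is a likely suspect. A "refused" result only means nothing is listening on that port, so it does not point at the firewall by itself.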
Good to hear that you were able to solve the issue. For future reference to others, documentation on how to check and update the connection ports can be found in the Nsight Compute documentation (for the UI) and in the Nsight Compute CLI documentation (for the CLI).