Nsight Compute remote connection problem

Client OS: macOS Mojave 10.14.5
Remote CUDA version: 10.0

I’m trying to connect to a remote server and profile a PyTorch application, but Nsight Compute hangs, repeatedly printing “Trying to connect to process on host: xxx.xxx.xxx.xxx…”

[Settings - Target Platform]
Application Executable: /home/xxxxx/.virtualenvs/waveglow/bin/python3.5
Working Directory: /home/xxxxx/xxxxx/waveglow
Command Line Arguments: /home/xxxxx/xxxxx/waveglow/inference.py -f <(ls /home/xxxxx/xxxxx/waveglow/mel_spectrograms/*.pt) -w /home/xxxxx/xxxxx/waveglow/waveglow_256channels.pt -o /home/xxxxx/xxxxx/waveglow --is_fp16 -s 0.6
Environment: DISPLAY:=0
Automatically Connect: Yes

[Settings - Activity]
Enable NVTX support: No
Disable Profiling Start/Stop: No
Enable Profiling from Start: Yes
Clock Control: Base

Checking file deployment: libInterceptorInjectionTarget.so
Checking file deployment: libTreeLauncherPlaceholder.so
Checking file deployment: libTreeLauncherTargetInjection.so
Checking file deployment: libTreeLauncherTargetUpdatePreloadInjection.so
Checking file deployment: TreeLauncherTargetLdPreloadHelper
Checking file deployment: libcuda-injection.so
Checking file deployment: libInterceptorInjectionTarget.so
Checking file deployment: libnvperf_host.so
Checking file deployment: libnvperf_target.so
Checking file deployment: libnvperfapi64.so
Checking file deployment: libNvSwCounterApi.so
Checking file deployment: libTreeLauncherPlaceholder.so
Checking file deployment: libTreeLauncherTargetInjection.so
Checking file deployment: libTreeLauncherTargetUpdatePreloadInjection.so
Checking file deployment: nv-nsight-cu-cli
Checking file deployment: TreeLauncherSubreaper
Checking file deployment: TreeLauncherTargetLdPreloadHelper
/home/xxxxx/.virtualenvs/waveglow/bin/python3.5 /home/xxxxx/xxxxx/waveglow/inference.py -f <(ls /home/xxxxx/xxxxx/waveglow/mel_spectrograms/*.pt) -w /home/xxxxx/xxxxx/waveglow/waveglow_256channels.pt -o /home/xxxxx/xxxxx/waveglow --is_fp16 -s 0.6
Launching: /home/xxxxx/.virtualenvs/waveglow/bin/python3.5 (host: xxx.xxx.xxx.xxx)
Process launched
Trying to connect to process...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
...

Can you please let us know the exact version of Nsight Compute you are using, e.g. via running:

nv-nsight-cu-cli --version

Also, what OS and GPU is this on?

In general, you will need to make sure that the network port(s) used by Nsight Compute for communicating with the remote target are open and accessible. Since version 2019.3, users can choose the port range used by the tool for communication with remote targets (remote launch still uses the default SSH network port). You can set those on the command line, as well as in the UI options dialog under ‘Connection’.

There is a known bug: the Interactive Profile activity still uses the old default port range of 4500-4510. The bug will be fixed in the next release, and it affects neither the non-interactive Profile activity nor remote attach.

You can check whether port 4500 is open, e.g. with the iperf utility, or by launching the app from the command line on the remote machine with

nv-nsight-cu-cli --port 4500 --mode launch <app>

then setting this port in the UI options and using the Interactive Profile activity to attach to the remote machine.
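
If iperf is not installed, a quick reachability test can also be done from the client with a few lines of Python. This is only a minimal sketch, assuming the tool communicates over TCP and that something is already listening on the target (for example, the app started with the nv-nsight-cu-cli command above). The function names `port_is_reachable` and `scan_ports` are made up for this example and are not part of Nsight Compute.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def scan_ports(host, first=4500, last=4510, timeout=2.0):
    """Map each port in [first, last] to whether it accepted a TCP connection.

    The default range 4500-4510 is the old default mentioned in this thread.
    """
    return {
        port: port_is_reachable(host, port, timeout)
        for port in range(first, last + 1)
    }


# Example (replace the placeholder with the address of the remote target):
# for port, ok in scan_ports("xxx.xxx.xxx.xxx").items():
#     print(port, "open" if ok else "closed or filtered")
```

A port that shows as closed here may be blocked by a firewall, or there may simply be no process listening on it yet, so run the check while the app is waiting on the remote machine.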

I managed to connect to the remote machine by using the non-interactive Profile activity. Nsight Compute then reported that it could not find the section files, so I uploaded them to the server.

However, the result contains only one kernel.

This is the log from profiling:

==PROF== Profiling “weight_norm_fwd_first_dim_ker…” - 11: 0%…50%…100% - 48 passes
==PROF== Profiling “weight_norm_fwd_first_dim_ker…” - 12: 0%…50%…100% - 48 passes