Nsight Compute remote connection problem

mhkim4886 · June 26, 2019, 8:22am

Client OS: macOS Mojave 10.14.5
Remote CUDA version: 10.0

I’m trying to connect to a remote server and profile a PyTorch application, but Nsight Compute hangs, repeatedly printing “Trying to connect to process on host: xxx.xxx.xxx.xxx…”

[Settings - Target Platform]
Application Executable: /home/xxxxx/.virtualenvs/waveglow/bin/python3.5
Working Directory: /home/xxxxx/xxxxx/waveglow
Command Line Arguments: /home/xxxxx/xxxxx/waveglow/inference.py -f <(ls /home/xxxxx/xxxxx/waveglow/mel_spectrograms/*.pt) -w /home/xxxxx/xxxxx/waveglow/waveglow_256channels.pt -o /home/xxxxx/xxxxx/waveglow --is_fp16 -s 0.6
Environment: DISPLAY:=0
Automatically Connect: Yes

[Settings - Activity]
Enable NVTX support: No
Disable Profiling Start/Stop: No
Enable Profiling from Start: Yes
Clock Control: Base

Checking file deployment: libInterceptorInjectionTarget.so
Checking file deployment: libTreeLauncherPlaceholder.so
Checking file deployment: libTreeLauncherTargetInjection.so
Checking file deployment: libTreeLauncherTargetUpdatePreloadInjection.so
Checking file deployment: TreeLauncherTargetLdPreloadHelper
Checking file deployment: libcuda-injection.so
Checking file deployment: libInterceptorInjectionTarget.so
Checking file deployment: libnvperf_host.so
Checking file deployment: libnvperf_target.so
Checking file deployment: libnvperfapi64.so
Checking file deployment: libNvSwCounterApi.so
Checking file deployment: libTreeLauncherPlaceholder.so
Checking file deployment: libTreeLauncherTargetInjection.so
Checking file deployment: libTreeLauncherTargetUpdatePreloadInjection.so
Checking file deployment: nv-nsight-cu-cli
Checking file deployment: TreeLauncherSubreaper
Checking file deployment: TreeLauncherTargetLdPreloadHelper
/home/xxxxx/.virtualenvs/waveglow/bin/python3.5 /home/xxxxx/xxxxx/waveglow/inference.py -f <(ls /home/xxxxx/xxxxx/waveglow/mel_spectrograms/*.pt) -w /home/xxxxx/xxxxx/waveglow/waveglow_256channels.pt -o /home/xxxxx/xxxxx/waveglow --is_fp16 -s 0.6
Launching: /home/xxxxx/.virtualenvs/waveglow/bin/python3.5 (host: xxx.xxx.xxx.xxx)
Process launched
Trying to connect to process...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
Trying to connect to process on host: xxx.xxx.xxx.xxx...
...

felix_dt · June 26, 2019, 1:12pm

Can you please let us know the exact version of Nsight Compute you are using, e.g. via running:

nv-nsight-cu-cli --version

Also, what OS and GPU is this on?

In general, you will need to make sure that the network port(s) used by Nsight Compute for communicating with the remote target are open and accessible. Since version 2019.3, users can choose the port range used by the tool for communication with remote targets (remote launch still uses the default SSH network port). You can set those on the command line, as well as in the UI options dialog under ‘Connection’.

There is a known bug: the Interactive Profile activity still uses the old default port range of 4500-4510. It will be fixed in the next release, and it affects neither the non-interactive Profile activity nor remote attach.

You can check whether port 4500 is open, e.g. using the iperf utility, or by launching the app from the command line on the remote machine using

nv-nsight-cu-cli --port 4500 --mode launch <app>

then setting this port in the UI options and using the Interactive Profile activity to attach to the remote machine.
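
If iperf is not available, a plain TCP connection attempt from the client machine gives a similar answer. The following is only a minimal sketch: the helper names `is_port_open` and `open_ports` are made up for illustration, and the check only succeeds while something on the remote machine is actually listening on the port (for example, assuming the process launched with `--port 4500 --mode launch` is waiting for a connection). A failure can therefore mean either a blocked port or no listener.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def open_ports(host, ports, timeout=2.0):
    """Return the subset of ports that accepted a TCP connection."""
    return [p for p in ports if is_port_open(host, p, timeout)]
```

For the old default range mentioned above, one could call `open_ports("xxx.xxx.xxx.xxx", range(4500, 4511))` with the real address of the remote machine substituted for the placeholder.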

mhkim4886 · June 27, 2019, 1:32am

I managed to connect to the remote machine by using the non-interactive Profile activity. Nsight Compute then reported that it could not find the section files, so I uploaded the section files to the server.

However, the result contains only one kernel.

This is the log during profiling:
…
==PROF== Profiling “weight_norm_fwd_first_dim_ker…” - 11: 0%…50%…100%
48 passes
==PROF== Profiling “weight_norm_fwd_first_dim_ker…” - 12: 0%…50%…100%
48 passes
…
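
To see how many profiled launches the log records, and under how many distinct kernel names, the `==PROF==` lines can be tallied. This is a rough sketch that assumes the line format shown in the excerpt above; the function name `count_profiled_kernels` is hypothetical, and the pattern accepts both straight and typographic quotes because the forum may have converted them.

```python
import re
from collections import Counter

# Matches lines such as: ==PROF== Profiling "kernel_name" - 11: 0%...
# The (possibly truncated) kernel name is captured exactly as it appears in the log.
PROF_LINE = re.compile(r'^==PROF== Profiling ["“](?P<name>[^"”]*)["”] - (?P<id>\d+):')


def count_profiled_kernels(log_text):
    """Count how many profiled launches the log lists per kernel name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PROF_LINE.match(line.strip())
        if match:
            counts[match.group("name")] += 1
    return counts
```

Calling `count_profiled_kernels(open("profile.log").read())` on a saved copy of the output (the file name is a placeholder) returns a `Counter` keyed by kernel name, which separates "one kernel name, many launches" from "one launch in total".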
