Cannot connect to process and Stuck in "Searching for attachable processes ..."

user48403 · May 31, 2022, 7:11am

Hi,
I am trying to use the latest Nsight Compute GUI (2022.2.0) on my macOS machine to connect to a remote Linux server. I can connect to this A100 machine directly with ssh -p xxxx username@xxx.xxx.xx.xx. The problem is that I cannot connect to the process, as shown below.

I used the Interactive Profile activity and also made some attempts in the console of the A100 machine. The binary ./a.out works well. If I simply run ncu ./a.out, everything is fine. But when I try ncu --mode=launch ./a.out, it gets stuck at ==PROF== Waiting for profiler to attach on ports 49152-49215. While it was stuck there, I checked port 49152 with lsof, and it showed this:

At other times the lsof command returns nothing; it only shows output while I am stuck waiting. I also tried other ports, as suggested, but they behaved the same. Note that I do not have sudo privileges. I don’t know whether this is a port conflict, a connection failure, or simply a result of not having administrator permissions. Or did I just miss something?

Could anyone please give me some advice? Thanks a lot.

felix_dt · May 31, 2022, 7:36am

It seems you are doing the right steps, but there may be some problems with the selected (default) ports. Can you please confirm that you tried

ncu --mode=launch ./a.out

on the remote target machine, and it would stop in “Waiting for profiler to attach”? That would be the expected behavior, as the application is launched on the target system and then suspended in the first CUDA API call, waiting for the host, in this case the local UI, to attach.

Can you please try

1. Launching the application on the remote target system using ncu --mode launch app, followed by
2. Using the local host UI’s Interactive Profile activity in “Attach” mode/tab, with your remote system selected in the connection dialog? Does this show the remote process available to attach?

Also, does the remote file selection work, i.e. while having the remote system selected in the connection dialog, click the “…” button next to “Application Executable”?

user48403 · May 31, 2022, 8:23am

Thanks. I have tried the following steps:

1. I ran ncu --mode=launch ./a.out, and it waits.
2. I then tried the “Attach” mode, but nothing appears, no matter how often I refresh. [screenshot]
3. I checked the remote file selection, and it works fine. [screenshot]
4. I pressed Ctrl+C on ncu on the remote system, and in most cases nothing is printed. But in a few tries, it exited like this (I don’t know why): [screenshot]
5. The connection preferences are as follows: [screenshot]

felix_dt · May 31, 2022, 9:18am

Can you check with a command-line utility whether a TCP connection can be established on any of these ports, e.g. with netcat or iperf3:

Target machine: nc -l 49152
Host machine: nc <ip> 49152

or

Target machine: iperf3 -s -p 49152
Host machine: iperf3 -c <ip> -p 49152

If these don’t work either, this may simply be a firewall problem between these machines. Depending on your local permissions and policies, you could either

- ask the admin to open these ports for TCP
- use an ssh config to tunnel these ports over ssh, since that port is open
- collect the report using ncu -o remotely and copy it over to your local system to open it in the UI
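
To check the whole default range instead of a single port, the netcat test can be looped from the host machine. This is only a sketch: probe_ports is a made-up helper name, and it assumes your nc supports -z (connect without sending data) and -w (timeout in seconds), which the common OpenBSD and macOS variants do. Run it while ncu --mode=launch is waiting on the target, since a port only reports as open when something is listening on it.

```shell
# Hypothetical helper: print every port in [first, last] on <host>
# that accepts a TCP connection.
# Assumes a netcat that supports -z (scan only) and -w (timeout, seconds).
probe_ports() {
  local host="$1"
  local first="${2:-49152}"   # default range taken from the ncu message
  local last="${3:-49215}"
  local port="$first"
  while [ "$port" -le "$last" ]; do
    if nc -z -w 2 "$host" "$port" >/dev/null 2>&1; then
      echo "$port"
    fi
    port=$((port + 1))
  done
}

# Usage (on the host machine):
#   probe_ports <ip>
```

If this prints nothing while the target is waiting for the profiler to attach, the ports are most likely blocked somewhere between the two machines.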

user48403 · May 31, 2022, 12:34pm

Many thanks! It seems that it is a firewall problem indeed. I would really appreciate it if you could give me another hand and teach me more about the ssh tunnel.

Suppose the server IP is IP1 and the target machine takes port PortA. So definitely I can connect to the target machine with ssh -p PortA username@IP1. I tried the command ssh -L 50152:IP1:49152 username@IP1 -p PortA to build the tunnel. But it did not work. Where is the problem with this port forwarding? Thanks!
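
For reference, a hedged sketch of a local forward that targets the machine at the far end of the SSH connection. In -L local_port:host:remote_port, the host part is resolved on the remote side, so localhost there means the target itself, whereas IP1 may point back at whatever gateway maps PortA (an assumption about this setup). The name open_tunnel and the local port 50152 are illustrative choices. Whether the Nsight Compute UI can attach through such a forward is not confirmed in this thread; the sketch only covers the plain TCP check with nc.

```shell
# Hypothetical helper: forward a local port to a port on the SSH target.
# "localhost" in the -L spec is resolved on the remote side,
# i.e. it names the target machine itself.
open_tunnel() {
  local ssh_port="$1"            # PortA
  local user_host="$2"           # username@IP1
  local local_port="${3:-50152}"
  local remote_port="${4:-49152}"
  # -N: do not run a remote command, only forward ports
  ssh -p "$ssh_port" -N \
      -L "${local_port}:localhost:${remote_port}" "$user_host"
}

# Usage (on the host machine):
#   open_tunnel PortA username@IP1
```

With the tunnel open and nc -l 49152 running on the target, nc localhost 50152 on the host should reach it.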

felix_dt · May 31, 2022, 1:21pm

As stated in the previous answer, the easiest way to work around this if you
don’t need interactive profiling would be to collect the report on the remote
machine with the command line profiler.
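
A minimal sketch of that non-interactive workflow, assuming the report is written to your home directory on the remote machine. The helper names and the report name are placeholders; ncu -o appends the .ncu-rep extension to the name it is given.

```shell
# Run on the remote target: profile the application and write <name>.ncu-rep.
collect_report() {
  local name="$1"
  shift
  ncu -o "$name" "$@"
}

# Run on the local host: copy the report over the SSH port that is already open.
# The remote path is relative to the remote home directory.
fetch_report() {
  local ssh_port="$1"
  local user_host="$2"
  local name="$3"
  scp -P "$ssh_port" "${user_host}:${name}.ncu-rep" .
}

# Usage:
#   remote$ collect_report a_out_report ./a.out
#   local$  fetch_report xxxx username@xxx.xxx.xx.xx a_out_report
```

The copied .ncu-rep file can then be opened in the local UI.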

If you need interactive profiling, it may be possible to work around your
firewall issue by using NVIDIA Nsight Compute’s support for the SSH
ProxyJump/ProxyCommand option.

When Nsight Compute detects that the remote connection uses a proxy command to
create the socket connected to the remote SSH server, it sets up a local SOCKS
proxy from the local machine to the target and transparently forwards all
connections through that tunnel.

To use this functionality, you will have to set up your local SSH configuration
to use a proxy command to connect to the target and set that proxy command to
jump through your local machine before connecting to the target.

Assuming the hostname or IP address of the target machine is <target_host>, you
could add the following lines to ~/.ssh/config:

Host <target_host>
	ProxyJump localhost

In order for this to work, you will need to start an OpenSSH server on your
local machine and authenticate to the local machine through SSH keys.

To check that everything is set up correctly, from the local machine with the
modified SSH configuration, you can try:

$ ssh <target_host>

If everything is set up correctly, you should not be prompted for authentication
to the local machine and should successfully connect to the target host.
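
The same check can be scripted so that a password prompt shows up as a failure instead of silently blocking. This is a sketch, not part of Nsight Compute: check_proxy_setup is a made-up name, and it assumes an OpenSSH client recent enough to support ssh -G (print the resolved configuration) and the BatchMode option (never prompt).

```shell
# Hypothetical helper: verify the proxy setup for an SSH host alias.
check_proxy_setup() {
  local target="$1"
  # 1. Key-based login to the local machine must work without a prompt.
  if ! ssh -o BatchMode=yes localhost true; then
    echo "localhost: key-based login failed" >&2
    return 1
  fi
  # 2. Show the proxy settings OpenSSH resolves for the alias.
  ssh -G "$target" | grep -i -E '^(proxyjump|proxycommand) '
  # 3. Log in to the target through the proxy, again without prompts.
  ssh -o BatchMode=yes "$target" true && echo "$target: ok"
}

# Usage (on the local machine):
#   check_proxy_setup <target_host>
```

If step 2 prints no proxyjump or proxycommand line, the Host entry in ~/.ssh/config is probably not matching the name you are connecting to.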

On some macOS machines, the installed version of the OpenSSH client does not
support the ProxyJump option but does support ProxyCommand. If this is
the case, you can replace the ProxyJump localhost line in the configuration
snippet above with ProxyCommand ssh -W %h:%p localhost.

Once this is set up, you should be able to do interactive profiling without
changing the connection target in the connection settings dialog.
When successful, you should see Started SSH SOCKS proxy on port: <port> in the log messages.

You may also refer to the Remote Connections documentation for further information.

user48403 · June 1, 2022, 3:56am

Thank you very much! I didn’t expect such a detailed reply.

I have tried what you suggested:

Modify the SSH configuration like this:

Host a100
    ProxyJump localhost
    HostName xxx.xxx.xx.xx
    Port xxxx
    User xxxx

With this, ssh a100 asks me for the password of my local machine.
I then added SSH keys on both my local machine and the target host, so now I can run ssh a100 without any authentication prompts.
However, nothing changes, and I am still stuck in the search loop. The log messages are exactly the same as before. Sad.

felix_dt · June 1, 2022, 9:53am

Have you adjusted the connection settings within Nsight Compute to now also connect to “a100”, rather than the original IP address? Otherwise, Nsight Compute would not take advantage of your ProxyJump configuration.

user48403 · June 1, 2022, 11:04am

Yes. Sure. I always connect to the “a100”… The difference is that now I don’t need to enter the password in the Connection Dialog.

cuic3 · June 2, 2022, 12:06am

I also had this problem when connecting to the remote server.

But when I use ssh with X forwarding:
local:~$ ssh -X linux.server
linux:~$ ncu-ui

I can launch it normally, and the attached process ID is 484131.

Could it be that the limited process range in the Nsight Compute settings causes the problems?
