Nsight Cannot Find Remote Application

Hi, I am looking to use Nsight to connect to and interactively profile a remote application. The code runs on a cluster that I access through a jump server.

I can run ncu python kernel.py from the command line on the GPU machine and profile it just fine, but I also want to profile it with the visual, interactive profiler.

I have Nsight Compute installed on both my local MacBook and the GPU machine. My SSH config defines a host alias, gpuserver, that connects directly to the GPU machine through the jump server, and I confirmed that ssh gpuserver logs into the GPU machine as intended.
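To see exactly how an alias like gpuserver is resolved, including whether a ProxyJump (or ProxyCommand) hop is involved, OpenSSH's ssh -G prints the effective configuration for a host without opening a connection. A minimal Python sketch that runs it and pulls out the fields relevant here follows. The choice of fields and the helper names are my own, and it assumes an OpenSSH client that supports -G.

```python
import shutil
import subprocess


def parse_ssh_g(output: str) -> dict:
    """Parse `ssh -G` output ("keyword value" per line) into a dict.

    Keywords that appear more than once (e.g. identityfile) keep their first value.
    """
    config = {}
    for line in output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key and key not in config:
            config[key] = value
    return config


def summarize_target(alias: str, cfg: dict) -> dict:
    """Keep only the fields that show where the alias really points and how it gets there."""
    return {
        "alias": alias,
        "hostname": cfg.get("hostname"),
        "user": cfg.get("user"),
        "port": cfg.get("port"),
        "proxyjump": cfg.get("proxyjump"),        # None when no ProxyJump applies
        "proxycommand": cfg.get("proxycommand"),  # None when no ProxyCommand applies
    }


def effective_ssh_target(alias: str) -> dict:
    """Ask the local OpenSSH client how it would connect to `alias` (no connection is made)."""
    if shutil.which("ssh") is None:
        raise RuntimeError("ssh client not found on PATH")
    result = subprocess.run(
        ["ssh", "-G", alias], capture_output=True, text=True, check=True
    )
    return summarize_target(alias, parse_ssh_g(result.stdout))
```

For example, print(effective_ssh_target("gpuserver")) on the Mac: a non-None proxyjump value confirms that the connection goes through the jump host, and hostname shows the address the jump host forwards to, which is generally not something the Mac can reach on its own.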

I set up a connection in Nsight Compute on my MacBook with hostname gpuserver, and I can see the remote file system through the “Remote Launch” menu.

Then, on the GPU machine, I start the code I want to profile using:
ncu --mode launch python kernel.py
Unfortunately, I still cannot see the process in the list of applications in the Nsight UI on my MacBook, even after refreshing the process list many times. I have also tried lower ports (using the --port argument on the GPU server and changing the preferences in the Nsight UI on my MacBook), but I run into the same issue.
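One way to check whether the UI on the Mac could even reach the ports that ncu --mode launch listens on is a plain TCP connect test run from the Mac. This is a sketch under assumptions, not a documented Nsight diagnostic: it assumes the default base port of --port is 49152 and that up to 64 consecutive ports (the --max-connections default) may be used, and it assumes attach needs a direct TCP connection from the UI to the target. Note that gpuserver is an alias from ~/.ssh/config, so a plain socket cannot resolve it; you would pass the GPU machine's real address instead.

```python
from __future__ import annotations

import socket

# Assumed ncu defaults: base port 49152, up to 64 consecutive ports.
NCU_BASE_PORT = 49152
NCU_PORT_COUNT = 64


def port_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unresolvable, ...
        return False


def reachable_ncu_ports(
    host: str,
    base: int = NCU_BASE_PORT,
    count: int = NCU_PORT_COUNT,
    timeout: float = 0.5,
) -> list[int]:
    """List the ports in [base, base + count) that accept a TCP connection from this machine."""
    return [p for p in range(base, base + count) if port_reachable(host, p, timeout)]
```

If reachable_ncu_ports returns an empty list for the GPU machine's address while ncu --mode launch is waiting, the UI most likely cannot see the process either; ssh working through the jump host says nothing about whether these ports are reachable directly.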

Is there any step I am missing? Thank you very much.

EDIT: I am using Nsight Compute UI on my Macbook, not Nsight Systems.

Hi, @doktay1

Are you using the Nsight Systems UI and trying to attach to the process launched by Nsight Compute (NCU)?
These are different tools. You need to use Nsight Compute UI instead.

Thanks for the reply! I am using the Nsight Compute UI on my MacBook along with the Nsight Compute CLI on my GPU server, so I don’t think that’s the issue. Sorry for the confusion, I just mistyped before.

Is a version mismatch an issue? Do I need the same version of Nsight Compute on my MacBook and on the server?

Hi, @doktay1

How many machines are involved in this scenario? Is it 3 or 2?
One Mac to launch the NCU UI, one jump server, and one target machine with the GPU?

Yes that’s correct.

Thanks. Then attach is not supported in this scenario.
You can use remote launch instead.

That’s interesting, thanks for the information. Out of curiosity, why is this the case? Is logging into a machine through a ProxyJump server fundamentally different from logging in directly?

Either way, I can try remote launch too. The last time I tried it, though, it took about 5-10 minutes to start up, so I stopped it. Is this expected? I think it was setting up files in the /tmp/var directory during that time.

It turns out it really was just slow to start. I got remote launch working, although it errors out at the cudaGetDevice API call. I will dig into this a bit and come back if I have more questions.
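The thread does not say which error cudaGetDevice returned. A minimal sketch for narrowing it down, run on the GPU machine: call cudaGetDevice directly through ctypes, outside the profiler, and capture a few environment variables, so the result from an interactive ssh session can be compared with the environment remote launch uses. The library names and the list of environment variables are assumptions of mine, and whether the remote launch environment actually differs is only one possibility worth ruling out.

```python
import ctypes
import ctypes.util
import os

# Hypothetical fallback sonames; the real one depends on the installed CUDA version.
CUDART_CANDIDATES = ["libcudart.so", "libcudart.so.12", "libcudart.so.11.0"]

# Variables that often differ between login shells and non-interactive sessions.
ENV_KEYS = ["CUDA_VISIBLE_DEVICES", "LD_LIBRARY_PATH", "PATH", "CUDA_HOME"]


def load_cudart():
    """Try to load the CUDA runtime library; return None if no candidate name loads."""
    names = []
    found = ctypes.util.find_library("cudart")
    if found:
        names.append(found)
    names.extend(CUDART_CANDIDATES)
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def probe_cuda_get_device():
    """Call cudaGetDevice() once and return (error_code, device, message).

    error_code is None when libcudart could not be loaded at all;
    device is None unless the call returned cudaSuccess (0).
    """
    lib = load_cudart()
    if lib is None:
        return None, None, "libcudart could not be loaded"
    lib.cudaGetDevice.argtypes = [ctypes.POINTER(ctypes.c_int)]
    lib.cudaGetDevice.restype = ctypes.c_int
    lib.cudaGetErrorString.argtypes = [ctypes.c_int]
    lib.cudaGetErrorString.restype = ctypes.c_char_p
    device = ctypes.c_int(-1)
    err = lib.cudaGetDevice(ctypes.byref(device))
    message = lib.cudaGetErrorString(err).decode()
    return err, (device.value if err == 0 else None), message


def cuda_env_snapshot() -> dict:
    """Return the selected environment variables (None when unset)."""
    return {key: os.environ.get(key) for key in ENV_KEYS}
```

For example, printing probe_cuda_get_device() and cuda_env_snapshot() both from an ssh gpuserver shell and from a small script started via remote launch would show whether the failure is tied to the environment or to the profiler itself.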

Thanks for your help!