Hi, I am looking to use Nsight to connect and interactively profile a remote application. The code runs on a cluster which I can access through a jump server.
I am able to run ncu python kernel.py on the CLI on the GPU machine and profile it just fine, but I further want to profile with the visual interactive profiler.
I have Nsight Compute installed on my local MacBook and on my GPU machine, and have an SSH config that connects to the GPU machine through the jump server, with gpuserver set up as the host alias. I confirmed that ssh gpuserver logs into the GPU machine as intended.
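For anyone checking a similar setup, here is a minimal sketch of how the resolved settings behind an SSH alias could be inspected from Python. It relies on OpenSSH's ssh -G option, which prints the effective configuration for a host as one "keyword value" pair per line. The function names are illustrative only, and the assumption is that a reasonably recent OpenSSH client is on the PATH.

```python
import subprocess


def parse_ssh_config_dump(text: str) -> dict:
    """Parse `ssh -G <host>` style output into a dict.

    Each line is expected to look like 'keyword value'. Keywords can
    repeat (for example identityfile); only the first value is kept.
    """
    options = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            options.setdefault(key.lower(), value)
    return options


def resolved_ssh_options(alias: str) -> dict:
    """Run `ssh -G <alias>` and return the resolved options.

    Assumes an OpenSSH client that supports -G; raises
    CalledProcessError if ssh exits with a non-zero status.
    """
    result = subprocess.run(
        ["ssh", "-G", alias],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ssh_config_dump(result.stdout)
```

Calling resolved_ssh_options("gpuserver") and looking at the "hostname" and "proxyjump" entries should show the real target address and the jump host the alias goes through, if the config is set up as described.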
I set up a connection in Nsight Compute on my local MacBook with hostname gpuserver, and can confirm I am able to see the file system through the “Remote Launch” menu.
Then, on the GPU machine, I start the code I want to profile using: ncu --mode launch python kernel.py. Unfortunately, I still cannot see the process in the list of applications in the Nsight Compute UI on my MacBook, even after refreshing the process list many times. I have also tried lower ports (using the --port argument on the GPU server and editing the preferences in the Nsight Compute UI on my MacBook), but I run into the same issue.
Is there any step I am missing? Thank you very much.
EDIT: I am using Nsight Compute UI on my Macbook, not Nsight Systems.
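A quick way to narrow down this kind of problem is to test whether the port that ncu listens on is reachable from the local machine at all. With a jump server in between, only the SSH connection is relayed, so other ports on the GPU machine are not necessarily reachable directly. The sketch below is a plain TCP probe, not part of Nsight Compute. It assumes you know the real address of the GPU machine (the gpuserver alias only exists in the SSH config and will not resolve here) and the port ncu was started with (49152 is, as far as I know, the default base port, so treat that as an assumption and use whatever you passed to --port).

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Any failure (refused connection, timeout, name resolution error)
    is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If port_reachable(target_address, 49152) returns False from the MacBook while ncu --mode launch is waiting on the GPU machine, the UI has no direct path to the process, which would be consistent with it never showing up in the attach list.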
Are you using the Nsight Systems UI and trying to attach to a process launched by Nsight Compute (ncu)?
These are different tools. You need to use the Nsight Compute UI instead.
Thanks for the reply! I am using the Nsight Compute UI on my Macbook along with Nsight Compute CLI on my GPU server, so I don’t think that’s the issue. Sorry for the confusion! I just mistyped before.
Is version mismatch an issue? Do I have to have the same version of Nsight Compute on my macbook and on the server?
That’s interesting, thanks for the information. Out of curiosity, why is this the case? Is logging into a machine through a ProxyJump server fundamentally different from logging in directly?
Either way, I can try remote launch too. The last time I tried this, though, it took about 5-10 minutes to start up, so I stopped it. Is this expected? I think it was setting up files in the /tmp/var directory during this time.
It turns out it really was just taking a long time. I was able to get remote launch working, although it errors out at the cudaGetDevice API call. I can dig into this a bit and come back if I have more questions.