Nsight Compute fails during remote deployment of files (arm64 Mac host, x86-64 Ubuntu target)

Hi,
I’ve previously used ncu on my Mac as the host (i.e. the machine running the UI) to profile my programs on remote machines over SSH.
Since switching from a machine hosted on GCP GCE to one on AWS EC2, both with identical GPUs and otherwise almost identical software installed, the “deploying” step of remote SSH profiling persistently fails for no apparent reason, and only on the AWS machine. I wrote a script to parse and transfer all the files listed in the CSV debug output (“more details” → “Click here to copy local and remote paths of the remaining files as CSV to the clipboard”), but even after that succeeds, the remote deployment process always errors out when deploying “libcuda-injection.so”.
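For anyone attempting the same workaround, a script along these lines might look roughly like the sketch below. It is not the script used in this thread. It assumes the copied CSV has two columns per row (local path, then remote path) with no header row, which is a guess about the format, and that the remote paths contain no spaces. The function names and the `user@host` target string are hypothetical.

```python
import csv
import io
import posixpath


def parse_deploy_csv(text):
    """Return (local_path, remote_path) pairs from the copied CSV text.

    Assumes two columns per row (local path, then remote path) and no
    header row. This layout is a guess, so adjust it to the real output.
    """
    pairs = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed rows
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def build_transfer_commands(pairs, target):
    """Build one ssh mkdir command and one scp command per file.

    `target` is an ssh destination such as "user@host" (hypothetical).
    Remote paths are assumed to contain no spaces.
    """
    commands = []
    for local_path, remote_path in pairs:
        remote_dir = posixpath.dirname(remote_path)
        commands.append(["ssh", target, "mkdir", "-p", remote_dir])
        commands.append(["scp", local_path, f"{target}:{remote_path}"])
    return commands


if __name__ == "__main__":
    # Dry run with made-up paths: print the commands instead of running them.
    sample = "/tmp/local/example.so,/home/user/deploy/example.so\n"
    for command in build_transfer_commands(parse_deploy_csv(sample), "user@host"):
        print(" ".join(command))
```

To actually transfer the files, each command list can be passed to `subprocess.run(command, check=True)`, which stops at the first failing transfer.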

What could be the issue here, and how can I debug this further?

Hi, @hugo32
The error info indicates a write permission issue on the target machine.
Please delete /home/hugo_dev/nvidia_tooling/nsight-compute on the target machine and try again.

Hi, yes, I’ve tried both deleting the directory and making sure that I have permissions (chown -R hugo_dev:hugo_dev /home/hugo_dev/nvidia_tooling/ etc.). In particular, all the other file transfers work, so I do not understand what kind of permissions error it could be.
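One way to double-check the permissions claim on the target is to walk the deployment directory and list anything the current user cannot write to. The sketch below is a generic check, not an Nsight Compute feature, and the function name is hypothetical. It only tests what `os.access` reports for the user running it, so it should be run as the same user that the SSH connection uses.

```python
import os


def find_unwritable(root):
    """Return paths under root (including root) that the current user cannot write to.

    Directories also need the execute bit so that files can be created in them.
    """
    problems = []
    if not os.access(root, os.W_OK | os.X_OK):
        problems.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.W_OK | os.X_OK):
                problems.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.W_OK):
                problems.append(path)
    return sorted(problems)


if __name__ == "__main__":
    # Path taken from this thread; adjust to the deployment prefix in use.
    for path in find_unwritable("/home/hugo_dev/nvidia_tooling"):
        print("not writable:", path)
```

An empty result suggests that plain file permissions are not the cause, which would be consistent with the other file transfers succeeding.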

We have never seen this issue internally, so I’m not sure what is happening. Do you have another Ubuntu target to try?

I am also very confused 😅. Indeed, everything works perfectly on my other, basically identical, Ubuntu machine. From what I can see, the only difference is that one machine is on EC2 and the other is on GCP. It also always fails on the libcuda-injection.so file. Is there any debug logging I can enable to get further information about what’s happening? Otherwise I will probably try recording the traffic with Wireshark.

It seems that the SSH profile in the ncu GUI isn’t fully updated/overwritten. In particular, the “prefix to deploy ncu files to” setting was not updated, and I had to reinstall ncu and restart for the new value to be respected. Also, if deployment of a file fails, redeploying doesn’t seem to work. In any case, I have now found a case where profiling reports more than 100% in the SoL analysis, so I’ve made progress :)
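If a failed deployment is suspected of leaving a stale or partial file behind (this is an assumption, it was not confirmed in this thread), comparing checksums of the local and remote copies is a quick way to check. The sketch below hashes the local file and compares it against one line of `sha256sum` output copied from the target. The function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_remote(local_path, sha256sum_output):
    """Compare a local file against one line of `sha256sum` output from the target.

    `sha256sum` prints the digest first, followed by the file name.
    """
    parts = sha256sum_output.split()
    if not parts:
        return False  # no digest in the output, treat as a mismatch
    return sha256_of_file(local_path) == parts[0].lower()
```

The remote digest can be obtained by running `sha256sum` on the deployed file over SSH. A mismatch would point to a file that should be deleted on the target before deploying again.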

Hi, @hugo32

Glad to know you are unblocked now, and thanks for sharing the solution!