I installed nsys in a k8s pod from the deb package. When I try to profile anything with nsys, it just says “Failed to probe the process (sync). Timeout: 75 sec” and exits immediately.
nsys version: NVIDIA Nsight Systems version 2024.4.1.61-244134315967v0
nsys status:
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 4.19.113-300.el7.x86_64: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK
I’m not able to reproduce the issue. Let’s start from scratch and verify that this works in your environment; then we can narrow the gap and find out what is going on. Can you try:
Run a plain Ubuntu pod:
kubectl run test-nsys --rm -i --tty --image ubuntu:22.04 -- bash
Thank you again. I tried your command, and everything worked fine. Then I tried several different pod configurations and finally found that pods deployed by GPU Manager hit this error.
I will try to find out how GPU Manager works.
Thank you again!
Thank you for the follow-up. I’m not familiar with GPU Manager but I will take a look to see if I can make a recommendation of how to get nsys to work in your environment.
@user151892 To follow up: unfortunately, I think that GPU Manager may not be compatible with nsys. It appears that GPU Manager works by intercepting CUDA API calls, and the profiler intercepts the same API calls, so the two are likely conflicting with one another.
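If you want to check for yourself whether something in the pod is interposing on CUDA, here is a rough sketch you could run inside the affected pod and inside the plain Ubuntu pod, then compare. It is not a confirmed diagnosis: I don't know which mechanism GPU Manager actually uses, so this only looks at the usual dynamic-linker hooks (`LD_PRELOAD`, `/etc/ld.so.preload`) and at which `libcuda` the linker cache resolves. The function name `check_interposers` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report common dynamic-linker preload hooks.
# Optional arguments (mainly for testing):
#   $1  value to treat as LD_PRELOAD (defaults to the real LD_PRELOAD)
#   $2  path of the preload file     (defaults to /etc/ld.so.preload)
check_interposers() {
    preload_env="${1-${LD_PRELOAD-}}"
    preload_file="${2-/etc/ld.so.preload}"
    found=0

    # Libraries forced into every process through the environment.
    if [ -n "$preload_env" ]; then
        echo "LD_PRELOAD is set: $preload_env"
        found=1
    fi

    # Libraries forced into every process system-wide.
    if [ -s "$preload_file" ]; then
        echo "$preload_file lists:"
        cat "$preload_file"
        found=1
    fi

    if [ "$found" -eq 0 ]; then
        echo "no preload hooks found"
    fi
    return 0
}

check_interposers

# Show which libcuda the dynamic linker would resolve, if ldconfig is available.
# An unexpected path here may point at a wrapper library rather than the driver's own.
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | grep libcuda || echo "libcuda not in linker cache"
fi
```

If the two pods print different results, that difference is a reasonable place to start looking. If they print the same thing, the interception may be happening some other way that this sketch does not cover.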