Failed to start profiling on k8s pod with "Failed to probe the process" message

I installed nsys on a k8s pod from the deb package. When I try to run nsys to profile anything, it just says “Failed to probe the process (sync). Timeout: 75 sec” and exits immediately.

nsys version: NVIDIA Nsight Systems version 2024.4.1.61-244134315967v0

nsys status:
Timestamp counter supported: Yes

CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 4.19.113-300.el7.x86_64: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK

Greetings,

This might be an issue with CPU sampling. Can you try running without it and see if it works?

Run:

nsys profile --sample=none python ...

Thanks for your reply. I tried this but got the same error message.

$ nsys profile --sample=none python -c "print(1)"
Failed to probe the process (sync). Timeout: 75 sec

I’m not able to reproduce the issue. Let’s start from scratch and verify that this works in your environment, and then we can try to close the gap and find out what is going on. Can you try:

  • Run a plain Ubuntu pod:
kubectl run test-nsys --rm -i --tty --image ubuntu:22.04 -- bash
  • Then, install nsys from deb:
apt update
apt install -y wget libglib2.0-0
wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2024_4/NsightSystems-linux-cli-public-2024.4.1.61-3431596.deb
dpkg -i NsightSystems-linux-cli-public-2024.4.1.61-3431596.deb

And finally, try a simple profile:

nsys profile sleep 5

Thank you again. I tried your commands, and everything worked fine. I then tried several different pod configurations and finally found that pods deployed by GPU Manager hit this error.

I will try to find out how GPU Manager works.
Thank you again!

Thank you for the follow-up. I’m not familiar with GPU Manager but I will take a look to see if I can make a recommendation of how to get nsys to work in your environment.

@user151892 To follow up: unfortunately, I think that GPU Manager may not be compatible with nsys. It appears that GPU Manager works by intercepting CUDA API calls, and the profiler intercepts the same API, so the two are likely conflicting with one another.
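
If you want to look for this kind of interception from inside the affected pod, here is a minimal sketch. It assumes the interception happens through the dynamic linker (a preloaded library or a substituted libcuda), which I have not confirmed for GPU Manager. The function name list_preloads is made up for illustration; only the standard LD_PRELOAD variable, /etc/ld.so.preload, and ldconfig are used.

```shell
#!/bin/sh
# Hypothetical helper: list libraries that would be force-loaded into processes.
#   $1 = value of LD_PRELOAD to inspect (may be empty)
#   $2 = preload file to inspect (defaults to /etc/ld.so.preload)
list_preloads() {
    env_value="$1"
    preload_file="${2:-/etc/ld.so.preload}"
    # LD_PRELOAD entries may be separated by spaces or colons.
    if [ -n "$env_value" ]; then
        printf '%s\n' "$env_value" | tr ' :' '\n\n' | sed '/^$/d; s/^/env: /'
    fi
    # System-wide preload file, one entry per non-blank line.
    if [ -r "$preload_file" ]; then
        sed '/^[[:space:]]*$/d; s/^/file: /' "$preload_file"
    fi
}

# Inspect the current pod.
list_preloads "${LD_PRELOAD:-}"

# Show which libcuda entries the dynamic linker knows about (varies by image).
ldconfig -p 2>/dev/null | grep -i 'libcuda' || echo "no libcuda in ldconfig cache"
```

Running this in both the plain Ubuntu pod and a GPU Manager pod and comparing the output may show an extra library, or a libcuda in an unexpected location. An empty result does not rule out interception by other means.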
