Nsight Systems Missing CUDA Info in Multi-Process Profiling

I profiled a multi-process application running in a Docker container with the following command: “nsys profile -o /home/zfan/sandbox/virgo_algo_preq/docker/profile_results/smaq_2c_1h ./LeafStandAlone.x86-64 -noForcedPatches /home/zfan/sandbox/virgo_algo_preq/data/JobInfo_108”. The resulting report contains no CUDA information such as CUDA HW (the same happens in a Singularity container). I have been struggling to profile multi-process applications with nsys for a long time, whereas single-process applications profile without any problem. Screenshots of the reports for the multi-process and the single-process runs are attached (note the CUDA HW in the single-process report), along with the report file for the multi-process run.

Can someone explain how to profile a multi-process application so that all CUDA activity is captured? In particular, I’d appreciate it if someone from the Nsight team could help me figure out whether Nsight Systems can profile multi-process applications reliably (are there any known bugs?) and what the correct way to use it is. It is frustrating to have no working profiler for my multi-process application, since nvprof doesn’t work on newer generations of GPUs.
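For readability, the command above can be split into named parts. This is only a sketch: the `--trace=cuda,nvtx,osrt` option is added here as an assumption to make the requested trace types explicit, it is not part of the original command, and it is not a confirmed fix for the missing CUDA data. The `DRY_RUN` variable is hypothetical and exists only so the script prints the command instead of running it by default.

```shell
#!/bin/sh
# Same invocation as in the post, split into named parts for readability.
OUT=/home/zfan/sandbox/virgo_algo_preq/docker/profile_results/smaq_2c_1h
APP=./LeafStandAlone.x86-64
JOB=/home/zfan/sandbox/virgo_algo_preq/data/JobInfo_108

# Assumption: --trace=cuda,nvtx,osrt is NOT in the original command. It only
# spells out which trace types are requested.
set -- nsys profile --trace=cuda,nvtx,osrt -o "$OUT" "$APP" -noForcedPatches "$JOB"

# Dry run by default: print the assembled command.
# Set DRY_RUN=0 (hypothetical switch) to actually execute it.
echo "$*"
if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
fi
```

Run as is, the script only prints the assembled command line. Running it with `DRY_RUN=0` inside the container starts the actual profiling session.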

multi-process: (screenshot of the report)

single-process: (screenshot of the report)

smaq_2c_1h.qdrep (363.2 KB)

I have the same problem. I tried to work around it by running multiple nsys profiling sessions simultaneously, but that often causes the other processes to crash.