Nsight-Compute returns "No kernels were profiled" for multi-process profiling

Hi,
I have two executables that I want to run concurrently, so I put the commands into a shell script, multi.sh:

./a.out &
./b.out
wait

I used nvprof to profile them successfully:
nvprof --profile-child-processes multi.sh

However, when I use Nsight Compute:
sudo ./ncu --target-processes all multi.sh
it prints:
==WARNING== No kernels were profiled.

Could you help me identify the issue?


This scenario is expected to work, see below. Please make sure that your script starts with a shebang line such as #!/bin/bash, or that you run it through an explicit interpreter (/bin/bash, /bin/sh, …), for example: ncu --target-processes all /bin/bash multi.sh

$ cat run.sh
#!/bin/bash
./CuVectorAddDrv &
./CuVectorAddDrv
wait

$ sudo ~/Downloads/nsight_compute-linux-x86_64-2020.3.0.18_29307467/ncu --target-processes all run.sh
CUDA Vector Addition (Driver API)
CUDA Vector Addition (Driver API)
==PROF== Connected to process 18287 (CuVectorAddDrv)

Using Device 0: “GeForce RTX 2080 Ti” with Compute 7.5 capability
==PROF== Profiling “VecAdd_kernel” - 1: 0%…50%…100% - 8 passes
1 kernels took 1029227328 usec. Average kernel overhead is: 1029227300 usec
Result = PASS
==PROF== Disconnected from process 18287
==PROF== Connected to process 18288 (CuVectorAddDrv)
Using Device 0: “GeForce RTX 2080 Ti” with Compute 7.5 capability
==PROF== Profiling “VecAdd_kernel” - 2: 0%…50%…100% - 8 passes
1 kernels took 1647380864 usec. Average kernel overhead is: 1647380815 usec
Result = PASS
==PROF== Disconnected from process 18288
[18287] CuVectorAddDrv@127.0.0.1
…
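
Applied to the original multi.sh, the advice above could be sketched roughly as follows. This is only a minimal sketch, assuming bash lives at /bin/bash and that ncu is on the PATH. The helpers has_shebang and ncu_command are hypothetical names introduced here for illustration and are not part of Nsight Compute. The sketch only prints the command line it would use, either running the script directly or through /bin/bash, and does not launch the profiler itself.

```shell
#!/bin/bash
# Hypothetical helper (not part of Nsight Compute):
# succeeds if the first two bytes of the file are "#!".
has_shebang() {
    [ "$(head -c 2 "$1")" = '#!' ]
}

# Hypothetical helper: print the ncu command line for a script.
# If the script has a shebang line it is passed to ncu as-is,
# otherwise it is run through /bin/bash explicitly (assumed location).
ncu_command() {
    if has_shebang "$1"; then
        echo "ncu --target-processes all $1"
    else
        echo "ncu --target-processes all /bin/bash $1"
    fi
}
```

For the multi.sh from the question, which has no shebang line, ncu_command multi.sh would print the /bin/bash form suggested in the reply. Adding #!/bin/bash as the first line of multi.sh is the other option.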

Maybe you could check the solution here