How can I profile several processes with NCU at the same time?

Suppose I need to profile the interference between three models. What should I do?

Should I launch three NCU commands at the same time, like this:

ncu --set full -o out1 python &
ncu --set full -o out2 python &
ncu --set full -o out3 python 

Or should I use a bash script with --target-processes all, like this:

# myscript.bash
python &
python &
# keep the script alive until the background processes finish
wait

then ncu --set full -o out --target-processes all bash myscript.bash


Are the three models running on a single GPU or across multiple GPUs?

I think Nsight Systems would be useful for this use case. You could use its GPU metrics sampling feature.
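
As a rough sketch of that suggestion (not a verified recipe): the snippet below builds a single nsys session around a wrapper script with GPU metrics sampling enabled. The names run_models.sh and interference are hypothetical, device 0 is assumed to be the GPU in question, and the spelling of the GPU metrics flag has varied between Nsight Systems releases, so check nsys profile --help for your version. By default it only prints the command; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Dry-run helper: print the command when DRY_RUN is 1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumptions: run_models.sh is a hypothetical wrapper that starts all models
# in the background and waits for them; device 0 is the GPU to sample.
run nsys profile --gpu-metrics-device=0 -o interference sh run_models.sh
```

Because all models run inside one nsys session, their kernels and the sampled GPU metrics end up on a shared timeline in a single report.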

Hi Sanjiv,
Sorry, I didn't make that clear.
I want to run these processes on a single device, an A100, and profile their interference on the SMs and on memory bandwidth. (If the L2 hit rate is available, that would be even better.)

First question: Can Nsys help me profile memory usage?
Second question: If I have to use NCU, can NCU profile these processes at the same time?


Yes. The nsys --cuda-memory-usage option can be used to track GPU memory usage by CUDA kernels.

NCU can profile CUDA kernels launched by an application and all its child processes when the --target-processes all option is used. For this, you can use a bash script as you mentioned earlier.
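
A minimal sketch of that wrapper approach, under some assumptions: model1.py, model2.py, model3.py, run_models.sh and out are hypothetical names, so substitute your own. The snippet writes the wrapper script and, by default, only prints the ncu command; set DRY_RUN=0 to run it. One caveat to keep in mind: as far as I know, NCU serializes and replays kernels while collecting metrics, so the per-kernel numbers may not reflect true concurrent execution.

```shell
#!/bin/sh
# Write a wrapper that starts the three (hypothetical) workloads concurrently.
cat > run_models.sh <<'EOF'
#!/bin/sh
python model1.py &
python model2.py &
python model3.py &
# Stay alive until every background child has exited.
wait
EOF

# Dry-run helper: print the command when DRY_RUN is 1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One ncu instance follows the wrapper and all of its child processes.
# -f overwrites an existing report; -o names the output file.
run ncu --set full --target-processes all -f -o out sh run_models.sh
```

The wait at the end of the wrapper keeps the parent process alive until all three children finish, so the profiled process tree does not disappear while kernels are still running.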