How to profile several processes with NCU at the same time?

Suppose I need to profile the interference between three models. What should I do?

Should I launch three NCU commands at the same time, like

ncu --set full -f -o out1 python f1.py &
ncu --set full -f -o out2 python f2.py &
ncu --set full -f -o out3 python f3.py

Or use a bash script with --target-processes all, like

# myscript.bash
python f1.py &
python f2.py &
python f3.py
wait  # do not exit until the two background jobs have finished as well

then ncu --set full -o out --target-processes all bash myscript.bash

Best
Max

Are the three models running on a single GPU or across multiple GPUs?

I think Nsight Systems would be useful for this use case. You could use the GPU metrics that Nsight Systems supports.

Hi Sanjiv,
Sorry, I didn't make that clear.
I want to run these processes on a single device, an A100, and profile their interference on the SMs and on memory bandwidth. (If the L2 hit rate is available, that would be even better.)

First question: Can Nsys help me profile memory usage?
Second question: If I have to use NCU, can NCU profile these processes at the same time?

Best
Tianyu

Yes. The nsys --cuda-memory-usage option can be used to track GPU memory usage by CUDA kernels.
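As a rough sketch, a single Nsight Systems session over the wrapper script from the question could combine memory tracking with the GPU metrics mentioned earlier in the thread. The device index 0 and the report name "interference" are assumptions for illustration. The GPU metrics flag has changed spelling between nsys releases, so check `nsys profile --help` on your install. The snippet only prints the command unless DRY_RUN=0 is set (DRY_RUN is a hypothetical variable, not an nsys feature).

```shell
# Sketch: one Nsight Systems session covering all three workloads.
# Assumptions: GPU index 0, report name "interference", wrapper "myscript.bash".
nsys_cmd=(
    nsys profile
    --gpu-metrics-device=0      # sample GPU metrics on GPU 0 (newer nsys may spell this --gpu-metrics-devices)
    --cuda-memory-usage=true    # track GPU memory used by CUDA kernels
    -o interference             # hypothetical report name
    bash myscript.bash
)

if [[ "${DRY_RUN:-1}" == 1 ]]; then
    # Default: show the command that would be run.
    printf '%s\n' "${nsys_cmd[*]}"
else
    "${nsys_cmd[@]}"
fi
```

Because all three Python processes are children of the same bash script, they should appear on one timeline, which makes overlapping kernels easier to spot.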

NCU can profile CUDA kernels launched by an application and all of its child processes when the --target-processes all option is used. For this, you can use a bash script as you mentioned earlier.
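A minimal sketch of such a wrapper is below. It differs from the script in the question in two ways that are my own assumptions, not requirements of NCU: the script names are passed as arguments, and the wrapper waits for every background job so it does not exit while a model is still running. The function name run_all and the PYTHON override are hypothetical.

```shell
#!/usr/bin/env bash
# myscript.bash (sketch): launch every workload in the background, then wait
# for all of them so the wrapper does not exit while a model is still running.

run_all() {
    local pids=() status=0 script pid
    for script in "$@"; do
        # PYTHON is a hypothetical override for the interpreter; defaults to "python".
        "${PYTHON:-python}" "$script" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1   # remember if any workload failed
    done
    return "$status"
}

# Script names come from the command line, e.g. f1.py f2.py f3.py.
run_all "$@"
```

It could then be profiled with something like: ncu --set full -f -o out --target-processes all bash myscript.bash f1.py f2.py f3.py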