Suppose I need to profile the interference between three models. What should I do?
Should I use three NCU commands and launch them at the same time, like
ncu --set full -f -o out1 python f1.py &
ncu --set full -f -o out2 python f2.py &
ncu --set full -f -o out3 python f3.py
Or should I use a bash script with --target-processes all, like
# myscript.bash
python f1.py &
python f2.py &
python f3.py
then ncu --set full -o out --target-processes all bash myscript.bash
Best
Max
Are the three models running on a single GPU or across multiple GPUs?
I think Nsight Systems would be useful for this use case. You could use the GPU metrics sampling that Nsight Systems supports.
Hi Sanjiv,
Sorry I didn’t make it clear.
I aim to run these processes on a single device, an A100, and to profile their interference on the SMs and on memory bandwidth. (If the L2 hit rate is available too, even better.)
First question: Can Nsys help me profile memory usage?
Second question: If I have to use NCU, can NCU profile these processes at the same time?
Best
Tianyu
Yes. The nsys profile --cuda-memory-usage=true option can be used to track GPU memory usage by CUDA kernels.
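As a rough sketch of what that could look like, assuming the launcher script from earlier in the thread (myscript.bash) is in the current directory. The report name nsys_out is just a placeholder, and the command is only printed if nsys is not on the PATH:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Profile the launcher; nsys follows the child processes it starts.
# nsys_out is a placeholder report name, not something from the thread.
cmd=(nsys profile --cuda-memory-usage=true -o nsys_out bash myscript.bash)

# GPU metrics sampling (suggested above) is enabled with a separate flag.
# Its spelling has varied across releases (--gpu-metrics-device vs.
# --gpu-metrics-devices), so check `nsys profile --help` before adding it.

if command -v nsys >/dev/null 2>&1; then
    "${cmd[@]}"
else
    echo "nsys not found; would run: ${cmd[*]}"
fi
```

Since all three processes are captured in one report, their kernels end up on a shared timeline, which should make overlaps easier to spot.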
NCU can profile CUDA kernels launched by an application and all its child processes when the --target-processes all option is used. For this you can use a bash script, as you mentioned earlier.
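A minimal sketch of the script approach, using the placeholder file names from the question (f1.py, f2.py, f3.py). The trailing wait and the -f (force overwrite) flag are additions here, not part of the original script: wait is assumed to be wanted so the parent shell stays alive until all three children exit. The ncu command is only printed if ncu is not on the PATH:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the launcher script. All three models start in the background,
# then the script blocks until every one of them has finished.
cat > myscript.bash <<'EOF'
#!/usr/bin/env bash
python f1.py &
python f2.py &
python f3.py &
wait   # keep the parent alive until all three children exit
EOF

# One ncu invocation covering the launcher and all of its children.
# -f overwrites an existing report; -o names the output file.
cmd=(ncu --set full -f -o out --target-processes all bash myscript.bash)

if command -v ncu >/dev/null 2>&1; then
    "${cmd[@]}"
else
    echo "ncu not found; would run: ${cmd[*]}"
fi
```

With this layout, kernels from all three Python processes should land in the single report named by -o.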