Nsight Systems: profiling two CUDA Python (i.e. PyTorch) processes using the same GPU simultaneously

As the title says, when I use nsys to profile two PyTorch processes running simultaneously on one A100, I find that the CUDA graph executions of the two processes are completely interleaved, as shown in the following picture. Can you tell me the reason? Thanks a lot.

@liuyis can you respond to this?

@lvcunchi Could you share your report file?

I’d also suggest trying a few more profiling options to get more insights:

  1. Try adding the option --cuda-graph-trace=node to see the node-level details of each graph execution.

  2. Try the --gpu-metrics-devices=all option to enable the GPU metrics sampling feature and observe the GPU metrics during these graph executions.

  3. Try the --gpuctxsw=true option to enable GPU context switch trace. GPU context switching was probably happening between the two processes.
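
To make the three suggestions concrete, here is a minimal sketch that assembles a single nsys command line with all of them enabled. It assumes both PyTorch processes are started from one shell under one nsys session, so they land on the same timeline. The script names train_a.py and train_b.py, the report name two_procs, and the helper build_nsys_command are hypothetical placeholders. The flag spellings follow recent Nsight Systems releases and may differ in older versions, and GPU metrics sampling and context switch trace may require elevated privileges on your system.

```python
import shlex

# The three options suggested above.
NSYS_OPTIONS = [
    "--cuda-graph-trace=node",    # node-level detail for each graph launch
    "--gpu-metrics-devices=all",  # sample GPU metrics on all devices
    "--gpuctxsw=true",            # trace GPU context switches
]


def build_nsys_command(report_name, scripts):
    """Build one nsys command that profiles all scripts in a single session.

    The scripts are started in the background by one bash shell, which then
    waits for all of them, so they run at the same time.
    """
    launcher = " & ".join(f"python {shlex.quote(s)}" for s in scripts) + " & wait"
    return ["nsys", "profile", *NSYS_OPTIONS, "-o", report_name,
            "bash", "-c", launcher]


if __name__ == "__main__":
    # Hypothetical script names; replace with your own two workloads.
    cmd = build_nsys_command("two_procs", ["train_a.py", "train_b.py"])
    # Print the command so it can be inspected or pasted into a terminal.
    print(shlex.join(cmd))
```

Printing the command instead of running it keeps the sketch safe to try on a machine without nsys installed. To actually launch it from Python, pass the returned list to subprocess.run.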