Is there any way to analyze the concurrency of kernels on a GPU device?
When I look at the kernels' timestamps, it seems that the device first runs KERNEL1 and then KERNEL2. I take that to mean KERNEL1 occupies all SMs, and only after it finishes does KERNEL2 occupy all SMs.
I would like to know whether, with the MPS feature turned on, there is a way to check which kernels are resident on the device at a given time. For example, in the final report I see that the launch times of KERNEL1 and KERNEL2 are both 1.32243, but that alone does not tell me whether they actually ran concurrently.
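To make concrete what I mean by "at a given time", here is a minimal sketch of the check I would like to run. It assumes the report can be exported as one (name, start, end) record per kernel, with start and end being execution timestamps rather than launch timestamps. The function names and the sample numbers are hypothetical, not taken from my report or from any profiler API.

```python
from typing import List, Tuple

# (kernel name, execution start, execution end), all in the report's time unit.
Interval = Tuple[str, float, float]


def overlapping_pairs(kernels: List[Interval]) -> List[Tuple[str, str]]:
    """Return pairs of kernels whose [start, end) intervals intersect."""
    ordered = sorted(kernels, key=lambda k: k[1])
    pairs = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            if start_b >= end_a:
                # Sorted by start time, so no later kernel can overlap name_a.
                break
            pairs.append((name_a, name_b))
    return pairs


def active_at(kernels: List[Interval], t: float) -> List[str]:
    """Return the names of kernels executing at time t."""
    return [name for name, start, end in kernels if start <= t < end]


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    sample = [
        ("KERNEL1", 0.0, 2.0),
        ("KERNEL2", 1.0, 3.0),
        ("KERNEL3", 3.0, 4.0),
    ]
    print(overlapping_pairs(sample))
    print(active_at(sample, 1.5))
```

Even if this shows two kernels overlapping in time, it would not tell me how the SMs were divided between them, which is the part I am really after.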
I haven’t found an answer to that. Any thoughts?