I have a use case where I run CUDA kernels on two GPUs in parallel from the same process. When profiling with CUPTI, can I use, say, CUpti_ActivityKernel4.deviceId and CUpti_ActivityMemcpy2.deviceId to figure out which GPU each activity record comes from?
Is deviceId available on all applicable CUPTI activity structs?
If not, what should be used instead?
Also, is there anything I should know when using CUPTI to profile two GPUs at the same time from the same process?