I am coding a simple multi-GPU scheduler, using APIs from nvml.dll.
Can someone confirm at what point “nvmlDeviceGetComputeRunningProcesses” starts reporting a process as running?
Is it from the first cudaMalloc, or from the first kernel call?
From my testing it seems to be the first cudaMalloc, but I can’t find any API documentation that covers this.
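
For anyone who wants to reproduce the test, here is a minimal sketch of the kind of probe that can be used. It is not a definitive answer: the helper selfListed, the no-op kernel, and the 64-entry buffer are illustrative choices of mine, and it assumes NVML device index 0 and CUDA device 0 refer to the same GPU (true on a single-GPU machine, not guaranteed otherwise). It checks whether the current PID shows up in the NVML list after each step, and the result may well differ between driver and CUDA versions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nvml.h>
#ifdef _WIN32
#include <process.h>
#define GET_PID() static_cast<unsigned int>(_getpid())
#else
#include <unistd.h>
#define GET_PID() static_cast<unsigned int>(getpid())
#endif

// Empty kernel, used only to trigger a first kernel launch.
__global__ void noop() {}

// Hypothetical helper: returns true if `pid` appears in NVML's list of
// compute processes for `dev`. The 64-entry buffer is an arbitrary limit.
static bool selfListed(nvmlDevice_t dev, unsigned int pid) {
    nvmlProcessInfo_t infos[64];
    unsigned int count = 64;
    nvmlReturn_t rc = nvmlDeviceGetComputeRunningProcesses(dev, &count, infos);
    if (rc != NVML_SUCCESS) {
        printf("NVML query failed: %s\n", nvmlErrorString(rc));
        return false;
    }
    for (unsigned int i = 0; i < count; ++i) {
        if (infos[i].pid == pid) return true;
    }
    return false;
}

int main() {
    if (nvmlInit() != NVML_SUCCESS) return 1;

    // Assumption: NVML index 0 and CUDA device 0 are the same physical GPU.
    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

    const unsigned int pid = GET_PID();
    printf("at start:            listed=%d\n", selfListed(dev, pid));

    cudaSetDevice(0);
    printf("after cudaSetDevice: listed=%d\n", selfListed(dev, pid));

    void* buf = nullptr;
    cudaMalloc(&buf, 1 << 20);  // 1 MiB allocation
    printf("after cudaMalloc:    listed=%d\n", selfListed(dev, pid));

    noop<<<1, 1>>>();
    cudaDeviceSynchronize();
    printf("after first kernel:  listed=%d\n", selfListed(dev, pid));

    cudaFree(buf);
    nvmlShutdown();
    return 0;
}
```

On Linux this should build with something like nvcc probe.cu -lnvidia-ml; on Windows, link against nvml.lib instead. The file name probe.cu is just a placeholder.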