I am reading the code of the CUPTI concurrent_profiling sample and am confused about one thing. The function ProfileKernels takes &deviceData as one of its parameters, and RUNTIME_API_CALL(cudaSetDevice(deviceData.deviceID)) is called at the beginning of the function.
My question is: is deviceData.deviceID (as opposed to deviceData.config.deviceID, which is properly initialized) actually initialized to zero by the constructor call vector<PerDeviceData> deviceData(numDevices)? I cannot find any later assignment to deviceData[i].deviceID. If it is zero, does that mean the multi-device test will unexpectedly run on device 0 only? I appreciate any help.
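
To make the question concrete, here is a minimal sketch of what I think is happening. The struct definitions are my own stand-ins, not copied from the sample; only the field names deviceID and config.deviceID follow the code I described. The helper makeDeviceData is hypothetical. I am assuming PerDeviceData is a plain struct with no user-provided constructor, in which case vector<PerDeviceData>(numDevices) value-initializes each element and the int member should end up as zero.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the sample's types; the real definitions may differ.
struct ProfilingConfig {
    int deviceID;
};

struct PerDeviceData {
    int deviceID;              // never assigned explicitly below
    ProfilingConfig config;    // config.deviceID is assigned per device
    std::vector<int> streams;  // a non-trivial member, so the struct is not trivially constructible
};

// Hypothetical helper that mimics the setup: size the vector, then fill in
// only config.deviceID for each device.
std::vector<PerDeviceData> makeDeviceData(int numDevices) {
    // Each element is value-initialized: with no user-provided constructor,
    // the object is zero-initialized first, so deviceID starts at 0.
    std::vector<PerDeviceData> deviceData(numDevices);
    for (int i = 0; i < numDevices; ++i) {
        deviceData[i].config.deviceID = i;
    }
    return deviceData;
}

// Checks the assumption: the top-level deviceID stays 0 for every element,
// while config.deviceID holds the real device index.
void checkDeviceIds(int numDevices) {
    std::vector<PerDeviceData> deviceData = makeDeviceData(numDevices);
    for (int i = 0; i < numDevices; ++i) {
        assert(deviceData[i].deviceID == 0);
        assert(deviceData[i].config.deviceID == i);
    }
}
```

If this matches the sample, then cudaSetDevice(deviceData.deviceID) would always select device 0, whereas cudaSetDevice(deviceData.config.deviceID) would select the intended device.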