Hi All,
I’m developing with CUDA 2.3 and have run into a small problem with the function cudaSetDevice. I ran a profiler on my program; the results are listed below:
Calls    % Incl  % Excl  Depth  Function                   Module              Incl Time      Excl Time
79,862   14.67   0.01    9      CuSubSetotprob             HERest.OffInst.exe  8,256,686,016  6,742,320
79,862   1.42    0.00    10     RtlVirtualUnwind + 3       HERest.OffInst.exe  797,731,140    1,210,080
79,862   1.42    1.42    11     cudaConfigureCall          cudart.dll          796,521,060    796,521,060
79,862   11.37   0.02    10     _my_device_function        HERest.OffInst.exe  6,398,619,732  8,985,072
878,482  3.17    0.00    11     _cudaRegisterFunction + 3  HERest.OffInst.exe  1,784,407,788  0
79,862   8.18    0.01    11     cudaSetDevice + 3          HERest.OffInst.exe  4,605,226,872  3,018,540
159,724  1.87    0.01    10     _cudaUnregisterFatBinary   HERest.OffInst.exe  1,053,592,824  3,624,420
As you can see, cudaSetDevice is called on every invocation of my device function and accounts for 8.18% of the CPU time of the whole program:
…
_my_device_function → _cudaRegisterFunction // called every time, but that seems reasonable
_my_device_function → cudaSetDevice // called every time, which seems unreasonable
…
In fact, I already call cudaSetDevice once at the very beginning of my program, so I think this 8.18% is wasted. Can anyone explain why this happens?
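For reference, the call pattern I am describing looks roughly like the sketch below. This is a simplified illustration, not my actual code: the kernel body, the buffer size, and the device index 0 are placeholders, and the launch count is only chosen to match the call count in the profile above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel (the real one does other work).
__global__ void my_device_function(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

int main()
{
    const int n = 1024;          // placeholder buffer size
    const int launches = 79862;  // same order as the call count in the profile
    float *d_data = NULL;

    // The only explicit cudaSetDevice call, made once at program start.
    if (cudaSetDevice(0) != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed\n");
        return 1;
    }

    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // The kernel is launched many times; no cudaSetDevice call appears here.
    for (int k = 0; k < launches; ++k)
        my_device_function<<<(n + 255) / 256, 256>>>(d_data, n);

    // A blocking copy back to the host also waits for the queued launches.
    float first = 0.0f;
    cudaError_t err = cudaMemcpy(&first, d_data, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("data[0] after all launches: %f\n", first);

    cudaFree(d_data);
    return 0;
}
```

Nothing inside the loop calls cudaSetDevice explicitly, so the per-launch entries in the profile would have to come from the code generated around the kernel launch, or from how the profiler resolves the "cudaSetDevice + 3" symbol.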