cudaSetDevice called in every device function call?

Hi All,

I’m coding with CUDA 2.3 and found a small problem with the cudaSetDevice function. I ran a profiler; the results are listed below:

Calls    % Incl % Excl Depth  Function                   Module              Incl Time      Excl Time
79,862   14.67  0.01   9      CuSubSetotprob             HERest.OffInst.exe  8,256,686,016  6,742,320
79,862   1.42   0.00   10     RtlVirtualUnwind + 3       HERest.OffInst.exe  797,731,140    1,210,080
79,862   1.42   1.42   11     cudaConfigureCall          cudart.dll          796,521,060    796,521,060
79,862   11.37  0.02   10     _my_device_function        HERest.OffInst.exe  6,398,619,732  8,985,072
878,482  3.17   0.00   11     _cudaRegisterFunction + 3  HERest.OffInst.exe  1,784,407,788  0
79,862   8.18   0.01   11     cudaSetDevice + 3          HERest.OffInst.exe  4,605,226,872  3,018,540
159,724  1.87   0.01   10     _cudaUnregisterFatBinary   HERest.OffInst.exe  1,053,592,824  3,624,420
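To put the cudaSetDevice row in per-call terms, this short Python sketch recomputes a few ratios from the table. It assumes the Incl Time and Excl Time columns share one unit (the profiler output does not name it), and the helper names are purely illustrative.

```python
# Figures copied from the profiler table. The time unit is not stated in the
# output, so all results are in that same (unknown) unit.
rows = {
    # name: (calls, inclusive_time, exclusive_time)
    "_my_device_function": (79_862, 6_398_619_732, 8_985_072),
    "cudaSetDevice + 3": (79_862, 4_605_226_872, 3_018_540),
}


def per_call(name):
    """Average inclusive time per call for one profiler row."""
    calls, incl, _ = rows[name]
    return incl / calls


def share_of_parent(child, parent):
    """Fraction of the parent's inclusive time spent inside the child."""
    return rows[child][1] / rows[parent][1]


def self_fraction(name):
    """Fraction of a row's inclusive time that is exclusive (its own code)."""
    _, incl, excl = rows[name]
    return excl / incl


set_dev_per_call = per_call("cudaSetDevice + 3")
set_dev_share = share_of_parent("cudaSetDevice + 3", "_my_device_function")
set_dev_self = self_fraction("cudaSetDevice + 3")

print(f"cudaSetDevice per call: {set_dev_per_call:,.0f} units")
print(f"share of _my_device_function: {set_dev_share:.1%}")
print(f"exclusive / inclusive: {set_dev_self:.2%}")
```

With these figures each cudaSetDevice call costs about 57,665 units on average, roughly 72% of _my_device_function's inclusive time, while its exclusive time is under 0.1% of its inclusive time, so nearly all of that cost is spent in whatever it calls.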

As you can see, cudaSetDevice is called from every device function call and takes 8.18% of the CPU time of the whole program:

_my_device_function -> _cudaRegisterFunction // called every time, which is reasonable

_my_device_function -> cudaSetDevice // called every time, which is unreasonable

In fact, I already call cudaSetDevice at the very beginning of my program, so I think this 8.18% is wasted. Can anyone explain this?

Any ideas on this?

It’s likely that the cudaSetDevice call was just initializing the whole GPU context. If you didn’t call it, the next CUDA function would do the initialization, and THAT would look slow.
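SPWorley's explanation relies on lazy initialization: the runtime sets itself up on the first call that needs it, so whichever call happens to come first absorbs the cost. The toy Python model here illustrates only that general pattern; FakeRuntime, set_device, and launch are hypothetical names, and this is not how the CUDA runtime is actually implemented.

```python
import threading


class FakeRuntime:
    """Toy model of a runtime that creates its context lazily.

    Hypothetical analogy only; this is NOT the CUDA runtime's implementation.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._context_ready = False
        self.device = 0
        self.init_count = 0  # how many times the expensive setup ran

    def _ensure_context(self):
        # Fast path: once the context exists, callers skip initialization.
        if self._context_ready:
            return
        with self._lock:
            # Re-check under the lock so concurrent first callers init once.
            if not self._context_ready:
                self.init_count += 1  # stands in for the expensive setup
                self._context_ready = True

    def set_device(self, device):
        self._ensure_context()
        self.device = device

    def launch(self, kernel_name):
        self._ensure_context()
        return f"launched {kernel_name} on device {self.device}"


rt = FakeRuntime()
rt.set_device(0)  # first call pays for the one-time setup
for _ in range(1000):
    rt.launch("my_device_function")  # later calls take the fast path
print(rt.init_count)
```

In this model init_count stays at 1 after 1,000 launches: only the first call pays for setup, and every later call just checks a flag.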

Hi SPWorley,

Based on my general programming experience, initialization calls only happen a few times. But in my profile, cudaSetDevice is called in every function call. Does that mean CUDA needs to initialize the GPU context every time? Can we avoid it?