cudaSetDevice() too slow

How much time should a call to cudaSetDevice() take?
In my program, cudaSetDevice() takes 75.214 ms.
Is that too slow, or did I do something wrong?
Thank you.

cudaSetDevice() should only be called once. It should not be on the critical path, so you shouldn’t care if it is slow. Is there some reason it matters?
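
If you want to see where the time goes, here is a minimal sketch that times the first cudaSetDevice() call, a call that forces context creation, and a second cudaSetDevice() call separately. It assumes device 0 exists and that wall-clock timing with std::chrono is good enough. The helpers time_ms and check are hypothetical names made up for this example, not part of CUDA. Whether the one-time initialization cost shows up in the first cudaSetDevice() or in the following cudaFree(nullptr) may depend on your CUDA version, so treat the numbers as something to measure on your own setup rather than a fixed rule.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock time of one callable, in milliseconds.
template <typename F>
static double time_ms(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Hypothetical helper: abort with a readable message on any CUDA error.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    const int device = 0;  // assumption: at least one CUDA device is present

    // First call: may include one-time runtime/driver initialization.
    double first_ms = time_ms([&] {
        check(cudaSetDevice(device), "cudaSetDevice (first)");
    });

    // cudaFree(nullptr) is a no-op that still forces the context to exist.
    double init_ms = time_ms([&] {
        check(cudaFree(nullptr), "cudaFree(nullptr)");
    });

    // Second call on the same device, after initialization is done.
    double second_ms = time_ms([&] {
        check(cudaSetDevice(device), "cudaSetDevice (second)");
    });

    std::printf("first cudaSetDevice : %.3f ms\n", first_ms);
    std::printf("cudaFree(nullptr)   : %.3f ms\n", init_ms);
    std::printf("second cudaSetDevice: %.3f ms\n", second_ms);
    return 0;
}
```

Build it with something like `nvcc -o setdevice_timing setdevice_timing.cu`. If the second call is much cheaper than the first, the 75 ms is a one-time startup cost, and calling cudaSetDevice() once at program start keeps it off the critical path.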