10 ms block every second during execution


I’m currently working on a cuBLAS program that works very well.

However, while profiling it I found some very strange behavior.

During execution, here is what happens:

920 ms execution | 10 ms block | 60 ms execution | 10 ms block | 920 ms execution … and so on…

The pattern repeats with a period of exactly one second, and it does not depend on the code being run.

The only way to remove these gaps in execution is to remove every call to CUDA or cuBLAS. I can run my program without calling the cuBLAS functions or making any CUDA allocations, and I still get the gaps with just a call to cublasInit.

Even a single call to cudaSetDevice or cudaGetDeviceCount gives me the gaps.
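A minimal sketch of how such gaps could be measured outside the real application: a loop reads a monotonic clock as fast as it can and records every interval between two consecutive reads that exceeds a threshold. The names `Gap`, `detect_gaps`, `touch_cuda` and the `USE_CUDA` build flag are hypothetical and only for illustration; the only CUDA call used is `cudaGetDeviceCount`, one of the two calls reported above to trigger the problem. This is an assumption about how to isolate the issue, not the profiling method used in the original measurements.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

#ifdef USE_CUDA  // hypothetical build flag, not part of CUDA itself
#include <cuda_runtime.h>
// Touch the CUDA runtime once and return the number of devices it reports.
inline int touch_cuda() {
    int count = 0;
    cudaGetDeviceCount(&count);
    return count;
}
#endif

// One detected stall: when it began (ms since the loop started) and its length.
struct Gap {
    double start_ms;
    double length_ms;
};

// Run `work` repeatedly for about run_ms milliseconds. After each call the
// clock is read, and any interval between two consecutive reads that is
// longer than threshold_ms is recorded as a gap.
template <typename Work>
std::vector<Gap> detect_gaps(double run_ms, double threshold_ms, Work work) {
    assert(run_ms > 0.0 && threshold_ms > 0.0);
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    std::vector<Gap> gaps;
    const auto t0 = clock::now();
    auto prev = t0;
    for (;;) {
        work();
        const auto now = clock::now();
        const double elapsed = ms(now - t0).count();
        const double delta = ms(now - prev).count();
        if (delta > threshold_ms) {
            gaps.push_back({elapsed - delta, delta});
        }
        if (elapsed >= run_ms) {
            break;
        }
        prev = now;
    }
    return gaps;
}
```

One way to use it would be to call `touch_cuda()` once (building with `nvcc -DUSE_CUDA`), then run `detect_gaps(5000.0, 5.0, [] {})` and print each entry. If the stalls are tied to the CUDA runtime being initialized, the recorded gaps should appear only in the build that makes the CUDA call. Note that ordinary OS scheduling can also produce occasional gaps, so a comparison run without the CUDA call is needed as a baseline.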

I’ve tried my code on another machine, and there is no problem there.

There are two differences between these machines:

  1. The buggy one is 64-bit, the other is 32-bit.
  2. The buggy one has two Tesla GPUs, the other has a Quadro 4000.

I suspect this is related to the buggy machine having two GPUs, but for now I have no idea what I can do about it.

It appears that any call to cudaSetDevice or cudaGetDeviceCount triggers the problem.

Other calls, such as cublasMalloc, cublasCreate …, do not trigger it.