Initial delay of 4 seconds before a CUDA program starts executing on a multi-GPU machine

I have a desktop machine with 4 Tesla C2050 cards. When I run a CUDA program on it, I notice a delay of about 4 seconds before my code actually starts running. The delay occurs only once, at the beginning of the program, and not on individual kernel launches.
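One way to confirm that the delay comes from startup rather than from the kernels themselves is to time the first CUDA runtime call separately from a kernel launch. The following is only a minimal sketch: the `noop` kernel and the `elapsed_ms` helper are hypothetical names introduced for illustration, and it assumes a toolchain where `nvcc` accepts C++11 (`std::chrono`). `cudaFree(0)` is a common idiom for forcing the runtime to initialize.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, used only to time a launch.
__global__ void noop() {}

// Hypothetical helper: milliseconds elapsed since t0.
static double elapsed_ms(std::chrono::steady_clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - t0).count();
}

int main() {
    // The first runtime call triggers driver and context initialization.
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(0);
    double init_ms = elapsed_ms(t0);
    if (err != cudaSuccess) {
        fprintf(stderr, "init failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Time a kernel launch after initialization has completed.
    // Note: the first launch may still include some one-time setup cost.
    t0 = std::chrono::steady_clock::now();
    noop<<<1, 1>>>();
    err = cudaDeviceSynchronize();
    double launch_ms = elapsed_ms(t0);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("initialization: %.1f ms\n", init_ms);
    printf("kernel launch + sync: %.1f ms\n", launch_ms);
    return 0;
}
```

If nearly all of the 4 seconds shows up in the first number, the time is being spent in driver and context setup rather than in the program's own code.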

This does not happen on my other machine which has a single GTX 570.

Why could this be happening? The Tesla machine runs Red Hat Enterprise Linux 6.1 (Santiago).

We have seen similar issues, but only on 32-bit Linux (linux32) with 285.xx and later drivers. We have filed a bug with NVIDIA (incident #929288), and they are working on a fix for a future driver version.

If possible, downgrade to the 27x driver series.
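To check whether a machine matches the affected configuration, something like the following sketch may help. It assumes a Linux system where the NVIDIA kernel module exposes `/proc/driver/nvidia/version`. Note that `cudaDriverGetVersion` reports the CUDA version supported by the driver, not the 285.xx-style driver release number, so the release number is read from the proc file instead. The pointer-size line reflects how this binary was built, not necessarily the operating system.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driver = 0, runtime = 0;
    // Versions are encoded as 1000 * major + 10 * minor.
    cudaDriverGetVersion(&driver);
    cudaRuntimeGetVersion(&runtime);
    printf("CUDA version supported by driver: %d.%d\n",
           driver / 1000, (driver % 1000) / 10);
    printf("CUDA runtime version: %d.%d\n",
           runtime / 1000, (runtime % 1000) / 10);

    // Bitness of this executable (32 on a linux32 build).
    printf("binary is %d-bit\n", (int)(sizeof(void*) * 8));

    // Assumption: the NVIDIA kernel module provides this file on Linux.
    FILE* f = fopen("/proc/driver/nvidia/version", "r");
    if (!f) {
        perror("/proc/driver/nvidia/version");
        return 1;
    }
    char line[256];
    while (fgets(line, sizeof line, f)) {
        fputs(line, stdout);  // includes the driver release, e.g. 285.xx
    }
    fclose(f);
    return 0;
}
```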