I have a GPU desktop machine with four Tesla C2050 cards. When I run a CUDA program, I notice a delay of about 4 seconds before my code actually starts running. The delay occurs only once, at program startup, and not on individual kernel launches.
This does not happen on my other machine, which has a single GTX 570.
What could be causing this? The Tesla machine runs Red Hat Enterprise Linux 6.1 (Santiago).
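
One way to check whether the delay comes from CUDA runtime initialization rather than from the program itself is to time the first runtime call separately from a later one. The following is only a minimal sketch, not my actual program: it assumes that `cudaFree(0)` is the first CUDA call in the process and therefore triggers initialization, and the helper name `time_cuda_free_ms` is hypothetical.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock time of one cudaFree(0) call, in milliseconds.
// cudaFree(0) frees nothing; when it is the first CUDA call in the process,
// it is assumed here to force the runtime to initialize.
static double time_cuda_free_ms() {
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(0);
    auto stop = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFree(0) failed: %s\n", cudaGetErrorString(err));
    }
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    // First call: includes any one-time initialization cost.
    double first_ms = time_cuda_free_ms();
    // Second call: initialization has already happened.
    double second_ms = time_cuda_free_ms();

    std::printf("first cudaFree(0):  %.3f ms\n", first_ms);
    std::printf("second cudaFree(0): %.3f ms\n", second_ms);
    return 0;
}
```

If the first call takes roughly the 4 seconds and the second is close to zero, the delay is in initialization and not in my kernels or host code.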