The first CUDA function I call (after cudaSetDevice) takes about 65 seconds to run on a GTX 680, but only 400 ms on a GTX 580. I've seen this on several systems with different GPUs.
I read at http://stackoverflow.com/questions/15166799/any-particular-function-to-initialize-gpu-other-than-the-first-cudamalloc-call that you can call cudaFree(0) to force CUDA to initialize. However, this is just as slow: the cudaFree(0) call by itself takes more than a minute on the GTX 680, and the process sits at nearly 100% CPU usage for that whole minute.
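For anyone trying to reproduce this, a minimal timing sketch might look like the following. It is an illustration, not the exact code I ran: it assumes device 0 is the GPU under test and measures wall-clock time with `std::chrono::steady_clock`. The helpers `elapsed_ms` and `check` are hypothetical names introduced here. It times `cudaSetDevice` separately because newer toolkits (CUDA 12 and later, as far as I know) create the context there, while on CUDA 5.x the cost shows up in the first `cudaFree(0)`.

```cuda
// Minimal sketch: measure how long CUDA runtime initialization takes.
// Assumption: built with something like `nvcc -o init_timing init_timing.cu`.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: interval between two time points in milliseconds.
static double elapsed_ms(Clock::time_point start, Clock::time_point end) {
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical helper: abort with a readable message if a runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    const int device = 0;  // assumption: time the first GPU in the system

    Clock::time_point t0 = Clock::now();
    check(cudaSetDevice(device), "cudaSetDevice");
    Clock::time_point t1 = Clock::now();

    // cudaFree(0) frees nothing, but forces the runtime to create its context
    // if that has not happened yet.
    check(cudaFree(0), "first cudaFree(0)");
    Clock::time_point t2 = Clock::now();

    // A second call should be cheap once the context already exists.
    check(cudaFree(0), "second cudaFree(0)");
    Clock::time_point t3 = Clock::now();

    cudaDeviceProp prop;
    check(cudaGetDeviceProperties(&prop, device), "cudaGetDeviceProperties");
    std::printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    std::printf("cudaSetDevice:      %.1f ms\n", elapsed_ms(t0, t1));
    std::printf("first  cudaFree(0): %.1f ms\n", elapsed_ms(t1, t2));
    std::printf("second cudaFree(0): %.1f ms\n", elapsed_ms(t2, t3));
    return 0;
}
```

On the GTX 680 described above, I would expect nearly all of the delay to land in the first `cudaFree(0)` line, with the second call close to zero.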
Is this slow initialization a known problem with the GTX 680? Is there a way to speed up the CUDA initialization? I’ve tried this with both CUDA 5.0 and 5.5.