I am running into a really annoying problem while developing my CUDA ray tracer for non-linear ray tracing.
Up to now I was using a GTX 275 with CUDA Toolkit 3.2 on Ubuntu Lucid 10.10, and everything went fine. Starting the application took approximately 1 to 3 seconds,
including allocation and transfer of several hundred MB of memory (plus OpenGL initialization; all cards are bound to an X server).
For testing purposes I switched to two newer GTX 580 systems (the computational speedup is incredible), running Ubuntu and Fedora respectively, also with CUDA Toolkit 3.2.
On these systems the application gets stuck at the creation of the CUDA context for several minutes. After a very long time the call suddenly returns and the application continues as usual. At first I thought this had something to do with allocating and copying image data to texture memory, but after searching for a while I found that the waiting time is tied to the first CUDA runtime call that needs a context and therefore initializes it.
If cudaSetDevice() or cudaThreadExit() is the first call into the runtime library, it returns very quickly, but if cudaFree(0) or cudaThreadSynchronize() comes first, the process hangs for this long time in whatever CUDA is doing internally. After a cup of tea the application is up and running. On the Fedora system, quitting and running the application again does not bring back the long wait, but recompiling the application before running it does.
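To isolate the delay from the rest of the startup (texture uploads, OpenGL setup), a small standalone program that only times the first context-creating call may help. The following is a minimal sketch, assuming a Linux system (it uses POSIX gettimeofday) and that device 0 is the GTX 580; the helper name time_context_creation is hypothetical. It relies on the behavior described above: cudaSetDevice() returns quickly and the context is created lazily by the first call that needs it, here cudaFree(0).

```cuda
#include <stdio.h>
#include <sys/time.h>
#include <cuda_runtime.h>

/* Wall-clock time in seconds using POSIX gettimeofday. */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Hypothetical helper: selects `device`, then times the first runtime call
   that needs a context. Returns elapsed seconds, or -1.0 on a CUDA error. */
static double time_context_creation(int device)
{
    /* With the CUDA 3.x runtime this only records the device choice and
       does not create a context (newer toolkits may create it here). */
    cudaError_t err = cudaSetDevice(device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(err));
        return -1.0;
    }

    double t0 = now_seconds();
    /* cudaFree(0) frees nothing but forces the lazy context creation. */
    err = cudaFree(0);
    double t1 = now_seconds();
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFree(0): %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    return t1 - t0;
}

int main(void)
{
    double seconds = time_context_creation(0);
    if (seconds < 0.0)
        return 1;
    printf("context creation on device 0 took %.3f s\n", seconds);
    return 0;
}
```

Running this once as a standalone binary and once after linking it into the large (~9MB) executable would show whether the delay depends on the executable itself or happens for any program on the GTX 580 machines.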
Curiously, this behavior does not occur on the GTX 275, so I think it has to be a driver issue. I should mention that my executable is quite large (~9MB), but it starts fine on the GTX 275, and on subsequent startups on the GTX 580.
Does anybody have a clue what is going on there, or is anybody else experiencing the same behavior?