HELP: CUDA runtime initialization takes up to minutes

Hello,

I am experiencing a really annoying problem with my development of a CUDA ray tracer for non-linear ray tracing.

Up to now I was using a GTX 275 with CUDA toolkit 3.2 on Ubuntu 10.10 and everything went fine. Startup of the application takes approximately 1 to 3 seconds, including memory transfer and allocation of several hundred MB (plus additional OpenGL initialization; all cards are bound to an X server).

For testing purposes I switched to two newer GTX 580 systems (the speedup in computation is incredible), running Ubuntu and Fedora respectively, also with CUDA toolkit 3.2.

On these systems the application gets stuck for several minutes at the creation of the CUDA context. After a really long time the application suddenly returns and runs on as usual. At first I thought this had something to do with allocating and copying image data to texture memory, but after some searching I found out that the waiting time is tied to the first call to a CUDA runtime function that needs the CUDA context and therefore initializes it.

Running cudaSetDevice() and cudaThreadExit() as the first calls to the runtime library executes really quickly, but calling cudaFree(0) or cudaThreadSynchronize() first causes this long stall in whatever CUDA is doing internally. After taking a cup of tea the application is up and running. On the Fedora system, quitting and executing again does not suffer from this long wait, but recompiling the application before executing brings the problem back.
Curiously this behavior does not occur on the GTX 275, so I think it has to be a driver issue. I should mention that my executable is really large (~9 MB in size), but it does fine on the GTX 275 and on subsequent startups on the GTX 580.
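
For anyone who wants to reproduce this, a minimal sketch along these lines isolates the context creation time (assuming Linux, with timing via gettimeofday; the cudaFree(0) is just a common idiom to force eager context creation):

#include <stdio.h>
#include <sys/time.h>
#include <cuda_runtime.h>

static double seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    double t0 = seconds();
    cudaSetDevice(0);   /* fast: only selects the device */
    double t1 = seconds();
    cudaFree(0);        /* forces context creation (and module load / JIT) */
    double t2 = seconds();
    printf("cudaSetDevice: %.3f s, context init: %.3f s\n", t1 - t0, t2 - t1);
    return 0;
}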

Does anybody have a clue what is going on here, or is anybody experiencing the same behavior?

kind regards,
daniel

OK, for everybody who is also experiencing a long startup time in a CUDA application: I found the reason in my case!

The long application startup time was spent compiling the embedded virtual PTX code (code=compute_13) to binary GPU code (see just-in-time compilation in the nvcc manual).
I wasn't aware of the fact (shame on me!) that GTX 4xx and later cards are based on the Fermi architecture, which needs code for compute capability 2.0. I had instructed nvcc to generate device code only for compute capability 1.3:

-gencode=arch=compute_13,code="compute_13,sm_13"

The “compute_13” part tells nvcc to embed virtual PTX code in the executable; “sm_13” specifies directly executable machine code.
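
For Fermi cards the fix is to also embed native sm_20 code, along these lines (a sketch; adjust for your toolkit version and the architectures you want to support):

nvcc -gencode=arch=compute_13,code=sm_13 -gencode=arch=compute_20,code="compute_20,sm_20" ...

With that, pre-Fermi cards run the native sm_13 cubin, the GTX 580 runs the native sm_20 cubin, and the embedded compute_20 PTX keeps the binary forward-compatible with future architectures (at the cost of a one-time JIT there).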

On the older GTX 275, the sm_13 code was suitable for direct execution on its GPU.
The newer GTX 580 would need sm_20 code embedded, but the driver is able to compile the virtual PTX code “just in time”, which in my case was rather “just a long time” …
Once the code is JIT-compiled, the driver keeps the binaries in its cache, so further startups are as fast as usual.
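
If you want to verify that the JIT cache is what makes later startups fast, the driver honors a couple of environment variables (my understanding from the CUDA documentation on JIT caching; availability may depend on your driver version, and ./app below is just a placeholder for your executable):

CUDA_CACHE_DISABLE=1 ./app    # disable the compute cache: every start JIT-compiles again
CUDA_FORCE_PTX_JIT=1 ./app    # ignore embedded cubins and always JIT from PTX

By default the cache lives under ~/.nv/ComputeCache, and the cached entries are keyed on the compiled PTX, which would also explain why recompiling the application brings the delay back.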