cuda startup slow

alex.jn · March 5, 2009, 2:34pm

I found that the first call to the CUDA library is very slow. My code looks like this:

    cutStartTimer(timer4);
    cudaMallocPitch((void**)&d_limg, &pitch, wl*sizeof(float), hl);

    cutStopTimer(timer4);
    float timerPitch = cutGetTimerValue(timer4);
    printf("Malloc pitch time: %0.2f ms\n", timerPitch);

    cudaMallocPitch((void**)&d_prelimg, &pitch, wl*sizeof(float), hl);
    cudaMallocPitch((void**)&d_simg, &spitch, w*sizeof(float), h);

It takes 500 ms to allocate the first piece of memory, but the next two calls don’t have this problem.

Any suggestions?

Thanks

alex.jn · March 6, 2009, 2:11am

Can anyone help?
I am using an 8600GT, and the first cudaMallocPitch is the first call to the CUDA library.

alex.jn · March 6, 2009, 2:54am

After some googling, it seems that the first call to the CUDA library triggers initialization of the context.

But what’s the scope of this initialization? Is it per host process, or per host thread if the program is multi-threaded?

Thanks

MisterAnderson42 · March 6, 2009, 12:54pm

Yes. It is a one-time cost, paid only at the beginning of the process/thread.
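
If the goal is just to keep that one-time cost out of the measurement, one common approach is to trigger initialization with a cheap call before starting the timer. The following is only a sketch, not the original poster's code: it assumes cudaFree(0) is used as the warm-up call, replaces the cutil timers with std::chrono, and uses placeholder image sizes and a hypothetical helper named elapsedMs.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

typedef std::chrono::steady_clock Clock;

// Hypothetical helper: milliseconds elapsed since 'start' on the host clock.
static double elapsedMs(Clock::time_point start) {
    return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}

int main() {
    // Warm-up: cudaFree(0) frees nothing, but as the first runtime call
    // it forces the one-time initialization to happen here.
    Clock::time_point t0 = Clock::now();
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA init failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Initialization time: %0.2f ms\n", elapsedMs(t0));

    // Placeholder dimensions (assumption; the thread does not give wl and hl).
    const size_t wl = 640, hl = 480;
    float *d_limg = NULL;
    size_t pitch = 0;

    // Now time only the allocation itself.
    Clock::time_point t1 = Clock::now();
    err = cudaMallocPitch((void **)&d_limg, &pitch, wl * sizeof(float), hl);
    double mallocMs = elapsedMs(t1);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocPitch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Malloc pitch time: %0.2f ms\n", mallocMs);

    cudaFree(d_limg);
    return 0;
}
```

This does not make startup any faster; it only moves the cost to a place where it is measured separately from the first allocation.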

Carsten_Scholtes · March 6, 2009, 3:31pm

You may be able to significantly reduce this initialization time by telling nvcc which GPU your kernel will run on. You can do so by adding -code sm_13 (or whatever your GPU is) to nvcc’s command line. For details, see e.g. page 16 of nvcc_2.1.pdf in the doc directory next to nvcc’s bin directory.

If you don’t specify -code, apparently something like ptxas is invoked when the first CUDA function executes, in order to compile and optimize the PTX embedded in your executable for the current GPU. (I just figured this out for a rather large kernel, where omitting -code causes the executable to abort after about 3.5 minutes, all spent in the first CUDA function. With -code, compilation takes about 4.5 minutes, but the executable initializes within a few seconds…)
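
To find out which sm_XX value matches "whatever your GPU is", you can query the compute capability at runtime. This is a minimal sketch using cudaGetDeviceCount and cudaGetDeviceProperties; the printed -code suggestion is just the major and minor numbers pasted together, which is an assumption about how the value maps to the flag for your toolkit version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (count == 0) {
        fprintf(stderr, "No CUDA device found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "Device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        // major.minor is the compute capability; sm_<major><minor> is the
        // matching target name (assumed mapping, check your nvcc documentation).
        printf("Device %d: %s, compute capability %d.%d, try -code sm_%d%d\n",
               dev, prop.name, prop.major, prop.minor, prop.major, prop.minor);
    }
    return 0;
}
```

Note that a binary built for only one specific GPU target may not run on a different GPU generation, so this trades portability for startup time.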
