NVIDIA Developer Forums

Delay in cuInit

fhellman November 25, 2010, 3:16pm 1

Hi!

I have observed some performance issues with “cold” cuInit invocations:

If you run X, a call to cuInit returns practically immediately.

If you don’t run X, a call to cuInit returns after approximately 1 s, i.e. with a severe delay.

If you don’t run X, but have a program like the following running in the background:

#include <cuda.h>
#include <unistd.h>

int main()
{
   /* Initialize the driver API once, then stay alive so the driver always has a client. */
   cuInit(0);
   while (1) {
       sleep(1);
   }
   return 0;
}

then a call to cuInit in my foreground application returns immediately.

Hence, there seems to be some system-global initialization state that takes about a second to enter. Is there a nicer solution than running a daemon like the one above in the background to avoid the cuInit delay when not running X?

avidday November 25, 2010, 3:34pm 3

The driver unloads a lot of state when no client is connected to it, and it is the re-establishment of that state that takes the time you are seeing. The recommended solution is to run nvidia-smi in daemon mode with a polling interval of a few seconds. This also has the benefit of forcing the driver to retain settings such as compute exclusivity during extended idle periods.
