Runtime initialization slow (1 sec) on 400-500 series cards, very slow (5 sec) with CUDA 3.2

mastcu · April 6, 2011, 5:43am

I noticed that for short computations my program takes more wall-clock time on some new GPUs than on the CPU. I isolated the delay to the runtime initialization associated with the first memory allocation. My initial device query is fast, but the initialization, which can be triggered with a cudaFree(0) as others have suggested, is slow. This happens even when the process is run in rapid succession, and running nvidia-smi makes no difference.
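For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how the three steps could be timed separately. It is not the original poster's program: the helper name time_ms, the choice of cudaGetDeviceCount as the "device query", and the 1 MiB allocation size are all assumptions made for illustration. Note also that on newer toolkits the device query may itself perform part of the initialization, so the split between the first two numbers can differ from what is described above.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): wall-clock time of a callable in ms.
template <typename F>
double time_ms(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    cudaError_t err = cudaSuccess;
    int count = 0;

    // Step 1: device query (assumed here to be cudaGetDeviceCount).
    double query_ms = time_ms([&] { err = cudaGetDeviceCount(&count); });
    if (err != cudaSuccess || count == 0) {
        std::fprintf(stderr, "no usable CUDA device: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Step 2: force runtime/context initialization without allocating anything.
    double init_ms = time_ms([&] { err = cudaFree(0); });
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFree(0) failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Step 3: first real allocation (1 MiB is an arbitrary size for this sketch).
    void* p = nullptr;
    double alloc_ms = time_ms([&] { err = cudaMalloc(&p, 1 << 20); });
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(p);

    std::printf("device query: %.1f ms\n", query_ms);
    std::printf("cudaFree(0):  %.1f ms\n", init_ms);
    std::printf("cudaMalloc:   %.1f ms\n", alloc_ms);
    return 0;
}
```

Something like `nvcc -o init_time init_time.cu` (file name is arbitrary) should build it; running the binary several times in a row shows whether the cudaFree(0) cost is paid on every process start.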

By slow I mean over a second, not the dozens of milliseconds others have reported. Here are some measurements of initialization time under RedHat Enterprise 5:
1.55 sec GTX 460 driver 195.36.31
1.49 sec GT 420 driver 260.19.44
1.43 sec GTX 580 driver 260.19.44

Seven older cards (three GTX 275s, a GTX 285, an FX 3800, and GeForce 210 and 250), all running driver 195.36.31, have initialization times ranging from 0.05 to 0.15 sec.

This was all with CUDA 3.0. CUDA 3.1 gives similar results. But it gets worse with CUDA 3.2: the time is 5.5 seconds with the GTX 580.

Also, bandwidthTest takes 4.5 seconds to run on the GTX 580 with CUDA 3.0 and less than a second on the older cards.

All these times seem pathologically long. Are others seeing such times? Could there be something special (security-related?) about our systems that is making this happen?

avidday · April 6, 2011, 6:04am

Are you running X11 on these cards? If not, the time you are seeing is driver and card initialisation time. The NVIDIA Linux driver unloads itself when there are no client connections to it. If you are not running X11, try running nvidia-smi in a loop with a loop time of 20 seconds in the background. That polling from nvidia-smi should stop the driver from unloading.

Of course, if you are running X11 using the NVIDIA driver, then it must be something else.

Christopher_Cameron · April 6, 2011, 6:56am

A number of slow initialization bugs (particularly for 64-bit Linux with Fermi cards, first appearing with 64-bit support in CUDA 3.2) have been fixed in the most recent driver, 270.35. Please try with that driver or any newer one.

mastcu · April 9, 2011, 9:25pm

Thanks. The highest available beta version on the regular download site is 270.26. I just tried that and the device allocation step jumped up to 4 seconds and the initialization is still a second. I’ll look forward to trying 270.35 or higher when it at least reaches beta status.

tmurray · April 9, 2011, 10:25pm

270.40 is the RC2 driver:

mastcu · April 22, 2011, 2:43pm

270.41.06 is now released as a recommended driver and it solves the problems that I described.
