First CUDA call takes 13 seconds

wbolden · July 1, 2015, 10:48pm

I set up CUDA on Linux (Ubuntu 14.10) after upgrading to a new graphics card (980 Ti), and now the first CUDA call of any program takes around 13 seconds to complete, with around 100% CPU usage.

I ran the following code:

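#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>
int main()
{
   for (int i = 0; i < 2; i++)
   {
      // Time one call; the first iteration also pays for CUDA initialization.
      auto start = std::chrono::high_resolution_clock::now();
      cudaDeviceSynchronize();
      auto elapsed = std::chrono::high_resolution_clock::now() - start;
      printf("%lld microseconds\n", static_cast<long long>(std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()));
   }
}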

Which produced these outputs for the first and second calls respectively:
(Ubuntu 14.10 with 352.21 driver)

13371595 microseconds
3 microseconds

(Windows 8.1 with 353.30 driver)

287011 microseconds
0 microseconds

Note that this problem is not specific to cudaDeviceSynchronize. I also tried an empty kernel call and a cudaMalloc call, with similar results. The delay also occurs when running the nvidia-smi command.

I found some old threads (https://devtalk.nvidia.com/default/topic/480579/slow-cuda-programs-39-startup/ and https://devtalk.nvidia.com/default/topic/696488/first-cuda-function-call-very-slow-more-than-a-minute-on-gtx-680-only/?offset=4) mentioning similar problems but none of the solutions discussed there fixed the problem in my case.

Does anyone know how I might solve this?

Robert_Crovella · July 1, 2015, 11:07pm

If you have a lot of system memory and multiple GPUs, the VM initialization time incurred by the GPU driver as it is starting up the CUDA runtime can be significant.

See the Stack Overflow question: c++ - Why does my "Hello world" program take almost 10s?

If you have multiple GPUs, try using CUDA_VISIBLE_DEVICES environment variable to limit your test to a single GPU.
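The variable is usually set in the shell when launching the program (for example CUDA_VISIBLE_DEVICES=0 ./my_app, where my_app is a placeholder name). As a rough sketch, it can also be set from inside the program, assuming this happens before the first CUDA call in the process. The helper name restrict_visible_devices is hypothetical, and the code relies on POSIX setenv, so it is Linux-only as written.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the thread): restricts this process to the
// given device list, e.g. "0". Relies on POSIX setenv().
// Assumption: it is called before the first CUDA call in the process,
// since the variable is read when CUDA initializes.
bool restrict_visible_devices(const std::string& device_list)
{
    // Third argument 1 = overwrite any value inherited from the shell.
    return setenv("CUDA_VISIBLE_DEVICES", device_list.c_str(), 1) == 0;
}

// Example use at the top of main(), before any CUDA call:
//   restrict_visible_devices("0");
```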

I don’t know if you’re able to set persistence mode on the 980 Ti (I think not.) If you can, it might help a bit.

If you set up an X-server on the 980 Ti (probably not optimal for a number of other reasons) I would expect this delay to mostly go away. On Windows, the 980 Ti GPU is in WDDM mode, which means it is awake and ready to go all the time, so there are no VM startup delays.

(VM = Virtual Memory, as in UVM Unified Virtual Memory)

wbolden · July 1, 2015, 11:54pm

Thanks for the quick response.

I do have a second GPU in the system (a 660 Ti); however, only the 980 Ti is detected. I have 16GB of system memory, though I never had any initialization problems on my 660 Ti, which also supported UVM.

I enabled persistence mode earlier, though sadly it didn’t provide any performance gains.

I do have an X-server running on the 980 Ti.

Robert_Crovella · July 2, 2015, 12:54am

The 660 Ti is not detected? That is quite odd. Not sure what you mean by that.

So with the 352.21 driver, the 660 Ti is plugged into that system, but nvidia-smi doesn’t list it when you run it?

wbolden · July 2, 2015, 2:18am

To clarify, the 660 Ti is identified by lspci, but nvidia-smi does not list it as one of the system’s GPUs.

Robert_Crovella · July 2, 2015, 2:21am

I would suggest investigating that. In fact, if your display is running on the GTX 980 Ti, just remove the 660 Ti from the system if it is not functional. It may be causing unknown problems. You should also make sure that the nouveau driver has been properly removed from the system. That is covered in the Linux getting started guide.

wbolden · July 2, 2015, 2:40am

Nouveau was the issue. Disabling it fixed the startup time issue and my 660 Ti is now recognized. Thanks for the help.
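For anyone hitting the same symptom, one quick way to see whether nouveau is still loaded is to look for it in /proc/modules, where each line starts with a module name. The sketch below does that check in C++; the function names module_listed and nouveau_loaded are made up for illustration, and this only detects a loaded module. It does not disable or blacklist anything.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `module` is the first field of any
// line in a /proc/modules-style listing.
bool module_listed(std::istream& listing, const std::string& module)
{
    std::string line;
    while (std::getline(listing, line)) {
        std::istringstream fields(line);
        std::string name;
        // Compare the whole first field so "nouveau_x" does not match "nouveau".
        if ((fields >> name) && name == module) {
            return true;
        }
    }
    return false;
}

// Convenience wrapper for a live Linux system. Returns false if
// /proc/modules cannot be opened.
bool nouveau_loaded()
{
    std::ifstream modules("/proc/modules");
    return module_listed(modules, "nouveau");
}
```

If nouveau_loaded() returns true on a machine that is supposed to be using the NVIDIA driver, the driver removal steps in the Linux getting started guide are the place to look.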
