Really slow cudaGetDeviceCount(): several seconds to complete a single call

Hello,

I am having an issue where a call to cudaGetDeviceCount() is taking several (about 8) seconds. My hardware platform is a Dell R710 with a Tesla C2050 running RHEL 5.5 (64-bit) with the 3.2 (260.19.26) driver and the 3.2.16 RHEL 5.5 CUDA toolkit.

Section 3.2 of the CUDA C Programming Guide version 3.2 states that “There is no explicit initialization function for the runtime; it initializes the first time a runtime function is called (more specifically any function other than functions from the device and version management sections of the reference manual).” cudaGetDeviceCount is in the Device Management section of the reference manual, so I wouldn’t expect this delay to be the runtime initialization the guide is talking about.
Any ideas about why this is taking so long?

Thanks in advance for any assistance,
Joe

My output and CUDA code are as follows:

[root@10-0-200-171 ~]# date;./a.out;date
Wed Mar 30 20:12:29 MDT 2011
starting
4 gpus, done
Wed Mar 30 20:12:37 MDT 2011
[root@10-0-200-171 ~]# cat test.cu
#include <stdio.h>

int main()
{
    int numgpus = 0;

    printf("starting\n");
    cudaGetDeviceCount(&numgpus);
    printf("%d gpus, done\n", numgpus);
    return 0;
}
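
If it helps, I could also split the timing between the device query and an explicit runtime initialization. A variant along these lines (just a sketch, untested; it relies on the common idiom of calling cudaFree(0) to force the runtime/context to initialize) should show the two costs separately:

#include <stdio.h>
#include <sys/time.h>
#include <cuda_runtime.h>

/* wall-clock time in seconds */
static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main()
{
    int numgpus = 0;

    double t0 = now();
    cudaGetDeviceCount(&numgpus);   /* the call that appears to be slow */
    double t1 = now();
    cudaFree(0);                    /* commonly used to force runtime/context initialization */
    double t2 = now();

    printf("%d gpus\n", numgpus);
    printf("cudaGetDeviceCount: %.3f s\n", t1 - t0);
    printf("cudaFree(0):        %.3f s\n", t2 - t1);
    return 0;
}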

Try running nvidia-smi in loop mode with a loop interval of 10 seconds as a background process and see if it improves things. The NVIDIA kernel driver unloads a lot of code and state when there is no client connected to it (normally X11, but a user application or nvidia-smi does the same thing). The long time you are seeing is probably the time it takes for the driver to reload itself and then initialize the card. By keeping nvidia-smi running, the driver won’t unload between runs of your code.
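For example, something like nvidia-smi -l 10 & should do it on newer drivers (loop every 10 seconds in the background); the older nvidia-smi that ships with the 260.x driver may spell the loop/interval flags differently, so check nvidia-smi -h. If your driver supports it, enabling persistence mode (nvidia-smi -pm 1 on newer drivers) keeps the driver loaded without needing a background process at all.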

That’s it! I ran nvidia-smi in the background and my test now runs in less than a second. Thanks.

Hello Joe,

I just bought the same hardware as you describe: a Dell R710 and a Tesla C2050. But it’s not at all obvious how to connect the NVIDIA card to the server. The standard Dell PCI Express x16 Gen 2 riser card has the wrong orientation (it is meant for single-slot cards). Then there’s the problem of power supply: the riser card will give a grand total of 25W, while the Tesla requires ~270W!

But presumably you know all this and have solved it if you’re worried about getting your CUDA software layer running.

Any hints/stories would be welcome.

Thanks,

Phil