I am having an issue where a call to cudaGetDeviceCount takes several (about 8) seconds. My hardware platform is a Dell R710 with a Tesla C2050, running RHEL 5.5 (64-bit) with the 3.2 (260.19.26) driver and the 3.2.16 RHEL 5.5 CUDA toolkit.
Section 3.2 of the CUDA C Programming Guide version 3.2 states that “There is no explicit initialization function for the runtime; it initializes the first time a runtime function is called (more specifically any function other than functions from the device and version management sections of the reference manual).” cudaGetDeviceCount is in the Device Management section of the reference manual, so I wouldn’t expect this delay to be the runtime initialization that the guide describes.
Any ideas about why this is taking so long?
Thanks in advance for any assistance.
My output and CUDA code are as follows:
[root@10-0-200-171 ~]# date;./a.out;date
Wed Mar 30 20:12:29 MDT 2011
4 gpus, done
Wed Mar 30 20:12:37 MDT 2011
[root@10-0-200-171 ~]# cat test.cu
printf("%d gpus, done\n",numgpus);
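The `date;./a.out;date` timing above has only one-second resolution, so the "8 seconds" could be anywhere from roughly 7 to 9. A finer measurement could be sketched as below. This is only a sketch: the helper name `elapsed` is hypothetical, and it assumes GNU `date` (for the `%N` nanosecond field) and `awk` are available.

```shell
#!/bin/bash
# Hypothetical helper: run a command and print its wall-clock time in seconds.
# Assumes GNU date (the %N format is a GNU extension) and awk.
elapsed() {
    local start end
    start=$(date +%s.%N)
    # Send the command's own stdout to stderr so that only the
    # elapsed time is written to this function's stdout.
    "$@" >&2
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Usage, with a.out being the test binary from the post:
#   elapsed ./a.out
```

Running the binary twice in a row this way would show whether the delay is paid on every launch or only on the first one.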