Slow performance of clGetPlatformIDs


I have previously used CUDA for some simulations with evolutionary algorithms, and it worked great. However, I now need the simulation to run on multiple devices (ATI and Nvidia), so I decided to rewrite the application in OpenCL.
So far so good: the application works fine but is unbearably slow. With some experimenting I identified the problem: the function clGetPlatformIDs takes more than 13 seconds. This is on a Linux machine (CentOS 6.2 64-bit, kernel 2.6.32-220.2.1.el6.x86_64); on Windows (7 64-bit) it is much, much faster (60-200 ms).

The actual computation speed is not a problem, just the initialization. Has anybody run into this strange behavior? Any hint would be helpful.

Some outputs from clGetPlatformInfo and clGetDeviceInfo:

=== 1 OpenCL platform(s) found: ===
VERSION = OpenCL 1.1 CUDA 4.1.1
VENDOR = NVIDIA Corporation
EXTENSIONS = cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll

=== 1 Device(s) found ===
Device name = Tesla C2075
Device version = OpenCL 1.1 CUDA


Hello mad_mouse,

I see the same strange behavior with almost the same kernel (Scientific Linux, kernel 2.6.32-504.el6.x86_64) and a Tesla M2090 accelerator.

With strace, I can see that the program waits for a long time in the call that opens /dev/nvidia0 (and again in a second call that opens /dev/nvidia1 when there are 2 GPUs).
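
To check whether the delay really comes from opening the device node rather than from the OpenCL runtime itself, one could time open() directly. Here is a minimal sketch in C, assuming a POSIX system. The helper names elapsed_ms and time_open_ms are made up for illustration, and O_RDWR is my assumption about the access mode (compare it with the flags shown in your own strace output):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC timestamps. */
double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

/* Hypothetical helper: open `path`, close it again, and return how long
 * the open() call alone took, in milliseconds.
 * Returns -1.0 if open() fails (errno is left as set by open()).
 * O_RDWR is an assumption; use whatever mode your strace output shows. */
double time_open_ms(const char *path)
{
    struct timespec t0, t1;

    assert(path != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int fd = open(path, O_RDWR);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (fd < 0)
        return -1.0;

    close(fd);
    return elapsed_ms(t0, t1);
}
```

Calling time_open_ms("/dev/nvidia0") from a small main() and printing the result should show a delay of the same order as the one strace reports, if the open is indeed where the time goes. The process needs permission to open the device node, otherwise the helper just returns -1.0.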

I have not encountered the problem under Windows 7, and it even works fine when my app runs under VMware (the guest is RHEL 6.4 x64).

I am still searching for a workaround.