I have previously used CUDA for some simulations with evolutionary algorithms and it worked great. However, I now need that simulation to run on multiple devices (ATI and Nvidia), so I decided to rewrite the application with OpenCL.
So far so good: the application works fine, but it is unbearably slow. With some experimenting I identified the problem: the function clGetPlatformIDs takes more than 13 seconds. This is on a Linux machine (CentOS 6.2 64-bit, 2.6.32-220.2.1.el6.x86_64 kernel); on Windows (7 64-bit) it is much, much faster (60-200 ms).
The actual computation speed is not a problem, just the initialization. Has anybody run into this strange behavior? Any hint would be helpful.
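For anyone who wants to reproduce the measurement, here is a minimal sketch of how a single call can be timed on Linux. It is not my exact code: time_call_ms and query_platforms_stub are hypothetical names made up for this example, and the stub only stands in for the real call so the snippet compiles without the OpenCL headers. In the real program the wrapped call would be clGetPlatformIDs(0, NULL, &num_platforms). It assumes a POSIX system with clock_gettime.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime under strict -std modes */
#endif
#include <stdio.h>
#include <time.h>

/* The call being timed is wrapped in a callback of this type. */
typedef int (*timed_call_fn)(void *arg);

/* Returns the wall-clock time in milliseconds spent inside fn(arg).
   The callback's return code is stored in *status when status is non-NULL. */
double time_call_ms(timed_call_fn fn, void *arg, int *status)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    int rc = fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (status != NULL)
        *status = rc;

    return (double)(end.tv_sec - start.tv_sec) * 1e3
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Hypothetical stand-in for the OpenCL query. A real wrapper would do:
       return clGetPlatformIDs(0, NULL, (cl_uint *)arg);
   Here it only pretends that one platform was found. */
int query_platforms_stub(void *arg)
{
    unsigned int *num_platforms = arg;
    *num_platforms = 1;
    return 0;  /* 0 corresponds to CL_SUCCESS */
}
```

Calling time_call_ms(query_platforms_stub, &num_platforms, &status) and printing the result with printf("%.1f ms\n", ms) gives the elapsed time. CLOCK_MONOTONIC is used rather than clock() because clock() counts CPU time, which would hide a delay spent waiting on the driver.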
Some outputs from clGetPlatformInfo and clGetDeviceInfo:
=== 1 OpenCL platform(s) found: ===
PROFILE = FULL_PROFILE
VERSION = OpenCL 1.1 CUDA 4.1.1
NAME = NVIDIA CUDA
VENDOR = NVIDIA Corporation
EXTENSIONS = cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
=== 1 Device(s) found ===
Device name = Tesla C2075
Device version = OpenCL 1.1 CUDA