clBuildProgram performance bottleneck: clBuildProgram executes noticeably slowly on GPU implementations

I built a simple OpenCL application to test on a GeForce GTX 295 and a GeForce GT 8400GS, and I noticed that OpenCL has a serious bottleneck in clBuildProgram: up to 3 seconds on the GTX 295 and 25 seconds on the GT 8400GS.

My questions are:

  1. Is it normal for clBuildProgram to run this slowly on GPU implementations?
  2. What affects clBuildProgram's execution time?
  3. Is there any way to reduce the clBuildProgram bottleneck?
  4. I tried running my code with multiple threads to load and run it on both GPUs of the GTX 295 at the same time (with multi-GPU mode disabled this time). The clBuildProgram time doubled (to 6 seconds). Is this caused by both GPUs sharing the same PCIe bus, or by some other limitation?

Sorry for my bad English and grammar.

Thank you.

I noticed the same thing, though only on Windows; the CL compile times in the Snow Leopard (Mac OS X) implementation don’t seem nearly as bad.

Did you try precompiling your code and creating the program with clCreateProgramWithBinary rather than clCreateProgramWithSource? Also, have you tried using the pfn_notify parameter to build asynchronously? I’ve not used it myself, but at least it means you can have several programs compiling at once (in theory, at least).
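For the binary route, the usual pattern is to build from source once, query CL_PROGRAM_BINARY_SIZES and then CL_PROGRAM_BINARIES with clGetProgramInfo, save the blob to disk, and on later runs hand it to clCreateProgramWithBinary. You still call clBuildProgram on that program, but it should be much quicker than compiling from source. Binaries are tied to the device and driver, so the cache key should cover those as well as the source and build options. Below is a minimal sketch of just the disk-cache half, in plain C with no OpenCL calls. The function names, the kernel_<hash>.bin file naming and the use of FNV-1a as the hash are my own assumptions, not part of the OpenCL API.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Folds one string into a 64-bit FNV-1a hash. The terminating NUL is
   hashed too, so ("ab", "c") and ("a", "bc") give different keys. */
uint64_t fnv1a_update(uint64_t h, const char *s)
{
    size_t n = strlen(s) + 1;
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL; /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hypothetical helper: derives a cache file name from everything that
   can invalidate a compiled binary. */
void cache_file_name(char *out, size_t out_len,
                     const char *source, const char *build_options,
                     const char *device_name, const char *driver_version)
{
    uint64_t h = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
    h = fnv1a_update(h, source);
    h = fnv1a_update(h, build_options);
    h = fnv1a_update(h, device_name);
    h = fnv1a_update(h, driver_version);
    snprintf(out, out_len, "kernel_%016llx.bin", (unsigned long long)h);
}

/* Hypothetical helper: writes a binary blob to disk.
   Returns 0 on success, -1 on any I/O error. */
int cache_store(const char *path, const unsigned char *data, size_t size)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int rc = (fwrite(data, 1, size, f) == size) ? 0 : -1;
    if (fclose(f) != 0) rc = -1;
    return rc;
}

/* Hypothetical helper: reads a cached blob back. On success *data is
   malloc'ed (caller frees it) and 0 is returned; a missing or empty
   file, or any I/O error, counts as a cache miss and returns -1. */
int cache_load(const char *path, unsigned char **data, size_t *size)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    long len = ftell(f);
    if (len <= 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return -1; }
    unsigned char *buf = malloc((size_t)len);
    if (!buf) { fclose(f); return -1; }
    if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return -1;
    }
    fclose(f);
    *data = buf;
    *size = (size_t)len;
    return 0;
}
```

In a real host program, device_name and driver_version would come from clGetDeviceInfo (CL_DEVICE_NAME, CL_DRIVER_VERSION). On a cache miss, or if clCreateProgramWithBinary reports CL_INVALID_BINARY (for example after a driver update), fall back to clCreateProgramWithSource and rewrite the cache entry.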