I built a simple OpenCL application and tested it on a GeForce GTX 295 and a GeForce GT 8400GS. I noticed a serious bottleneck in clBuildProgram: it takes up to 3 seconds on the GTX 295 and 25 seconds on the GT 8400GS.
My questions are:
- Is it normal for clBuildProgram to run this slowly on GPU implementations?
- What affects the execution time of clBuildProgram?
- Is there any way to reduce the time spent in clBuildProgram?
- I tried running my code with multiple threads so that it loads and runs on both GPUs of the GTX 295 at the same time (with multi-GPU mode disabled). The clBuildProgram time doubled, to 6 seconds. Is this because both GPUs share the same PCIe bus, or is some other limitation responsible?
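In case it helps anyone reproduce the numbers, here is a minimal sketch of how the build time could be measured. It is only an illustration, not my exact code: `timed_fn`, `elapsed_seconds`, `time_call_seconds` and `count_call` are made-up names, and the callback is assumed to wrap the blocking clBuildProgram call. It uses the C11 `timespec_get` wall clock, so it needs a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: wraps whatever blocking call is being timed.
 * In the real application the callback would call clBuildProgram. */
typedef void (*timed_fn)(void *arg);

/* Returns the number of seconds between two timestamps. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Runs fn(arg) once and returns the wall-clock time it took, in seconds. */
static double time_call_seconds(timed_fn fn, void *arg)
{
    struct timespec start, end;

    assert(fn != NULL);
    timespec_get(&start, TIME_UTC);  /* C11 wall-clock timestamp */
    fn(arg);
    timespec_get(&end, TIME_UTC);
    return elapsed_seconds(start, end);
}

/* Stand-in for the real work: only counts how often it was called. */
static void count_call(void *arg)
{
    int *calls = arg;
    *calls += 1;
}
```

To time a real build, the callback passed to `time_call_seconds` would call clBuildProgram on an already created program object. In the two-GPU case each thread would do its own measurement, so the per-device times can be compared with the single-GPU run.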
Sorry for my bad English and grammar.