I have a .cl file with 102 kernels.
I am on Windows XP with driver 258.19 and OpenCL 1.1 (I see the same kind of performance problem with other drivers).
Whether I call clCreateKernel 102 times or call clCreateKernelsInProgram once, kernel creation takes 9 seconds.
Of course, I keep the kernels in static variables, so the creation cost is paid only on the first evaluation.
Do you have any idea how to reduce this time? Waiting 9 seconds on every run is a big problem when I debug.
In relation to a different thread, could you explain what you mean by using a static kernel so that the creation time appears only on the first evaluation? I am very interested in how to do this. Does this have something to do with clCreateProgramWithBinary?
Suppose you have a kernel, for example VectorAdd (C = A + B), and your code needs to call it twice.
It is not necessary to call clCreateKernel on every call: I create the kernel on the first call and store it in a static variable. On the second call, the kernel does not need to be recreated.
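A minimal sketch of that pattern in C, for illustration only. So that it compiles without an OpenCL SDK, it uses stand-ins that are not part of OpenCL: `fake_kernel` takes the place of `cl_kernel`, and the hypothetical `expensive_create_kernel` takes the place of the real `clCreateKernel(program, "VectorAdd", &err)` call. The names `get_vector_add_kernel` and `creation_count` are also made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cl_kernel (assumption: real code would use cl_kernel). */
typedef struct {
    const char *name;
} fake_kernel;

/* Counts how many times the expensive creation path actually runs. */
static int create_calls = 0;

/* Hypothetical stand-in for the slow clCreateKernel call. */
static fake_kernel *expensive_create_kernel(const char *name)
{
    static fake_kernel storage;  /* single kernel object for this sketch */
    assert(name != NULL);
    storage.name = name;
    ++create_calls;
    return &storage;
}

/* Returns the VectorAdd kernel, creating it only on the first call.
 * The static pointer keeps its value between calls, so later calls
 * skip the creation step. Not thread-safe as written. */
fake_kernel *get_vector_add_kernel(void)
{
    static fake_kernel *kernel = NULL;
    if (kernel == NULL) {
        kernel = expensive_create_kernel("VectorAdd");
    }
    return kernel;
}

/* Reports how many times the kernel was really created. */
int creation_count(void)
{
    return create_calls;
}
```

Calling `get_vector_add_kernel()` twice returns the same pointer, and the creation step runs once. Note that this only avoids repeating the cost within one run of the program; the first call still pays the full creation time.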
Thank you for the reply. I have a follow-up question, but I will ask in a different thread, since I do not want to change the focus of your original topic.
Bumping this thread for my clCreateKernel performance problem ;)