Is a precompiled OpenCL kernel faster?


I’m trying to optimize my OpenCL program, and to that end I thought of precompiling the kernel so that I can load it with clCreateProgramWithBinary and run the program. Having done that, however, I see no change in execution time. I’m using OpenCL on an Nvidia GTX 295, so the binary I’m creating is a .ptx file. Was that a naive expectation? Should the precompiled kernel run faster, or am I missing the point completely?

Thanks in advance.


Loading a precompiled kernel only saves you the compilation time. The kernel itself should run as fast as before, since the same code ends up executing on the device. Overall, though, your program (host side) will be faster because it doesn’t need to compile the kernel from source on every run.
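
For illustration, here is a minimal sketch of the host-side step this affects: reading the precompiled file into memory so it can be handed to clCreateProgramWithBinary. The helper name read_binary_file and the file name kernel.ptx are assumptions made up for this example and are not part of OpenCL. The OpenCL calls appear only in the trailing comment and assume you already have a context and a device. Note that clBuildProgram is still required after clCreateProgramWithBinary, so some build time may remain.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of OpenCL): reads a whole file, e.g. a
 * .ptx file, into a malloc'd buffer. Returns NULL on failure; on success
 * stores the byte count in *size_out. The caller frees the buffer. */
unsigned char *read_binary_file(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    /* Determine the file length by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return NULL; }
    rewind(f);

    unsigned char *buf = malloc((size_t)len + 1);
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    buf[len] = '\0';            /* convenience terminator, not counted in *size_out */
    *size_out = (size_t)len;
    return buf;
}

/* Usage with an existing cl_context `ctx` and cl_device_id `dev` (both assumed):
 *
 *   size_t size;
 *   const unsigned char *bin = read_binary_file("kernel.ptx", &size);
 *   cl_int status, err;
 *   cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &size, &bin,
 *                                               &status, &err);
 *   err = clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);  // still required
 */
```

To see where the saving shows up, time program creation and build separately from the kernel launch: only the former should change between the source and binary paths.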