clCreateKernel has expensive overhead?

I’m seeing a constant 0.013 second overhead for each call to clCreateKernel on the Linux NVIDIA implementation with my Tesla C1060 card. Why is this call so expensive?

My MacBook Pro (with an NVIDIA card) is only seeing a 0.00001 second overhead for the same operation. Why is the Linux NVIDIA driver so much slower in comparison? Is this something that will go away as the driver matures? Or do I need to take measures to cache kernels and avoid this performance hit?

Getting a kernel established is really a two-stage operation: create the program, then create the kernel. Comparing the sum of both operations across two implementations on different OSes seems more apples to apples, since some implementations might do differing amounts of work in each stage.
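To make that comparison concrete, here is a rough sketch of timing the two stages and adding them up. It deliberately does not call OpenCL: `stage_fn`, `time_stage`, `time_kernel_setup` and `count_stage` are names made up for illustration, and in real code the two callbacks would wrap the program build (clCreateProgramWithSource plus clBuildProgram) and clCreateKernel. It assumes a POSIX system with clock_gettime.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict -std=c99 */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one stage of kernel setup. In real code a
   stage would wrap the program build or the clCreateKernel call. */
typedef void (*stage_fn)(void *ctx);

/* Wall-clock seconds spent in a single call of fn(ctx). */
double time_stage(stage_fn fn, void *ctx)
{
    struct timespec t0, t1;
    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Sum of both stages, so that two implementations which split the work
   differently between "build" and "create kernel" compare fairly. */
double time_kernel_setup(stage_fn build_program, stage_fn create_kernel,
                         void *ctx)
{
    return time_stage(build_program, ctx) + time_stage(create_kernel, ctx);
}

/* Stand-in stage for trying the harness without a GPU: it only counts
   how often it was called (ctx points to an int). */
void count_stage(void *ctx)
{
    ++*(int *)ctx;
}
```

Since a driver may defer work until the first call, it is probably worth timing several repetitions rather than trusting a single measurement.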

Unless you are constantly creating kernels, 1/100 of a second doesn’t sound that bad if you pay it once or twice, especially if the kernel is then executed hundreds or thousands of times. It’s not as good as OS X, but if this is your only problem, it sounds like a good one to have.

clCreateKernel already assumes the program is created and built.

IIRC I’ve observed about 1ms overhead per clCreateKernel in Nvidia’s implementation.

I’m building a service that will run continuously, serving queries. My assumption was that program objects handle all the building and linking, so I cached program objects to make sure I don’t rebuild on every query. To me it appeared that kernels are merely containers for a particular section of binary code to execute, plus the arguments needed by that code. What else could clCreateKernel be doing?

I’m creating roughly 50 kernels per query. My total search time is ~4 s, so roughly 12% of my time is taken up by clCreateKernel. Caching kernel objects is not my preferred option, since I may need to execute the same program across multiple devices for some queries, and that requires multiple kernel objects. An alternative would be a cheap kernel copy constructor that creates new kernels from an existing kernel.
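For what it’s worth, a cache keyed by (device, kernel name) would still work in the multi-device case, and it only pays the creation cost the first time each pair is requested. Below is a minimal sketch of that idea, not a drop-in solution. Everything in it is hypothetical: `kernel_cache`, `kernel_factory`, `CACHE_CAPACITY` and `counting_factory` are illustrative names, the device is just an integer index, and the factory callback stands in for a real call to clCreateKernel on the program built for that device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_CAPACITY 256   /* assumed upper bound on cached kernels */

/* Hypothetical factory: real code would call clCreateKernel here and
   return the resulting cl_kernel as an opaque handle (NULL on failure). */
typedef void *(*kernel_factory)(int device, const char *name, void *user);

typedef struct {
    int         device;   /* index of the device the kernel belongs to */
    const char *name;     /* not copied; must outlive the cache */
    void       *kernel;   /* opaque kernel handle */
} cache_entry;

typedef struct {
    cache_entry    entries[CACHE_CAPACITY];
    size_t         count;
    kernel_factory create;
    void          *user;
} kernel_cache;

void kernel_cache_init(kernel_cache *c, kernel_factory create, void *user)
{
    assert(c != NULL && create != NULL);
    c->count  = 0;
    c->create = create;
    c->user   = user;
}

/* Return the cached kernel for (device, name), creating it on first use.
   Returns NULL if creation fails or the cache is full. */
void *kernel_cache_get(kernel_cache *c, int device, const char *name)
{
    /* Linear scan: cheap for a few hundred entries. */
    for (size_t i = 0; i < c->count; ++i) {
        if (c->entries[i].device == device &&
            strcmp(c->entries[i].name, name) == 0)
            return c->entries[i].kernel;
    }
    if (c->count == CACHE_CAPACITY)
        return NULL;
    void *k = c->create(device, name, c->user);
    if (k == NULL)
        return NULL;              /* do not cache failed creations */
    c->entries[c->count].device = device;
    c->entries[c->count].name   = name;
    c->entries[c->count].kernel = k;
    c->count++;
    return k;
}

/* Stand-in factory for trying the cache without OpenCL: counts calls
   (user points to an int) and returns a non-NULL dummy handle. */
void *counting_factory(int device, const char *name, void *user)
{
    (void)device;
    (void)name;
    ++*(int *)user;
    return user;
}
```

One caveat: a kernel object also carries its argument state, and clSetKernelArg is not safe to call concurrently on the same kernel object, so a service that handles queries in parallel would likely need one such cache per worker thread rather than a single shared one.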