Kernel launch overhead for GTX 680

Does anyone know what the kernel launch overhead is for the GTX 680?

I know the GTX 580 is quoted as having an approximately 5us launch overhead …

Launch overhead depends on the hardware platform configuration, the software stack configuration (including the operating system), and kernel complexity. I recently measured 3us as the minimum launch overhead (i.e. for empty kernels) for a GTX 680 in an older workstation with PCIe2 running 64-bit Linux.

Unless kernel runtime is very small, kernel launch overhead is typically not critical to application performance. If this is a concern for your application, I would suggest setting up a quick test. If you are using a Windows version newer than Windows XP, please note that the WDDM driver model adds substantial launch overhead, which the CUDA driver tries to partially alleviate by batching launches. Overhead is much lower with the TCC driver.
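A quick test of this kind could look roughly like the sketch below. It is only one possible approach, not necessarily how the numbers in this thread were obtained: it queues many back-to-back launches of an empty kernel, synchronizes once at the end, and divides the elapsed host time by the launch count. The names `empty_kernel` and `measure_launch_overhead_us` and the launch count are made up for illustration. Note that this measures average launch throughput, and under WDDM the driver's batching will influence the result.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Empty kernel: it does no work, so the measured time is dominated by launch cost.
__global__ void empty_kernel() {}

// Hypothetical helper: returns the average host time per launch in microseconds,
// or a negative value if a CUDA call fails.
double measure_launch_overhead_us(int num_launches) {
    // Warm-up launch so that context creation and module loading are not timed.
    empty_kernel<<<1, 1>>>();
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < num_launches; ++i) {
        empty_kernel<<<1, 1>>>();
    }
    // Launches are asynchronous; wait for all of them before stopping the clock.
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0;
    auto stop = std::chrono::steady_clock::now();

    double total_us =
        std::chrono::duration<double, std::micro>(stop - start).count();
    return total_us / num_launches;
}

int main() {
    const int num_launches = 1000000;  // assumed count; adjust as needed
    double us = measure_launch_overhead_us(num_launches);
    if (us < 0.0) {
        fprintf(stderr, "CUDA error during measurement\n");
        return 1;
    }
    printf("Average time per empty kernel launch: %.2f us\n", us);
    return 0;
}
```

This should build with something like `nvcc -O2 launch_overhead.cu -o launch_overhead` (the file name is arbitrary). Running it a few times and comparing the results gives a feel for the variance on a given system.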


I also get an overhead of approximately 3.4us per empty kernel launch, averaged over 1 million runs on a GTX 680.