What's the cost of loading in blocks?

When an old block retires (that is, it finishes executing the kernel) and a new block is brought in, is there any overhead associated with loading the new block, and if so, how much?


But I think the conclusions about the “overhead” in that post are incorrect (it is an artifact of inefficient scheduling). See my thoughts on this matter at

Note that the tests in those posts dealt specifically with kernels that have many blocks doing “almost nothing”.

In kernels where all blocks do a similar amount of work, I have never detected any overhead that I would attribute to block scheduling, other than a linear overhead of 1.0 ms per 60,000 blocks (roughly 17 ns per block) for the kernel launch (tested with an empty kernel).

Could you please clarify what you mean by the kernel launch time?

Is it the overhead associated with the first batch of blocks executing that kernel?


No, there is no way to measure the execution time of the first batch of blocks. What I referred to as the “kernel launch time” was the average time taken to execute an empty kernel with no arguments. This time increases linearly with the number of blocks, with a slope of 1.0 ms per 60,000 blocks.
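
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how it could be set up. It is not the code used for the numbers above: timing with CUDA events, the 64 threads per block, the 100 repeats and the range of block counts are all assumptions chosen for illustration, and the helper name `averageLaunchMs` is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Empty kernel with no arguments, so the measured time contains no real work.
__global__ void emptyKernel() {}

// Hypothetical helper: average time in ms of one emptyKernel launch
// with the given number of blocks, taken over `repeats` launches.
static float averageLaunchMs(int blocks, int threadsPerBlock, int repeats) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization is not included in the timing.
    emptyKernel<<<blocks, threadsPerBlock>>>();
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < repeats; ++i) {
        emptyKernel<<<blocks, threadsPerBlock>>>();
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all timed launches have finished

    float totalMs = 0.0f;
    cudaEventElapsedTime(&totalMs, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return totalMs / repeats;
}

int main() {
    const int threadsPerBlock = 64;  // assumed value, not from the original test
    const int repeats = 100;         // assumed value, not from the original test

    // Sweep the block count to see whether the time grows linearly.
    for (int blocks = 10000; blocks <= 60000; blocks += 10000) {
        float ms = averageLaunchMs(blocks, threadsPerBlock, repeats);
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) {
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("%6d blocks: %.4f ms per launch\n", blocks, ms);
    }
    return 0;
}
```

If the printed times fall on a straight line against the block count, the slope is the per-block cost; the actual figures will depend on the GPU and driver.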