How to measure kernel execution time from within the kernel itself?

Hi All,

Some of my kernels are rather time-consuming, and it’s impossible to predict in advance how long a kernel will take to execute.

Say I want to stop the kernel if it runs for more than 15 seconds. I need to measure the elapsed time in the kernel code and simply return from the kernel if it exceeds the limit.

The CUDA SDK provides the clock() function, which can be used for measurements. However, it is not clear how to find out the time already taken by all blocks, rather than just by the block the current thread resides in.

Is it possible to solve it somehow?

Thanks in advance.

You could use atomics so that the first running block stores its starting clock() value to a global memory location. You then have a time base accessible to all subsequent blocks for measuring total execution time.
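A minimal sketch of that idea, under a few assumptions of mine: the names g_start_clock, long_kernel and timed_out are made up for illustration, the spin loop stands in for the real work, and I use clock64() instead of clock() so the counter does not wrap within 15 seconds. Note that the counter is per-multiprocessor, so values read on different multiprocessors are not guaranteed to be in sync, and the cycle budget is derived from the peak clock rate, so the limit is only approximate.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Shared time base in global memory; 0 means "not set yet" (hypothetical name).
__device__ unsigned long long g_start_clock = 0ULL;

__global__ void long_kernel(long long budget_cycles, int *timed_out)
{
    __shared__ long long start;
    if (threadIdx.x == 0) {
        unsigned long long now = (unsigned long long)clock64();
        if (now == 0ULL) now = 1ULL;  // keep 0 reserved as the "unset" marker
        // Only the first block to arrive replaces 0; every other block
        // gets the already stored value back from atomicCAS.
        unsigned long long prev = atomicCAS(&g_start_clock, 0ULL, now);
        start = (long long)(prev == 0ULL ? now : prev);
    }
    __syncthreads();

    for (;;) {  // stand-in for the real, long-running work
        // Signed difference: a block whose counter lags behind the stored
        // base sees a negative value instead of a bogus huge one.
        long long elapsed = clock64() - start;
        if (elapsed > budget_cycles) {
            if (threadIdx.x == 0) atomicExch(timed_out, 1);
            return;  // give up once the budget is exceeded
        }
    }
}

int main()
{
    int dev = 0, khz = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&khz, cudaDevAttrClockRate, dev);  // peak clock, kHz

    const double budget_seconds = 0.05;  // short demo budget; use 15.0 for the case above
    long long budget_cycles = (long long)(budget_seconds * khz * 1000.0);

    int *d_flag = nullptr;
    cudaMalloc(&d_flag, sizeof(int));
    cudaMemset(d_flag, 0, sizeof(int));

    // Reset the time base before every launch, otherwise the next launch
    // would measure from the previous kernel's start.
    unsigned long long zero = 0ULL;
    cudaMemcpyToSymbol(g_start_clock, &zero, sizeof(zero));

    long_kernel<<<64, 128>>>(budget_cycles, d_flag);
    cudaError_t err = cudaDeviceSynchronize();

    int h_flag = 0;
    cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost);
    printf("status: %s, timed_out flag: %d\n", cudaGetErrorString(err), h_flag);

    cudaFree(d_flag);
    return err == cudaSuccess ? 0 : 1;
}
```

Blocks that are scheduled later still measure from the first block's start, which is the point of the shared base. If the counters on different multiprocessors drift apart, some blocks will stop a bit earlier or later than the nominal limit.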

Great idea, actually. Thanks a lot!