Does anyone know how much it costs to use the clock() function in a kernel?
My reading of the PTX Parallel Thread Execution manual (ISA v3.1) suggests
it is very low indeed, essentially a register move, so it should be about as
fast as incrementing an int counter.
Is this true if you call clock() at the C++ CUDA level?
The reason for asking is I’m using clock() to detect (and then abort)
infinite loops. Essentially I test (clock() < MAXTICS), where MAXTICS is a large positive
integer (e.g. 2000000000). This works, but calling clock() frequently introduces an
appreciable overhead (it approximately doubles the kernel time).
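
A minimal sketch of this kind of guard, for anyone who wants to reproduce the overhead. It is not my actual kernel: the names guarded_loop, run_guarded, MAX_TICKS and CHECK_EVERY are made up for illustration, the loop body is a placeholder, and the check interval of 1024 is an arbitrary assumption. It differs from the test above in two ways: it measures elapsed ticks from kernel entry using clock64(), and it reads the counter only once every CHECK_EVERY iterations.

```cuda
#include <cuda_runtime.h>

// Hypothetical names and values, for illustration only.
constexpr long long MAX_TICKS   = 2000000000LL;  // tick budget, as in the post
constexpr unsigned  CHECK_EVERY = 1024;          // assumed interval; tune it

// Single-thread kernel: runs a placeholder loop and aborts once the
// elapsed tick count exceeds MAX_TICKS.
__global__ void guarded_loop(unsigned long long n_iters,
                             int *timed_out,
                             unsigned long long *iters_done)
{
    const long long start = clock64();  // counter value at kernel entry
    unsigned long long i = 0;
    int aborted = 0;

    while (i < n_iters) {               // stand-in for a loop that may not end
        ++i;                            // placeholder for the real work
        // Read the counter only once every CHECK_EVERY iterations.
        if ((i % CHECK_EVERY) == 0 && clock64() - start > MAX_TICKS) {
            aborted = 1;
            break;
        }
    }
    *timed_out  = aborted;
    *iters_done = i;
}

// Host helper: launches one thread, copies the results back.
// Returns 0 on success, 1 on any CUDA error.
int run_guarded(unsigned long long n_iters,
                int *timed_out, unsigned long long *iters_done)
{
    int *d_flag = nullptr;
    unsigned long long *d_iters = nullptr;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_iters, sizeof(unsigned long long)) != cudaSuccess) {
        cudaFree(d_flag);
        return 1;
    }

    guarded_loop<<<1, 1>>>(n_iters, d_flag, d_iters);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(timed_out, d_flag, sizeof(int),
                         cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(iters_done, d_iters, sizeof(unsigned long long),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_flag);
    cudaFree(d_iters);
    return err == cudaSuccess ? 0 : 1;
}
```

Checking less often trades a slower abort (up to CHECK_EVERY extra iterations) for fewer counter reads. Whether that recovers the lost time depends on where the overhead really comes from, so it would need to be measured.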
As always, any help or comments are very welcome.