I'm new to CUDA and am trying to use clock64() divided by the clock rate to record the start time of each block on different SMs. The results show a huge gap in launch time between some blocks. So I concluded that different SMs on the GPU use different clocks. Is that right? And is there a unified clock shared by all SMs? My device is a Tesla V100.
In all likelihood, this is due to there being no guarantee around thread ordering. See Robert’s answer in this recent post.
In addition, the numerical value returned by clock64() is not intended to be comparable from one SM to another. Even if two blocks, on two distinct SMs, happened to read clock64() at the same instant, there is no guarantee the returned values would be the same or close to each other.
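You may wish to investigate using the globaltimer (the %globaltimer PTX special register), which counts in nanoseconds and is not tied to a single SM.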
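For anyone who wants to try this, here is a minimal sketch of recording per-block start times with the globaltimer instead of clock64(). It reads the %globaltimer and %smid PTX special registers through inline assembly. The helper names (read_globaltimer, read_smid, record_block_start) and the launch configuration are my own choices for illustration, not anything from the original question. Note also that the PTX documentation describes %globaltimer as target-specific and mainly intended for NVIDIA tools, and its update granularity may be much coarser than 1 ns on some devices, so treat the numbers as approximate.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: read the %globaltimer special register, a 64-bit
// nanosecond counter that is not tied to a particular SM.
__device__ __forceinline__ unsigned long long read_globaltimer()
{
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

// Hypothetical helper: read the %smid special register, which identifies
// the SM the calling thread is currently running on.
__device__ __forceinline__ unsigned int read_smid()
{
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// One thread per block records when the block started and on which SM.
__global__ void record_block_start(unsigned long long *start_ns,
                                   unsigned int *sm_id)
{
    if (threadIdx.x == 0) {
        start_ns[blockIdx.x] = read_globaltimer();
        sm_id[blockIdx.x]    = read_smid();
    }
}

int main()
{
    const int num_blocks = 16;   // arbitrary sizes chosen for the sketch
    const int threads    = 32;

    unsigned long long *start_ns = nullptr;
    unsigned int *sm_id = nullptr;
    if (cudaMallocManaged(&start_ns, num_blocks * sizeof(*start_ns)) != cudaSuccess ||
        cudaMallocManaged(&sm_id, num_blocks * sizeof(*sm_id)) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    record_block_start<<<num_blocks, threads>>>(start_ns, sm_id);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Find the earliest start so the output shows offsets, not raw counts.
    unsigned long long first = start_ns[0];
    for (int b = 1; b < num_blocks; ++b)
        if (start_ns[b] < first) first = start_ns[b];

    for (int b = 0; b < num_blocks; ++b)
        printf("block %2d  SM %3u  start offset %llu ns\n",
               b, sm_id[b], start_ns[b] - first);

    cudaFree(start_ns);
    cudaFree(sm_id);
    return 0;
}
```

Printing the SM ID next to each offset makes it easy to check whether a large gap lines up with blocks landing on different SMs or simply with blocks being scheduled later. Since block scheduling order is not guaranteed, some spread between start times is expected even with a shared timer.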
And, as already indicated by rs277, without further information, it's entirely possible
Thanks, all, for the pointers.