I want a device-wide, high-resolution time source, not just for performance measurement but also for limiting the run time of some low-priority kernel launches in a real-time system.
I use cuda::std::chrono::system_clock (or high_resolution_clock) to measure time in ns, and it works fine.
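For context, here is a minimal sketch of the kind of usage I mean. The kernel name `budgeted_kernel`, the `budget_ns` parameter, and the placeholder work loop are hypothetical and only illustrate the pattern; the sketch assumes a CUDA toolkit that ships libcu++'s `<cuda/std/chrono>`.

```cuda
#include <cuda/std/chrono>
#include <cstdio>

// Hypothetical low-priority kernel: each thread keeps working until a
// wall-clock deadline, read from cuda::std::chrono::system_clock, has passed.
__global__ void budgeted_kernel(long long budget_ns,
                                unsigned long long* iterations_done)
{
    using clock = cuda::std::chrono::system_clock;

    // Deadline = time at kernel entry + the allowed budget.
    const auto deadline =
        clock::now() + cuda::std::chrono::nanoseconds(budget_ns);

    unsigned long long n = 0;
    while (clock::now() < deadline) {
        ++n;  // placeholder for one unit of real work
    }

    // Report how many iterations one thread managed within the budget.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *iterations_done = n;
    }
}

int main()
{
    unsigned long long* d_iters = nullptr;
    if (cudaMalloc(&d_iters, sizeof(*d_iters)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Assumed budget of 1 ms, chosen only for illustration.
    budgeted_kernel<<<1, 32>>>(1000000LL, d_iters);

    unsigned long long h_iters = 0;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_iters, d_iters, sizeof(h_iters),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_iters);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("iterations within budget: %llu\n", h_iters);
    return 0;
}
```

The point of the sketch is that the deadline check depends entirely on what `system_clock::now()` returns in device code, which is why its long-term behavior matters to me.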
But the documentation says:
“To implement std::chrono::system_clock, we use… PTX’s %globaltimer for device code.”
The documentation for %globaltimer says:
“Special registers intended for use by NVIDIA tools. The behavior is target-specific and may change or be removed in future GPUs. When JIT-compiled to other targets, the value of these registers is unspecified.”
So:
Is it safe to rely on cuda::std::chrono::system_clock, given the documentation for %globaltimer?
I.e., is cuda::std::chrono::system_clock likely to disappear or change behavior in the future?