Questions about "globaltimer": functionality, access, and configuration

Looking at the conversation above, I saw use of a “globaltimer” that can be used for timing. Some questions follow:

  1. How can I access this in CUDA C/C++?
  2. Is this clock synchronized across all SMs? As I understand it, clock() and clock64() don’t return a value from a clock that is synchronized across the SMs.
  3. At what rate does the timer tick? On x86 it’s ~20 MHz, and it increments at a rate that doesn’t vary with P-state. This is nice because you can use the TSC (time stamp counter) to determine elapsed time.
  4. Can I modify this clock rate in CUDA with API calls or some other functionality?
  5. The post above says Nsight ramps this timer from 1 MHz to 30+ MHz, which is nice. Is there a way of doing this without Nsight?

Many thanks for any helpful feedback.

1 2

  2. Yes.
  3. From the previous linked article:

     The default resolution is 32 ns, with an update every µs. The NVIDIA performance tools force the update to every 32 ns (31.25 MHz).

  4. Not that I know of, barring usage of profiling tools such as CUPTI or the profilers.
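
For question 1, here is a minimal sketch of reading the timer from CUDA C++ through inline PTX. The names read_globaltimer, spin_kernel and spin_for_ns are hypothetical and chosen only for illustration; the only thing taken from the PTX ISA is the %globaltimer special register, which reports nanoseconds. It assumes a single GPU and keeps error handling to a minimum.

```cuda
#include <cuda_runtime.h>

// Read the 64-bit PTX %globaltimer special register (nanoseconds).
// "volatile" keeps the compiler from merging repeated reads.
__device__ __forceinline__ unsigned long long read_globaltimer() {
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

// Hypothetical kernel: busy-wait until %globaltimer has advanced by at
// least wait_ns, then report how far it actually advanced.
__global__ void spin_kernel(unsigned long long wait_ns,
                            unsigned long long *elapsed_ns) {
    unsigned long long start = read_globaltimer();
    unsigned long long now = start;
    while (now - start < wait_ns) {
        now = read_globaltimer();
    }
    *elapsed_ns = now - start;
}

// Hypothetical host helper: launches one thread and returns the elapsed
// nanoseconds it observed, or 0 if a CUDA call failed.
unsigned long long spin_for_ns(unsigned long long wait_ns) {
    unsigned long long *d_elapsed = nullptr;
    unsigned long long h_elapsed = 0;
    if (cudaMalloc(&d_elapsed, sizeof(h_elapsed)) != cudaSuccess) return 0;
    spin_kernel<<<1, 1>>>(wait_ns, d_elapsed);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_elapsed, d_elapsed, sizeof(h_elapsed),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_elapsed);
    return err == cudaSuccess ? h_elapsed : 0;
}
```

Calling spin_for_ns(100000) from main should return a value of at least 100000, typically overshooting by up to the timer's update interval.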

Many thanks, Robert. I can now at least monitor the frequency of operation to some degree of accuracy. A shame you can’t increase the update frequency of the globaltimer from 1 MHz to 30+ MHz.

clock64() offers much higher resolution, albeit not synchronized across SMs.
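
If the goal is to monitor the operating frequency, one possible approach is to sample both counters in the same thread and divide. The sketch below is only that, a sketch: sample_clocks and estimate_sm_clock_mhz are hypothetical names, and it assumes clock64() counts SM cycles while %globaltimer counts nanoseconds. Because the timer only updates about every µs by default, the window should be much longer than that (1 ms or more) for the quantization error to stay small.

```cuda
#include <cuda_runtime.h>

// Read the PTX %globaltimer special register (nanoseconds).
__device__ __forceinline__ unsigned long long read_globaltimer() {
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

// Hypothetical kernel: out[0] = SM cycles elapsed, out[1] = ns elapsed.
__global__ void sample_clocks(unsigned long long window_ns,
                              unsigned long long *out) {
    // Wait for the timer value to change so the window starts right
    // after an update rather than somewhere inside an update interval.
    unsigned long long t0 = read_globaltimer();
    unsigned long long t_start = t0;
    while (t_start == t0) t_start = read_globaltimer();
    long long c_start = clock64();

    // Spin until at least window_ns has passed on %globaltimer.
    unsigned long long t_end = t_start;
    while (t_end - t_start < window_ns) t_end = read_globaltimer();
    long long c_end = clock64();

    out[0] = static_cast<unsigned long long>(c_end - c_start);
    out[1] = t_end - t_start;
}

// Hypothetical host helper: returns the estimated SM clock in MHz for
// the SM that ran the thread, or -1.0 on failure.
double estimate_sm_clock_mhz(unsigned long long window_ns) {
    unsigned long long *d_out = nullptr;
    unsigned long long h_out[2] = {0, 0};
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1.0;
    sample_clocks<<<1, 1>>>(window_ns, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess || h_out[1] == 0) return -1.0;
    // cycles per nanosecond is GHz; multiply by 1000 for MHz.
    return static_cast<double>(h_out[0]) * 1000.0 /
           static_cast<double>(h_out[1]);
}
```

Calling estimate_sm_clock_mhz(1000000) repeatedly from the host gives a rough view of how the SM clock moves over time. Both counters are read by one thread on one SM, so the lack of cross-SM synchronization in clock64() does not matter here.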

See also:
https://nvidia.github.io/cccl/libcudacxx/standard_api/time_library.html

In device code, cuda::std::chrono::system_clock currently uses %globaltimer.
If you just want to try it without getting into the details, you could use the code below.

#include <cstdint>
#include <cuda/std/chrono>

// Returns the time since the system_clock epoch, in nanoseconds.
__device__ inline int64_t NS_Clock() {
  namespace chrono = cuda::std::chrono;
  auto TimeSinceEpoch_ns = chrono::duration_cast<chrono::nanoseconds>(
      chrono::system_clock::now().time_since_epoch());
  return static_cast<int64_t>(TimeSinceEpoch_ns.count());
}
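
To check the update interval quoted earlier in the thread on your own hardware, something along these lines might help. It is a rough sketch with hypothetical names (now_ns, min_step_kernel, observed_min_step_ns): one thread polls the libcu++ clock and keeps the smallest forward step it sees between two distinct readings. It assumes repeated calls to system_clock::now() are not merged by the compiler.

```cuda
#include <cstdint>
#include <cuda/std/chrono>
#include <cuda_runtime.h>

// Same idea as NS_Clock() above: nanoseconds since the epoch.
__device__ inline int64_t now_ns() {
    namespace chrono = cuda::std::chrono;
    auto ns = chrono::duration_cast<chrono::nanoseconds>(
        chrono::system_clock::now().time_since_epoch());
    return static_cast<int64_t>(ns.count());
}

// Hypothetical kernel: poll the clock until it has changed `changes`
// times and record the smallest positive step between readings.
__global__ void min_step_kernel(int changes, int64_t *min_step_ns) {
    int64_t prev = now_ns();
    int64_t best = INT64_MAX;
    int seen = 0;
    while (seen < changes) {
        int64_t cur = now_ns();
        if (cur != prev) {
            int64_t step = cur - prev;
            if (step > 0 && step < best) best = step;
            prev = cur;
            ++seen;
        }
    }
    *min_step_ns = best;
}

// Hypothetical host helper: returns the smallest step in ns, or -1 on
// a CUDA error.
int64_t observed_min_step_ns(int changes) {
    int64_t *d_step = nullptr;
    int64_t h_step = -1;
    if (cudaMalloc(&d_step, sizeof(h_step)) != cudaSuccess) return -1;
    min_step_kernel<<<1, 1>>>(changes, d_step);
    cudaError_t err = cudaMemcpy(&h_step, d_step, sizeof(h_step),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_step);
    return err == cudaSuccess ? h_step : -1;
}
```

Comparing the result of observed_min_step_ns(1000) with and without a profiling tool attached is one way to see whether the update interval has changed.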