Questions about "globaltimer", functionality, accessing and configuring

tjwilkens · August 22, 2024, 4:04pm

Looking at the conversation above, I saw use of a “globaltimer” that can be used for timing. Some questions follow:

How can I access this in CUDA C/C++?
Is this clock synchronized across all SMs? As I understand it, clock() and clock64() don’t return a value from a clock that is synchronized across the SMs.
At what rate does the timer tick? On x86 it’s ~20MHz, and it increments at a rate that doesn’t vary with P-state. This is nice because you can use the TSC (time stamp counter) to determine elapsed time.
Can I modify this clock rate in CUDA through API calls or some other functionality?
The post above says Nsight ramps this timer from 1MHz to 30+MHz, which is nice. Is there a way of doing this without Nsight?
Many thanks for any helpful feedback.

Robert_Crovella · August 22, 2024, 5:01pm

yes

From the previous linked article:

The default resolution is 32ns with update every µs. The NVIDIA performance tools force the update to every 32 ns (or 31.25 MHz).

Not that I know of, barring usage of profiling tools such as CUPTI or the profilers.
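
On the first question (access from CUDA C/C++), one common approach is to read the %globaltimer PTX special register with inline assembly. The sketch below is illustrative, not an official API: the names read_globaltimer, min_step_kernel, and measure_globaltimer_step_ns are hypothetical. Besides reading the timer, it estimates the update granularity by recording the smallest step between consecutive distinct readings.

```cuda
#include <cuda_runtime.h>

// Read the %globaltimer special register (nanoseconds) via inline PTX.
__device__ __forceinline__ unsigned long long read_globaltimer() {
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

// Hypothetical helper kernel: spin until the timer value changes and keep
// the smallest step seen between consecutive distinct readings.
__global__ void min_step_kernel(unsigned long long* min_step_ns, int samples) {
    unsigned long long best = ~0ULL;
    unsigned long long prev = read_globaltimer();
    for (int i = 0; i < samples; ++i) {
        unsigned long long now = read_globaltimer();
        while (now == prev) now = read_globaltimer();  // wait for an update
        unsigned long long step = now - prev;
        if (step < best) best = step;
        prev = now;
    }
    *min_step_ns = best;
}

// Hypothetical host wrapper: returns the smallest observed step in
// nanoseconds, or -1 if a CUDA call fails or samples is not positive.
long long measure_globaltimer_step_ns(int samples = 1000) {
    if (samples <= 0) return -1;
    unsigned long long* d_out = nullptr;
    unsigned long long h_out = 0;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1;
    min_step_kernel<<<1, 1>>>(d_out, samples);
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? static_cast<long long>(h_out) : -1;
}
```

Calling measure_globaltimer_step_ns() from host code and printing the result should give a rough idea of the update period on a given GPU. Going by the description quoted above, a step near 1 µs would be expected without profiling tools attached, but the value may differ by GPU and driver.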

tjwilkens · August 23, 2024, 8:40pm

Many thanks, Robert. I can now at least monitor the frequency of operation to some degree of accuracy. It’s a shame you can’t increase the update frequency of the globaltimer from 1MHz to 30+MHz.

Robert_Crovella · August 23, 2024, 10:18pm

clock64() offers much higher resolution, albeit not synchronized across SMs.
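
A minimal sketch of how the two counters might be combined to monitor the SM clock: a single thread counts clock64() cycles over a window measured with %globaltimer, and the ratio gives an approximate frequency. The names read_globaltimer, sm_clock_kernel, and estimate_sm_clock_mhz are hypothetical, and the 1 ms default window is an assumption chosen so that a roughly 1 µs timer update period contributes only about 0.1% error.

```cuda
#include <cuda_runtime.h>

// Read the %globaltimer special register (nanoseconds) via inline PTX.
__device__ __forceinline__ unsigned long long read_globaltimer() {
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

// Single-thread kernel, so both clock64() reads come from the same SM.
__global__ void sm_clock_kernel(double* mhz, unsigned long long window_ns) {
    unsigned long long t0 = read_globaltimer();
    long long c0 = clock64();
    unsigned long long t1 = t0;
    while (t1 - t0 < window_ns) t1 = read_globaltimer();  // busy-wait
    long long c1 = clock64();
    // cycles per nanosecond is GHz; multiply by 1000 to get MHz.
    *mhz = 1000.0 * static_cast<double>(c1 - c0) /
           static_cast<double>(t1 - t0);
}

// Hypothetical host wrapper: returns the estimated SM clock in MHz,
// or -1.0 if a CUDA call fails or the window is zero.
double estimate_sm_clock_mhz(unsigned long long window_ns = 1000000ULL) {
    if (window_ns == 0) return -1.0;
    double* d_mhz = nullptr;
    double h_mhz = 0.0;
    if (cudaMalloc(&d_mhz, sizeof(h_mhz)) != cudaSuccess) return -1.0;
    sm_clock_kernel<<<1, 1>>>(d_mhz, window_ns);
    cudaError_t err = cudaMemcpy(&h_mhz, d_mhz, sizeof(h_mhz),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_mhz);
    return err == cudaSuccess ? h_mhz : -1.0;
}
```

The estimate reflects only the SM that ran the kernel during that window. Since the SM clock can change over time, repeated calls may return different values.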

CU_Steve · August 23, 2024, 11:28pm

See also:
https://nvidia.github.io/cccl/libcudacxx/standard_api/time_library.html

It currently uses %globaltimer.
If you just want to try it without getting into the details, you could use the code below.

#include <cstdint>
#include <cuda/std/chrono>

// Returns nanoseconds since the system_clock epoch, callable from device code.
__device__ inline int64_t NS_Clock() {
  namespace chrono = cuda::std::chrono;
  auto TimeSinceEpoch_ns = chrono::duration_cast<chrono::nanoseconds>(
      chrono::system_clock::now().time_since_epoch());
  return static_cast<int64_t>(TimeSinceEpoch_ns.count());
}
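
As a rough illustration of how the same library clock could time a region inside a kernel, the sketch below subtracts two system_clock time points. The names chrono_kernel and time_loop_ns and the dummy floating-point loop are hypothetical stand-ins for real work.

```cuda
#include <cstdint>
#include <cuda/std/chrono>
#include <cuda_runtime.h>

// Times a short dependent loop on the device with cuda::std::chrono.
__global__ void chrono_kernel(int64_t* elapsed_ns, float* sink, int iters) {
    namespace chrono = cuda::std::chrono;
    auto start = chrono::system_clock::now();
    float x = 1.0f;
    for (int i = 0; i < iters; ++i) x = x * 1.0001f + 0.5f;
    *sink = x;  // store the result so the loop is not optimized away
    auto stop = chrono::system_clock::now();
    *elapsed_ns =
        chrono::duration_cast<chrono::nanoseconds>(stop - start).count();
}

// Hypothetical host wrapper: returns elapsed nanoseconds as measured on the
// device, or -1 if a CUDA call fails.
long long time_loop_ns(int iters = 100000) {
    int64_t* d_ns = nullptr;
    float* d_sink = nullptr;
    int64_t h_ns = 0;
    if (cudaMalloc(&d_ns, sizeof(int64_t)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_sink, sizeof(float)) != cudaSuccess) {
        cudaFree(d_ns);
        return -1;
    }
    chrono_kernel<<<1, 1>>>(d_ns, d_sink, iters);
    cudaError_t err = cudaMemcpy(&h_ns, d_ns, sizeof(h_ns),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_ns);
    cudaFree(d_sink);
    return err == cudaSuccess ? static_cast<long long>(h_ns) : -1;
}
```

Because the underlying timer only updates periodically, very short regions may report zero or a multiple of the update period, so this approach suits regions that run for many microseconds.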
