What's the difference between metrics with and without the "_realtime" suffix?

I found some very similar metrics in the NVIDIA Nsight Compute install directory, such as:

lts__t_requests_srcnode_gpc_realtime & lts__t_requests_srcnode_gpc,

lts__t_requests_srcunit_ce_realtime & lts__t_requests_srcunit_ce,

lts__t_requests_srcunit_gcc_realtime & lts__t_requests_srcunit_gcc,

sm__pipe_fma_cycles_active_realtime & sm__pipe_fma_cycles_active,

sm__pipe_tensor_cycles_active_realtime & sm__pipe_tensor_cycles_active,

What is the meaning of the suffix “_realtime”?

The _realtime suffix indicates the metric is optimized for single-pass collection. In many cases the counters will be slightly less accurate. For example, the lts__t_sectors* raw counters increment by 0-4 per cycle. The realtime variants have an internal 2-bit accumulator and output a 1 to the counter when the internal accumulator rolls over. For .avg this means an error of ±0-3 for a sample period. For .sum the error can be larger, as the error is per L2 slice. In most cases this has no impact on analysis unless someone is writing a very small micro-benchmark and expecting deterministic results.
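To make the accumulator behavior concrete, here is a minimal software sketch of the idea described above. It is not the hardware implementation or any Nsight Compute API: the function names are hypothetical, and it assumes each rollover stands for 4 counted units, so the error is whatever is left in the accumulator at the end of the sample period.

```python
def simulate_realtime_counter(increments, acc_bits=2):
    """Model one counter instance (hypothetical, for illustration only).

    increments: per-cycle increments, each in the range 0..4.
    Returns (exact, rollovers, residue), where exact is what a full-width
    counter would report and rollovers is how many 1s the realtime
    variant emitted.
    """
    modulus = 1 << acc_bits  # 4 for a 2-bit accumulator
    acc = 0
    exact = 0
    rollovers = 0
    for inc in increments:
        exact += inc
        acc += inc
        if acc >= modulus:   # accumulator rolled over
            acc -= modulus
            rollovers += 1   # only a single increment-by-one is emitted
    return exact, rollovers, acc


def simulate_slices(per_slice_increments, acc_bits=2):
    """Sum the per-slice error, mirroring why .sum can be further off."""
    total_exact = 0
    total_estimate = 0
    for increments in per_slice_increments:
        exact, rollovers, _ = simulate_realtime_counter(increments, acc_bits)
        total_exact += exact
        # Assumption: each rollover is scaled back to 2**acc_bits units.
        total_estimate += rollovers * (1 << acc_bits)
    return total_exact, total_estimate


if __name__ == "__main__":
    exact, rollovers, residue = simulate_realtime_counter([4, 3, 0, 1, 2, 4])
    print("exact:", exact, "estimate:", rollovers * 4, "lost:", residue)
```

Under these assumptions a single instance loses at most 3 units per sample period, while a sum over N slices can lose up to 3 * N, which only matters for very small kernels.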

The reason we define _realtime variants is that they allow the tool to collect many more counters in a single pass: the counting hardware only has to support an increment-by-one operation, whereas lts__t_sectors_srcunit_tex_op_read_aperture_sysmem l_lookup_hit requires at least a 3-bit predicate for src_unit && op==read && aperture==sysmem and a 3-bit increment.
