Profiling Constant Cache

Is there any way to check the metrics for constant cache?
ncu-ui provides multiple metrics for L1/L2, but it doesn’t seem to provide any for the constant cache.

NCU currently does not support capturing requests, hits, and misses for the various constant caches.

The Immediate Constant Cache, IMC, is used to access constants when all lanes of the warp request the same constant. To optimize IMC accesses, start with the stall metric smsp__warps_issue_stalled_imc_miss in the Warp State Statistics section. If Stall IMC Miss is high, open the Source View and add the column imc_miss. Set Navigate By to imc_miss to jump to the lines with the highest values in the heat map. If misses are high, try to place the constants close together in memory or to limit the number of constants accessed in the kernel.
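As a rough illustration of warp-uniform constant access, here is a minimal sketch. The struct FilterParams, the kernel apply, and the values used are hypothetical and only for illustration. It assumes that grouping the constants a kernel uses together into one contiguous block is a reasonable way to keep them close in memory; whether it actually reduces Stall IMC Miss for a given kernel has to be checked in the profiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical parameter block: constants that are used together are kept
// adjacent so they are more likely to fall into the same cache lines.
struct FilterParams {
    float scale;
    float bias;
    float lo;
    float hi;
};

__constant__ FilterParams params;

// Every lane reads the same fields of params, so each constant access is
// uniform across the warp (same address for all lanes).
__global__ void apply(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i] * params.scale + params.bias;
        out[i] = fminf(fmaxf(v, params.lo), params.hi);
    }
}

int main()
{
    const int n = 1 << 20;
    FilterParams h = {2.0f, 1.0f, 0.0f, 10.0f};
    cudaMemcpyToSymbol(params, &h, sizeof(h));

    float *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    apply<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Profiling a kernel like this and checking Stall IMC Miss before and after regrouping the constants is one way to see whether the layout matters for your case.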

The Indexed Constant Cache, IDC, is used by the LDC instruction to load a constant where each thread may want to access a different address. The primary counters to review for IDC are:

  • idc__cycles_active.avg.pct_of_peak_sustained_elapsed
  • sm__idc_divergent_instructions - incremented if lanes access different cache lines
  • sm__idc_divergent_instruction_replays - incremented for each replay. Misses and address divergence will require additional replays.
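To make the divergence counters above more concrete, here is a minimal sketch of a kernel that indexes constant memory with a per-thread index. The names table, lookup, and the index patterns are hypothetical. The assumption is that, because the index comes from memory, the compiler cannot prove it is uniform and uses an indexed constant load; the first launch uses the same index in every lane, the second scatters the indices so lanes in a warp touch different cache lines.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int TABLE_SIZE = 1024;
__constant__ float table[TABLE_SIZE];

// Each thread reads table[idx[i]]; the address can differ per lane.
__global__ void lookup(const int* idx, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = table[idx[i]];
    }
}

int main()
{
    const int n = 1 << 20;
    float h_table[TABLE_SIZE];
    for (int k = 0; k < TABLE_SIZE; ++k) h_table[k] = static_cast<float>(k);
    cudaMemcpyToSymbol(table, h_table, sizeof(h_table));

    int* idx = nullptr;
    float* out = nullptr;
    cudaMallocManaged(&idx, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(float));

    // Launch 1: all lanes use the same index (no address divergence).
    for (int i = 0; i < n; ++i) idx[i] = 0;
    lookup<<<(n + 255) / 256, 256>>>(idx, out, n);
    cudaDeviceSynchronize();

    // Launch 2: neighbouring lanes use indices 37 elements apart, so a
    // warp spreads its accesses over many cache lines.
    for (int i = 0; i < n; ++i) idx[i] = (i * 37) % TABLE_SIZE;
    lookup<<<(n + 255) / 256, 256>>>(idx, out, n);
    cudaDeviceSynchronize();

    printf("out[1] = %f\n", out[1]);
    cudaFree(idx);
    cudaFree(out);
    return 0;
}
```

Comparing the two launches in the profiler should show the divergence and replay counters rising for the scattered case, though the exact values depend on the GPU architecture.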

The IDC is shared by all SM sub-partitions. If the IDC is heavily used (idc__cycles_active.avg.pct_of_peak_sustained_elapsed), the latency to resolve an LDC instruction will increase. This generally shows up as a high smsp__average_warp_latency_issue_stalled_short_scoreboard.ratio. Use the same directions as for IMC to find the location in the Source View, but set the column to stall_short_sb.

If the kernel needs access to tens of KiB of constants, consider accessing the data through global memory or loading heavily accessed lookup tables into shared memory.
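The shared memory option could look roughly like the following sketch. The names lookupShared, LUT_SIZE, and the table contents are hypothetical; it assumes the table fits in the shared memory available per block and that every index is in the range [0, LUT_SIZE).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int LUT_SIZE = 1024;  // 4 KiB of floats, hypothetical size

// The table lives in global memory; each block copies it into shared
// memory once, then serves its per-thread lookups from there.
__global__ void lookupShared(const float* lut, const int* idx,
                             float* out, int n)
{
    __shared__ float s_lut[LUT_SIZE];

    // Cooperative, strided copy: consecutive threads load consecutive
    // elements from global memory.
    for (int k = threadIdx.x; k < LUT_SIZE; k += blockDim.x) {
        s_lut[k] = lut[k];
    }
    __syncthreads();  // the table must be complete before any lookup

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = s_lut[idx[i]];  // idx[i] assumed to be in [0, LUT_SIZE)
    }
}

int main()
{
    const int n = 1 << 20;
    float* lut = nullptr;
    int* idx = nullptr;
    float* out = nullptr;
    cudaMallocManaged(&lut, LUT_SIZE * sizeof(float));
    cudaMallocManaged(&idx, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(float));

    for (int k = 0; k < LUT_SIZE; ++k) lut[k] = static_cast<float>(k);
    for (int i = 0; i < n; ++i) idx[i] = (i * 37) % LUT_SIZE;

    lookupShared<<<(n + 255) / 256, 256>>>(lut, idx, out, n);
    cudaDeviceSynchronize();

    printf("out[1] = %f\n", out[1]);
    cudaFree(lut);
    cudaFree(idx);
    cudaFree(out);
    return 0;
}
```

The copy into shared memory is paid once per block, so this tends to make sense only when each block performs many lookups; scattered shared memory reads can also cause bank conflicts, so it is worth profiling both variants.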
