Nsight Warp Occupancy

I have profiled a shader in Nsight, and the SM Warp Occupancy looks like the image below.

The top one, stalled register allocations as I understand it, is that a shader is using too many registers, so the SM cannot start new warps because of it. But what are TRAM Allocation and ISBE Allocation? I cannot find any documentation about them.

Is there any way of seeing how many registers are used by a shader, and how many registers are needed to be removed to remove register allocation limitations?

[Image: occupancy]

Hello,

Thank you for using Nsight Graphics. I have discussed this with the engineering team and have some responses for you below.

> The top one, stalled register allocations as I understand it, is that a shader is using too many registers, so the SM cannot start new warps because of it.

This is correct.

> But what are TRAM Allocation and ISBE Allocation? I cannot find any documentation about them.

TRAM and ISBE are on-chip buffers. TRAM contains pixel shader attributes; ISBE contains “VTG” attributes. VTG = vertex, tessellation, geometry.

The “Advanced Learning” document provides a little more information: https://docs.nvidia.com/nsight-graphics/AdvancedLearning/index.html

> Is there any way of seeing how many registers are used by a shader, and how many registers are needed to be removed to remove register allocation limitations?

Currently, GPU Trace cannot directly answer this question. However, other tools directly reveal this:

In existing versions of Nsight Graphics, the Frame Profiler’s Linked Programs view shows PSO-level values: https://docs.nvidia.com/nsight-graphics/UserGuide/index.html#linked_programs_view_0

Starting in version 2022.4, the Shader Profiler’s summary page will show detailed occupancy calculations: https://docs.nvidia.com/nsight-graphics/UserGuide/index.html#uireference_shaderprofiler

The architectural warp-occupancy-per-register-count can be explored in the Nsight Compute Occupancy Calculator: https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#occupancy-calculator

It can also be directly calculated in a spreadsheet with these formulas:

uint32_t regRowsPerSubp = 512; // register rows per SM sub-partition
uint32_t subpPerSm = 4; // note: regRowsPerSubp * subpPerSm * 32 threads/warp = 65536 reg/SM

uint32_t maxWarpCountLimitedByRegCount = min(warpCountPerSm, subpPerSm * floor(regRowsPerSubp / roundUpToMultipleOf8(regCount)));
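For anyone who would rather evaluate this in code than in a spreadsheet, here is a minimal C++ sketch of the same formula. The constants 512 and 4 come from the formulas above; `warpCountPerSm` depends on the GPU architecture, so it is left as a parameter you need to supply yourself. The helper `registersToRemove` is a hypothetical addition, not something from Nsight: it simply searches downward for the largest register count that reaches a target warp count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Values taken from the formulas above.
constexpr uint32_t kRegRowsPerSubp = 512; // register rows per SM sub-partition
constexpr uint32_t kSubpPerSm = 4;        // 512 * 4 * 32 threads/warp = 65536 reg/SM

// Round a per-thread register count up to the next multiple of 8.
constexpr uint32_t roundUpToMultipleOf8(uint32_t n) {
    return (n + 7u) / 8u * 8u;
}

// Maximum resident warps per SM when limited only by register count.
// warpCountPerSm is the architectural warp limit; it is an input here
// because it differs between GPU architectures.
uint32_t maxWarpCountLimitedByRegCount(uint32_t regCount, uint32_t warpCountPerSm) {
    assert(regCount >= 1 && "a shader uses at least one register");
    // Unsigned integer division already performs the floor().
    uint32_t warpsPerSubp = kRegRowsPerSubp / roundUpToMultipleOf8(regCount);
    return std::min(warpCountPerSm, kSubpPerSm * warpsPerSubp);
}

// Hypothetical helper: how many registers must be removed from regCount
// so that at least targetWarps warps fit, according to the formula above.
uint32_t registersToRemove(uint32_t regCount, uint32_t targetWarps, uint32_t warpCountPerSm) {
    targetWarps = std::min(targetWarps, warpCountPerSm); // cannot exceed the SM limit
    for (uint32_t r = regCount; r >= 1; --r) {
        if (maxWarpCountLimitedByRegCount(r, warpCountPerSm) >= targetWarps) {
            return regCount - r;
        }
    }
    return regCount; // only reached if even 1 register cannot meet the target
}
```

As an illustration, if you assume a limit of 48 warps per SM (check the value for your GPU), a shader using 64 registers is limited to 32 warps by this formula, and `registersToRemove(64, 48, 48)` reports that 24 registers would have to go (down to 40) before registers stop being the limiter. Because of the rounding to multiples of 8, removing only a few registers often changes nothing.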

Finally, please also try the trace analysis tool. In addition to performance insights and recommendations, this view provides important information on the traced data.


Thanks for a good answer
