Same SOL for memory and SM Throughput

Hello,

I have a kernel that makes intensive use of memory in a “gather” manner, that is, each thread performs one or more reads from global memory, but the addresses are not necessarily coalesced across threads. Usually the addresses read by the threads have a certain locality (i.e. they are not very sparse).
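For context, here is a minimal sketch of what I mean by a gather kernel (the names and indexing are hypothetical, just to illustrate the access pattern):

```cuda
// Hypothetical gather kernel: each thread reads src at an index taken
// from an index array, so the loads within a warp touch nearby but not
// necessarily consecutive (coalesced) addresses.
__global__ void gather(float *dst, const float *src,
                       const int *indices, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        // indices[tid] values have some locality but are not
        // contiguous, so one warp's 32 loads may span several
        // 128-byte sectors.
        dst[tid] = src[indices[tid]];
    }
}
```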

When profiling with ncu, I get exactly the same Speed of Light (SOL) throughput value for Compute and Memory.

When I look at the throughput breakdown, I see that Memory is limited by “L1: Lsuin Requests”, and Compute is limited by “SM: Inst Executed Pipe Lsu”.

My understanding is that, when an SM executes a global memory instruction for a warp, a single request containing the information for all participating threads of the warp is sent to the L1; the L1 then processes it through multiple pipelined stages.

I assume the profiler is counting the throughput of these requests as both Compute and Memory. Is that assumption correct? If so, would this kernel be considered memory-bound or compute-bound?

To improve the performance of the kernel, should I try to reduce the number of requests sent to the memory subsystem?

Thank you

For GV100 and later architectures using the unified L1 cache, the

  • Compute Throughput metric SM:Inst Executed Pipe LSU (%), and
  • Memory Throughput metric L1:Lsuin Requests (%)

have the same rate.

In this case I would go to the next value down in the list for Compute Throughput and Memory Throughput, which are

  • SM: Issue Active = 52%
  • L1: Data Pipe Lsu Wavefronts = 58%

I would interpret this as latency bound.

The SM is only using ~50% of its issue cycles, so it is possible to issue more math or more loads.
The L1 memory system can accept more local/global/shared requests. Either shared memory is currently being used, or there are hits in L1, since less than 1/3 of the L2 throughput is being used.
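Regarding reducing the number of requests: when the data layout allows it, widening per-thread loads is one common way to cut the LSU request count. A sketch (whether this applies depends entirely on your layout; a pure gather pattern may not permit it):

```cuda
// Sketch: one 128-bit vectorized load per thread replaces four scalar
// loads, issuing one LSU instruction/request instead of four. This
// assumes each thread's data is contiguous and 16-byte aligned, which
// an arbitrary gather pattern may not provide.
__global__ void gather4(float4 *dst, const float4 *src,
                        const int *indices, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        // One vectorized load instead of four scalar ones.
        dst[tid] = src[indices[tid]];
    }
}
```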

I would recommend looking in the Source View page for the areas with the highest stall reasons; I suspect the dominant one is Long Scoreboard.

