Visualisation of Integer based Random Memory Access Kernel

Hi,
Since roofline analysis is not available for integer-based kernels, I was wondering how to interpret the information provided in the Nsight Compute GPU Speed of Light Throughput section.

The kernel performs random accesses to main memory, so low memory throughput is expected. My question is about the low DRAM throughput specifically: does it mean the application is being served from the L1 cache and achieving better throughput overall, or am I drawing the wrong conclusion from the percentages?

How can I show the L1, L2, and main memory throughputs visually, and is there a way to build a custom, basic roofline-style plot that shows whether the kernel is compute bound or bandwidth bound?

As the rule output suggests, the kernel is latency bound. Collect with --set full and follow the rule advice to find optimization opportunities.
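For the custom roofline part of the question, a minimal sketch of the underlying arithmetic might look like the following. It only applies the standard roofline bound, min(peak compute, intensity × bandwidth), once per memory level. Every number in it is a made-up placeholder, not a measurement or a hardware specification: the peak integer rate, the per-level bandwidths, and the kernel's operation and byte counts would all have to come from your own GPU and your own profile. The function names are hypothetical as well.

```python
# Sketch of a per-memory-level integer roofline calculation.
# All constants are hypothetical placeholders, not real hardware figures.

def attainable_throughput(intensity, peak_ops, bandwidth):
    """Roofline bound in ops/s: min(compute ceiling, intensity * bandwidth)."""
    return min(peak_ops, intensity * bandwidth)

def classify(ops, bytes_moved, peak_ops, bandwidth):
    """Return (intensity, ridge point, bound) for one memory level."""
    intensity = ops / bytes_moved      # integer ops per byte moved
    ridge = peak_ops / bandwidth       # intensity where the two roofs meet
    bound = "compute" if intensity >= ridge else "bandwidth"
    return intensity, ridge, bound

# Hypothetical peak integer rate (ops/s) and bandwidths (bytes/s).
PEAK_INT_OPS = 10e12
BANDWIDTHS = {"L1": 10e12, "L2": 3e12, "DRAM": 1e12}

# Hypothetical kernel totals: integer ops executed and bytes moved per level.
KERNEL_INT_OPS = 2e9
KERNEL_BYTES = {"L1": 8e9, "L2": 6e9, "DRAM": 4e9}

if __name__ == "__main__":
    for level, bandwidth in BANDWIDTHS.items():
        intensity, ridge, bound = classify(
            KERNEL_INT_OPS, KERNEL_BYTES[level], PEAK_INT_OPS, bandwidth
        )
        roof = attainable_throughput(intensity, PEAK_INT_OPS, bandwidth)
        print(f"{level}: intensity={intensity:.3f} ops/byte, "
              f"ridge={ridge:.3f}, roof={roof:.3e} ops/s, {bound} side")
```

Plotting the three (intensity, achieved ops/s) points against these ceilings on log-log axes gives a basic roofline chart. Note that a latency-bound kernel, as the rule output reports here, would be expected to sit well below both the compute and the bandwidth roofs, so the chart alone will not name the bottleneck.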
