Hi,
Since roofline analysis is not available for integer-based kernels, I was wondering how to interpret the information in the Nsight Compute "GPU Speed of Light Throughput" section.
The kernel performs random accesses to main memory, so low memory throughput is expected. My question is about the low DRAM throughput specifically: does it mean the application is being served mostly from the L1 cache and achieving better throughput overall, or am I drawing the wrong conclusion from the percentages?
How could I visualize the L1, L2, and main memory throughputs, and is there a way to build a custom/basic roofline-style plot to show whether the kernel is compute bound or bandwidth bound?
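
To make the last question concrete, this is roughly the kind of manual roofline calculation I have in mind. It is only a sketch: the function names are mine, and every number in the usage lines is a made-up placeholder, not a measurement from my kernel or a value reported by Nsight Compute. It assumes I can somehow obtain the integer operation count, the bytes moved at a given memory level, and the peak rates for the device.

```python
def attainable_ops_per_s(intensity, peak_ops_per_s, peak_bytes_per_s):
    """Classic roofline bound: min(compute roof, intensity * bandwidth roof)."""
    return min(peak_ops_per_s, intensity * peak_bytes_per_s)


def classify(int_ops, bytes_moved, peak_ops_per_s, peak_bytes_per_s):
    """Return (intensity, ridge point, which roof limits the kernel)."""
    intensity = int_ops / bytes_moved          # integer ops per byte
    ridge = peak_ops_per_s / peak_bytes_per_s  # intensity where the roofs meet
    bound = "compute" if intensity >= ridge else "bandwidth"
    return intensity, ridge, bound


# Placeholder inputs (hypothetical, for illustration only).
int_ops = 2e9            # integer operations executed by the kernel
dram_bytes = 8e9         # bytes moved to/from DRAM
peak_int_ops = 1e13      # assumed peak integer ops/s of the device
peak_dram_bw = 1e12      # assumed peak DRAM bandwidth in bytes/s

intensity, ridge, bound = classify(int_ops, dram_bytes, peak_int_ops, peak_dram_bw)
roof = attainable_ops_per_s(intensity, peak_int_ops, peak_dram_bw)
print(intensity, ridge, bound, roof)
```

Repeating the same calculation with the bytes moved at L1 and L2 and their respective peak bandwidths would give one roof per memory level, which I could then plot on log-log axes. Is that a reasonable approach?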