Memory workload analysis

asandip785 · January 7, 2022, 12:46am

I am using Nsight Compute to optimize the performance of my CUDA code. In the report, the message below was flagged as one of the areas for performance improvement.

“The kernel is utilizing greater than 80.0% of the available compute or memory performance of the device. To further improve performance, work will likely need to be shifted from the most utilized to another unit. Start by analyzing workloads in the Memory Workload Analysis section.”

Here’s the memory workload analysis (screenshot of the Memory Workload Analysis chart):

I am trying to interpret the results. Is the report suggesting that I move some of the data to local or shared memory to help improve performance?

Robert_Crovella · January 7, 2022, 3:42pm

The bright orange line representing the read path from device memory is the point of focus. The chart is color coded, and bright orange indicates a measurement at roughly 80% of theoretical peak.

So your code is memory bound, which is not surprising. If the algorithm itself is inherently memory bound (e.g. vector add), there is likely not much you can do about it.

Still, there are various things to try. The two most common that come to mind are:

- make sure your global loads are coalesced
- take advantage of all caches in the architecture, as well as shared memory, to cache reused data
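
To make those two suggestions concrete, here is a minimal sketch, not taken from the original poster's code. The kernel names (`copy_coalesced`, `copy_strided`, `stencil_shared`), the block size, and the 3-point stencil are all hypothetical choices made for illustration. The first two kernels contrast a coalesced access pattern with a strided one, and the third stages a tile in shared memory so that each input element is fetched from device memory roughly once per block even though three threads use it.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // threads per block (illustrative assumption)

// Coalesced: consecutive threads in a warp read consecutive floats.
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided, for contrast: neighbouring threads read addresses `stride` floats
// apart, so a warp touches many more memory sectors for the same useful data.
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[(int)(((long long)i * stride) % n)];
}

// Shared-memory reuse: 3-point average. Must be launched with blockDim.x == BLOCK.
__global__ void stencil_shared(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK + 2];               // block's elements plus one halo cell per side
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + 1;                        // index into tile
    tile[l] = (g < n) ? in[g] : 0.0f;               // coalesced staging load
    if (threadIdx.x == 0) {                         // thread 0 also loads both halo cells
        int left = g - 1, right = g + BLOCK;
        tile[0] = (left >= 0 && left < n) ? in[left] : 0.0f;
        tile[BLOCK + 1] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();                                // tile must be complete before it is read
    if (g < n) out[g] = (tile[l - 1] + tile[l] + tile[l + 1]) / 3.0f;
}

int main() {
    const int n = 1 << 20;
    const int stride = 33;  // arbitrary stride for the uncoalesced case
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = (float)(i % 97);

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int grid = (n + BLOCK - 1) / BLOCK;
    int errors = 0;

    // Check the coalesced copy.
    copy_coalesced<<<grid, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) errors += (h_out[i] != h_in[i]);

    // Check the strided copy against the same index formula on the host.
    copy_strided<<<grid, BLOCK>>>(d_in, d_out, n, stride);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        errors += (h_out[i] != h_in[(int)(((long long)i * stride) % n)]);

    // Check the stencil against a host reference with zero boundaries.
    stencil_shared<<<grid, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) {
        float a = (i > 0) ? h_in[i - 1] : 0.0f;
        float c = (i + 1 < n) ? h_in[i + 1] : 0.0f;
        float ref = (a + h_in[i] + c) / 3.0f;
        errors += (std::fabs(h_out[i] - ref) > 1e-4f);
    }

    if (cudaGetLastError() != cudaSuccess) errors += 1;
    cudaFree(d_in);
    cudaFree(d_out);
    std::printf(errors == 0 ? "all checks passed\n" : "mismatches found\n");
    return errors == 0 ? 0 : 1;
}
```

Assuming the file is saved as `demo.cu` (a hypothetical name), it can be built with `nvcc -o demo demo.cu` and profiled with `ncu ./demo`. Comparing the Memory Workload Analysis section for `copy_coalesced` and `copy_strided` should show the difference in device-memory traffic for the same amount of useful data. Whether shared memory helps a real kernel depends on how much data reuse the algorithm actually has.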
