I am using Nsight Compute to optimize the performance of my CUDA code. The report flags the message below as one of the areas for performance improvement.
“The kernel is utilizing greater than 80.0% of the available compute or memory performance of the device. To further improve performance, work will likely need to be shifted from the most utilized to another unit. Start by analyzing workloads in the Memory Workload Analysis section.”
Here’s the memory workload analysis:
I am trying to interpret these results. Is the report suggesting that I move some of the data to local or shared memory to improve performance?