Accelerating HPC Applications with NVIDIA Nsight Compute Roofline Analysis

Originally published at: https://developer.nvidia.com/blog/accelerating-hpc-applications-with-nsight-compute-roofline-analysis/

Writing high-performance software is no simple task. After you have code that compiles and runs, a new challenge arises when you try to understand how it performs on the available hardware. Different platforms, whether they are CPUs, GPUs, or something else, have different hardware limitations, such as available memory bandwidth and theoretical…

It was great to collaborate with some of the foremost experts on Roofline Analysis and the Nsight Compute engineering team to create this example. If you have any questions or comments, please let us know.

Where is Step 2 in this paper?
The GitLab repository uses a few optimization techniques. We discuss only two of the steps, Step 1 and Step 3, to demonstrate how the features in Nsight Compute, including the newly added roofline analysis, can complement each other for a comprehensive performance analysis.