nv nsight report card suggestions

srikanthcs05 · July 15, 2019, 5:45am

Hello,

I have written GPU code for my meshfree CFD solver using CUDA Fortran. I am attaching a report card that I obtained from NVIDIA Nsight Compute for kernel performance analysis. I use a Quadro M5000 for my computations. The report shows the performance of one of my kernels, which accounts for 97 percent of the computation. GPU experts here, kindly advise where exactly I could improve my kernel to make better use of the SM resources. I understand most of the data shown in the report, but I really need a direction in which to start improving my kernel.

Regards,

Srikanth

MatColgrove · July 15, 2019, 6:50pm

Hi Srikanth,

Looking at the report card, the biggest thing that jumps out is the high register usage (255 per thread), which is causing low occupancy (12.5%).

Not knowing your code, I can’t offer any specific advice, but high register usage is often due to the code having many local variables, many intermediate computations, and/or many address computations. If you have one very large kernel, try splitting it up into multiple smaller kernels. It may mean using more memory to store intermediate values, but it will hopefully reduce the register count (ideally fewer than 64 registers per thread) and give you better occupancy on the device.
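For reference, here is a rough sketch of how register count limits occupancy. It is not taken from the report. It assumes the commonly documented limits for a compute capability 5.2 device such as the Quadro M5000 (65536 registers per SM, at most 64 resident warps per SM, registers allocated per warp in units of 256), and it ignores block size and shared memory, which can lower occupancy further. The function name and parameters are made up for illustration.

```python
def register_limited_occupancy(regs_per_thread,
                               regs_per_sm=65536,     # assumed: CC 5.2 register file size per SM
                               max_warps_per_sm=64,   # assumed: CC 5.2 resident warp limit
                               warp_size=32,
                               alloc_unit=256):       # assumed: per-warp allocation granularity
    """Upper bound on occupancy when registers are the only limiting resource."""
    # Registers needed by one warp, rounded up to the allocation unit.
    regs_per_warp = regs_per_thread * warp_size
    regs_per_warp = -(-regs_per_warp // alloc_unit) * alloc_unit

    # Number of warps that fit in the register file, capped by the hardware limit.
    resident_warps = min(max_warps_per_sm, regs_per_sm // regs_per_warp)
    return resident_warps / max_warps_per_sm


if __name__ == "__main__":
    for regs in (255, 128, 64, 32):
        occ = register_limited_occupancy(regs)
        print(f"{regs:3d} registers/thread -> {occ:.1%} occupancy (register-limited)")
```

Under these assumptions, 255 registers per thread leaves room for only 8 of 64 warps, which is the 12.5% in the report, while 64 registers per thread would allow up to 50% and 32 registers per thread up to 100%.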

-Mat

srikanthcs05 · July 16, 2019, 4:14am

Thanks for the reply, Mat. I have been thinking about it from that angle. I am trying to optimize the code that way; the kernel is too damn big.

Thanks for the suggestion.

Srikanth
