NSight 3.0 - How to find out which instructions are double precision?

GeorgT · January 9, 2013, 12:10pm

Hi,

Another thing on the instruction-level analysis: from code inspection, my kernel shouldn't execute many double precision instructions. However, the “CUDA Achieved FLOPS” experiment reports roughly the same number of double precision instructions as single precision instructions (close to a billion). I hope my conclusion is correct that if I could eliminate these double precision instructions, my overall performance should rise (the K10's single precision performance is around 20 times higher than its double precision performance).

Up to now, I have used the Source View in the analysis report window (.nvreport) to inspect the PTX code. However, this window doesn't seem to correlate PTX (and/or SASS) lines with CUDA C source lines …

Questions:

Do you think that the statistics taken from the analyser are correct?
Is my assumption correct that my performance will rise if I eliminate the double precision instructions or replace them with single precision instructions?
How are you guys inspecting the code? Do you have any hints for me?

Georg

Greg · January 10, 2013, 6:46am

Georg,

To get code correlation between C source code, PTX, and SASS you have to tell the compiler to generate debug information or line information. To enable generation of line information follow these steps:

1. Open Solution Explorer.
2. Right-click your .cu file or project file.
3. Execute the Properties command.
4. In the left pane, select the tree node Configuration Properties | CUDA C/C++ | Device.
5. In the right pane, change Generate Line Number Information to Yes.

The compiler makes a best effort to maintain line information, but some optimizations simply cannot preserve the correlation, so the source-to-SASS mapping may not always be correct.

To find the double precision instructions, go to the SASS view and search for the following instructions (opcodes):

DADD
DMUL
DFMA
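If you prefer to do the same search outside Nsight, it can be scripted over a SASS listing dumped with `cuobjdump -sass`. The sketch below assumes one instruction per line, as in a typical listing; the names `scan_dp_opcodes` and `DpScanResult` are hypothetical helpers, not part of any NVIDIA tool.

```cpp
// Sketch: scan a SASS listing (e.g. produced by `cuobjdump -sass`) for the
// double-precision opcodes DADD, DMUL and DFMA.
#include <cassert>
#include <istream>
#include <map>
#include <regex>
#include <sstream>   // std::istringstream, handy for testing with inline text
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct DpScanResult {
    std::map<std::string, int> counts;  // opcode -> number of occurrences
    std::vector<std::string> lines;     // SASS lines containing a DP opcode
};

// Reads a SASS listing line by line and records every DADD/DMUL/DFMA,
// including predicated forms (@P0 DADD ...) and forms with a ".modifier".
DpScanResult scan_dp_opcodes(std::istream& sass) {
    // Word boundaries keep the pattern from matching inside longer tokens.
    static const std::regex dp_opcode(R"(\b(DADD|DMUL|DFMA)\b)");
    DpScanResult result;
    result.counts = {{"DADD", 0}, {"DMUL", 0}, {"DFMA", 0}};
    std::string line;
    while (std::getline(sass, line)) {
        bool found = false;
        for (std::sregex_iterator it(line.begin(), line.end(), dp_opcode), end;
             it != end; ++it) {
            ++result.counts[(*it)[1].str()];
            found = true;
        }
        if (found) result.lines.push_back(line);  // keep the line for lookup
    }
    return result;
}
```

For example, dump the listing with `cuobjdump -sass your_app.exe > kernel.sass` (file names are placeholders), open it with `std::ifstream`, and print `result.lines` to see where the DP instructions are before mapping them back to source lines.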

The most common sources of unintended double precision are:

A floating point constant specified without a size suffix (1.0 is double, 1.0f is float).
Calling double precision math functions (sin() vs. sinf()).
Promotion of float to double for variadic arguments (printf(“%f”, 1.0f) requires converting 1.0f to double).
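The literal and varargs cases follow ordinary C++ typing rules, which CUDA C++ device code shares, so they can be illustrated in plain host C++. This is a minimal sketch; `scale_promoted`, `scale_single` and `print_value` are hypothetical example functions.

```cpp
// Host-side illustration of how double precision sneaks into float code.
#include <cassert>
#include <cstdio>
#include <type_traits>
#include <utility>

// 1.0 is a double literal, so float * 1.0 is evaluated in double precision.
static_assert(std::is_same<decltype(std::declval<float>() * 1.0), double>::value,
              "float * 1.0 is computed in double");
// With the f suffix the whole expression stays in single precision.
static_assert(std::is_same<decltype(std::declval<float>() * 1.0f), float>::value,
              "float * 1.0f stays in float");

// Accidentally promoted: the multiply and add are done in double,
// and the result is rounded back to float on return.
float scale_promoted(float v) { return v * 0.5 + 1.0; }

// Intended single-precision version: every literal carries the f suffix.
float scale_single(float v) { return v * 0.5f + 1.0f; }

// Variadic arguments: a float passed to printf undergoes default argument
// promotion, so "%f" always receives a double.
void print_value(float v) { std::printf("%f\n", v); }
```

In a kernel, the first form can show up in the SASS view as DMUL/DADD (or a fused DFMA), while the second should use only single precision instructions.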

Removing uses of double on a compute capability 3.0 device can improve instruction throughput and reduce register pressure (each double occupies two 32-bit registers).
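To get a feel for how much this can matter, here is a back-of-the-envelope model. It assumes the kernel is limited purely by arithmetic issue rate, so one DP instruction costs `dp_ratio` times as much as one SP instruction, and it ignores memory traffic, latency and occupancy. The function names are hypothetical; the exact per-SM throughput figures for each compute capability are listed in the CUDA C Programming Guide.

```cpp
// Simplified throughput-bound model (assumption: arithmetic issue rate is the
// only limiter; memory, latency and occupancy effects are ignored).
#include <cassert>
#include <cmath>  // std::fabs, useful when comparing the estimates

// Relative execution time, in units of "one SP instruction".
double relative_time(double n_sp, double n_dp, double dp_ratio) {
    return n_sp + n_dp * dp_ratio;
}

// Speedup predicted by this model if every DP instruction becomes an SP one.
double speedup_if_all_sp(double n_sp, double n_dp, double dp_ratio) {
    return relative_time(n_sp, n_dp, dp_ratio) /
           relative_time(n_sp + n_dp, 0.0, dp_ratio);
}
```

With the roughly 20x ratio from the question and equal SP and DP counts, the model predicts (1 + 20) / 2 = 10.5x. A kernel that is partly memory- or latency-bound will gain less.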

In your specific case running the Instruction Count experiment on a kernel built with Debug configuration may provide you the quickest means for mapping the SASS instruction to the C source line.

GeorgT · January 10, 2013, 10:28am

Thanks for the suggestions. Especially looking for DADD, DMUL, DFMA pointed me to instructions where I forgot to indicate that a literal is float instead of double. These instructions were in the core of my kernel …

So now I have eliminated all double precision instructions and increased my SP performance to 238 GFLOPS on one GK104 chip. Before, I had around 30 GFLOPS for SP and DP.
