A list of nominal CUDA instruction throughputs

Farzad · April 10, 2014, 8:37am

This presentation:

explains that instruction throughput depends on “nominal instruction throughput”. It later says that the arithmetic instruction throughput for an integer add, for example, is 4 cycles/warp. Is there a list of nominal throughputs for most or all CUDA instructions?

EDIT:
I found

But it doesn’t list the modulo operation, for example. Also, what about atomic functions?

seibert · April 10, 2014, 11:54am

Table 2 in Section 5.4.1 of the CUDA C Programming Guide (CUDA 6 release candidate) gives the throughput for different categories of instruction on each of the compute capabilities.
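To relate the presentation's “cycles/warp” figures to that table, here is a small sketch. It assumes the table's entries are operations per clock cycle per multiprocessor and that a warp has 32 threads. The helper names are made up for illustration, and the inputs in main are hypothetical numbers, not values quoted from the guide.

```cuda
#include <cstdio>

// Hypothetical helper: convert a throughput given as results (operations) per
// clock cycle per multiprocessor into clock cycles per warp instruction.
// One warp instruction produces one result per thread, i.e. warp_size results.
double cycles_per_warp(double results_per_clock_per_sm, int warp_size = 32) {
    return warp_size / results_per_clock_per_sm;
}

// Hypothetical helper: nominal device-wide peak = per-SM rate x SM count x clock.
// This is an upper bound that real kernels generally do not reach.
double peak_ops_per_second(double results_per_clock_per_sm, int num_sms, double clock_hz) {
    return results_per_clock_per_sm * num_sms * clock_hz;
}

int main() {
    printf("%.1f\n", cycles_per_warp(8.0));   // 4.0: 8 results/clock/SM means 4 cycles per warp
    printf("%.1f\n", cycles_per_warp(32.0));  // 1.0
    // Illustrative inputs only, not taken from any real GPU's spec sheet.
    printf("%.3g\n", peak_ops_per_second(192.0, 15, 1.0e9));  // 2.88e+12
    return 0;
}
```

So a nominal 4 cycles/warp for integer add corresponds to 8 results per clock per multiprocessor, and a table entry of 32 corresponds to one warp instruction per clock.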

njuffa · April 10, 2014, 3:33pm

The modulo operation is not listed in that table because the hardware provides no modulo instruction (or division instruction, for that matter); the compiler implements these operations as sequences of other instructions.
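To illustrate why a remainder has no single table row, the sketch below builds `a % d` out of shifts, compares and subtractions using textbook shift-and-subtract (restoring) division. This is only a stand-in to show that one `%` expands into many basic integer instructions; it is not the sequence nvcc actually generates, and the function names are hypothetical. It also shows the cheap special case: for an unsigned operand and a power-of-two divisor, the remainder reduces to a single AND.

```cuda
#include <cstdint>
#include <cstdio>

// Illustrative only: remainder built from shifts, compares and subtractions
// (restoring binary long division). Not the sequence nvcc actually emits.
// Precondition: d != 0.
__host__ __device__ uint32_t urem_shift_subtract(uint32_t a, uint32_t d) {
    uint64_t r = 0;  // partial remainder; stays below 2*d, so 64 bits is ample
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((a >> i) & 1u);  // bring down the next bit of a
        if (r >= d) r -= d;              // subtract the divisor when it fits
    }
    return static_cast<uint32_t>(r);
}

// Unsigned value, power-of-two divisor 2^k: the remainder is a single AND.
// Precondition: k < 32.
__host__ __device__ uint32_t urem_pow2(uint32_t a, unsigned k) {
    return a & ((1u << k) - 1u);
}

int main() {
    printf("%u\n", urem_shift_subtract(100u, 7u));          // 2
    printf("%u\n", urem_shift_subtract(4294967295u, 10u));  // 5
    printf("%u\n", urem_pow2(100u, 4));                     // 4 (100 % 16)
    return 0;
}
```

The loop runs 32 iterations of several operations each, which is the kind of cost that hides behind a single `%` in source code.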

seibert · April 11, 2014, 3:18am

As for atomic functions, the throughput depends on the access pattern (for example, how many threads update the same address at the same time), so it can vary a lot. It’s best to benchmark it on your hardware with realistic data.
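A minimal microbenchmark sketch along those lines, assuming a CUDA-capable GPU and the CUDA runtime API. It times `atomicAdd` for two extreme access patterns: every thread incrementing one shared counter, and every thread incrementing its own counter. The kernel and helper names, launch sizes and iteration count are arbitrary choices for illustration; replace the index pattern with one that matches your real data. The compiler may optimize some same-address patterns, so read the results as properties of this code as compiled for your GPU, not as fixed hardware numbers.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                        \
    do {                                                                        \
        cudaError_t err_ = (call);                                              \
        if (err_ != cudaSuccess) {                                              \
            fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                                 \
        }                                                                       \
    } while (0)

// Each thread repeatedly increments one counter. With num_counters == 1 every
// thread hits the same address; with one counter per thread nothing is shared.
__global__ void atomic_kernel(unsigned int* counters, unsigned int num_counters, int iters) {
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int* target = &counters[tid % num_counters];
    for (int i = 0; i < iters; ++i) {
        atomicAdd(target, 1u);
    }
}

// Times one launch (after a warm-up launch) and reports the sum of all counters,
// which should equal 2 * blocks * threads * iters.
float time_pattern(unsigned int num_counters, int blocks, int threads, int iters,
                   unsigned long long* total_out) {
    unsigned int* d_counters = nullptr;
    CUDA_CHECK(cudaMalloc(&d_counters, num_counters * sizeof(unsigned int)));
    CUDA_CHECK(cudaMemset(d_counters, 0, num_counters * sizeof(unsigned int)));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    atomic_kernel<<<blocks, threads>>>(d_counters, num_counters, iters);  // warm-up
    CUDA_CHECK(cudaEventRecord(start));
    atomic_kernel<<<blocks, threads>>>(d_counters, num_counters, iters);  // timed
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));
    CUDA_CHECK(cudaGetLastError());

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    // Copy the counters back and sum them as a sanity check.
    std::vector<unsigned int> h_counters(num_counters);
    CUDA_CHECK(cudaMemcpy(h_counters.data(), d_counters,
                          num_counters * sizeof(unsigned int), cudaMemcpyDeviceToHost));
    unsigned long long total = 0;
    for (unsigned int c : h_counters) total += c;
    *total_out = total;

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_counters));
    return ms;
}

int main() {
    const int blocks = 256, threads = 256, iters = 100;  // arbitrary sizes
    const unsigned int n_threads = blocks * threads;
    const double atomics = double(n_threads) * iters;     // per timed launch
    unsigned long long total = 0;

    float same = time_pattern(1, blocks, threads, iters, &total);
    printf("same address:       %8.3f ms, %.3g atomics/s (sum %llu)\n",
           same, atomics / (same * 1e-3), total);

    float distinct = time_pattern(n_threads, blocks, threads, iters, &total);
    printf("distinct addresses: %8.3f ms, %.3g atomics/s (sum %llu)\n",
           distinct, atomics / (distinct * 1e-3), total);
    return 0;
}
```

Event timing covers a single launch, so for small configurations launch overhead can dominate; increasing `iters` or the grid size gives steadier numbers.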
