Does anyone know the latency for bitwise instructions or logical comparisons?

Instruction Latency

Sylvain_Collange March 29, 2009, 11:03am 13

Well, you already answered your own question. ;)

Also try:

a=b*a+a

a=a*a+a

a=a+a

a=a*a

…
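One way to try these is to time a long chain of dependent operations with `clock()` from a single thread. The sketch below is only an illustration: the names `chain`, `run`, `Op` and `CHAIN` are made up here and do not come from the thread, and the inputs `1.0f` and `0.5f` are arbitrary. It assumes the compiler keeps the arithmetic between the two `clock()` reads and does not simplify the chain, which is worth checking in the generated machine code (for example with `cuobjdump -sass`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical harness, not from the original post.
#define CHAIN 256  // number of dependent operations in the timed region

// One tag per expression suggested above.
enum Op { MAD_BAA, MAD_AAA, ADD_AA, MUL_AA };

template <Op OP>
__global__ void chain(float *out, unsigned *cycles, float a, float b)
{
    clock_t start = clock();
    // Each iteration depends on the previous one, so the chain exposes
    // latency rather than throughput.
    #pragma unroll
    for (int i = 0; i < CHAIN; ++i) {
        if (OP == MAD_BAA) a = b * a + a;
        if (OP == MAD_AAA) a = a * a + a;
        if (OP == ADD_AA)  a = a + a;
        if (OP == MUL_AA)  a = a * a;
    }
    clock_t stop = clock();

    out[0] = a;  // store the result so the chain is not optimized away
    cycles[0] = (unsigned)(stop - start);
}

template <Op OP>
static void run(const char *name)
{
    float *d_out = nullptr;
    unsigned *d_cycles = nullptr;
    cudaMalloc(&d_out, sizeof(float));
    cudaMalloc(&d_cycles, sizeof(unsigned));

    // A single thread: no other warps are available to hide the latency.
    chain<OP><<<1, 1>>>(d_out, d_cycles, 1.0f, 0.5f);

    unsigned h_cycles = 0;
    cudaMemcpy(&h_cycles, d_cycles, sizeof(h_cycles), cudaMemcpyDeviceToHost);
    // The total includes the overhead of the clock reads, spread over CHAIN ops.
    printf("%-10s %.2f cycles per op\n", name, (double)h_cycles / CHAIN);

    cudaFree(d_out);
    cudaFree(d_cycles);
}

int main()
{
    run<MAD_BAA>("a=b*a+a");
    run<MAD_AAA>("a=a*a+a");
    run<ADD_AA>("a=a+a");
    run<MUL_AA>("a=a*a");
    return 0;
}
```

The numbers depend on the GPU and compiler version, so treat them as relative measurements between the variants rather than as absolute latencies.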

For the constant cache it would make sense, since the 32-wide load is split into two halves, and the constant cache runs at half the shader clock (according to the documentation).

Shared memory requires arbitration logic and a full crossbar between the execution units and the SRAM banks, so it would not be surprising if it requires one extra (slow) clock cycle.
