Timing for division and remainder

Xilman · December 21, 2009, 4:00pm

I have been unable to find the number of clock cycles needed for integer division of 32-bit and 64-bit quantities. Extensive Googling turns up only the mantra: “Integer division and modulo operation are particularly costly and should be avoided if possible or replaced with bitwise operations whenever possible”. I’ve tried to make measurements from some sample code running on a Tesla C1060, but the results were inconclusive.

Does anyone have the real information?

I ask because I’m about to embark on writing some code for multiple-precision integer arithmetic, and I want to know how to get decent performance without having to write several versions of complicated algorithms just to see which one performs best. For instance:

q = x / y; r = x % y;

can be written as

q = x / y; r = x - y * q;

where a remainder is traded for a multiplication and a subtraction. Testing which is faster is simple enough in single precision (by which I mean unsigned long or unsigned long long arithmetic), but it gets rather more difficult in multiple precision, especially as there appears to be no way to obtain the quotient and remainder of a double-length dividend, analogous to what __umulhi() and __umul64hi() provide for double-length multiplication.
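To make the second form concrete, here is a minimal CUDA sketch of the divide-then-multiply-back idea. The names DivRem64, divrem_u64 and divrem_kernel are made up for illustration and are not part of any CUDA library. It assumes y is nonzero. The compiler may well perform this transformation by itself when it sees x / y and x % y together, so treat it as something to measure, not as a guaranteed win.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper type (illustration only): quotient and remainder together.
struct DivRem64 {
    unsigned long long q;
    unsigned long long r;
};

// One division, then the remainder recovered with a multiply and a subtract.
// For unsigned operands y * q <= x always holds, so the subtraction cannot wrap.
// Assumes y != 0.
__host__ __device__ inline DivRem64 divrem_u64(unsigned long long x,
                                               unsigned long long y)
{
    DivRem64 out;
    out.q = x / y;           // the only division
    out.r = x - y * out.q;   // same value as x % y
    return out;
}

// Hypothetical kernel applying the helper element-wise to n pairs.
__global__ void divrem_kernel(const unsigned long long *x,
                              const unsigned long long *y,
                              unsigned long long *q,
                              unsigned long long *r,
                              int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        DivRem64 d = divrem_u64(x[i], y[i]);
        q[i] = d.q;
        r[i] = d.r;
    }
}
```

Because divrem_u64 is marked __host__ __device__, the same function can be checked on the CPU against x % y before it is timed on the GPU.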

Thanks in advance,

Paul

jack · December 21, 2009, 5:39pm

You can call the clock() function inside your kernels to time this yourself. It’s described just after the “built-in variables” section of the programming guide.
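As a rough sketch of that approach, the kernel below brackets a loop of dependent 32-bit divisions with two clock() reads. The names time_divide and average_cycles_per_iteration are invented for this example, and the launch configuration (a single warp) is an assumption. The figure it reports includes the loop and addition overhead, so it is an upper bound on the cost per division, not an exact instruction latency.

```cuda
#include <cuda_runtime.h>

// Hypothetical micro-benchmark kernel (illustration only).
// Each thread times `iters` dependent 32-bit divisions with clock().
__global__ void time_divide(const unsigned int *x, const unsigned int *y,
                            unsigned int *q, long long *cycles, int iters)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int a = x[tid];
    unsigned int b = y[tid];   // assumed nonzero
    unsigned int acc = 0;

    clock_t start = clock();
    for (int i = 0; i < iters; ++i) {
        acc += a / b;          // the operation being timed
        a += acc | 1u;         // dependency chain so the division is not hoisted
    }
    clock_t stop = clock();

    q[tid] = acc;              // store the result so the loop is not optimized away
    cycles[tid] = (long long)(stop - start);
}

// Hypothetical host wrapper: launches one warp and returns the mean number of
// clock ticks per loop iteration, or -1.0 if any CUDA call fails.
double average_cycles_per_iteration(int iters)
{
    const int threads = 32;
    unsigned int hx[threads], hy[threads];
    long long hc[threads];
    for (int i = 0; i < threads; ++i) {
        hx[i] = 0xFFFFFFF0u - (unsigned int)i;
        hy[i] = 3u + (unsigned int)i;
    }

    unsigned int *dx = 0, *dy = 0, *dq = 0;
    long long *dc = 0;
    bool ok = cudaMalloc((void **)&dx, sizeof(hx)) == cudaSuccess
           && cudaMalloc((void **)&dy, sizeof(hy)) == cudaSuccess
           && cudaMalloc((void **)&dq, sizeof(hx)) == cudaSuccess
           && cudaMalloc((void **)&dc, sizeof(hc)) == cudaSuccess
           && cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dy, hy, sizeof(hy), cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        time_divide<<<1, threads>>>(dx, dy, dq, dc, iters);
        // This blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(hc, dc, sizeof(hc), cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dx);
    cudaFree(dy);
    cudaFree(dq);
    cudaFree(dc);
    if (!ok) return -1.0;

    double total = 0.0;
    for (int i = 0; i < threads; ++i) total += (double)hc[i];
    return total / ((double)threads * (double)iters);
}
```

Running the same harness with the division replaced by a cheap operation such as an addition gives a baseline to subtract, and changing the operand type to unsigned long long gives the 64-bit figure.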
