Integer modulo

shawkie · February 24, 2011, 3:57pm

According to the NVIDIA CUDA Programming Guide, integer modulo is very slow. Are there any alternatives? I can't assume the divisor will always be a power of 2, so bit masking is not an option. I don't need full 32-bit precision, though, so floating point might be an option if it's faster. I also saw someone mention in another thread that 64-bit integer modulo is fast, but I'm not sure how reliable that claim is.

tera · February 24, 2011, 6:03pm

I can't believe that 64-bit integer modulo is supposed to be fast, but the (approximate) floating-point reciprocal is indeed computed by the special function unit in 16 (compute capability 1.x), 8 (c.c. 2.0) or 4 (c.c. 2.1) cycles per warp. On top of that you need 4 (1.x) or 1 (2.x) cycles for each of the int->float conversion, the multiplication by the reciprocal, and the float->int conversion. Part of that may be overlapped if you have more than one division per thread.
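To make those steps concrete, here is a minimal C sketch of modulo through a floating-point reciprocal. It converts to float, multiplies by the reciprocal, and truncates to get an approximate quotient. It then recovers the remainder with a multiply-subtract and a one-step correction. The function name `mod_via_float` and the operand limit of 2^22 are assumptions for illustration and do not come from the thread. In CUDA the same body would be a `__device__` function, and the reciprocal could use the faster approximate `__fdividef(1.0f, (float)b)`.

```c
#include <stdint.h>

/* Sketch: a % b through a single-precision reciprocal.
   Assumption (illustrative, not from the thread): b >= 1 and both a and b
   are below 2^22, so they convert to float exactly and the approximate
   quotient is off by at most one. */
static uint32_t mod_via_float(uint32_t a, uint32_t b)
{
    float rcp = 1.0f / (float)b;              /* reciprocal: the SFU step on the GPU */
    uint32_t q = (uint32_t)((float)a * rcp);  /* int->float, multiply, float->int (truncates) */
    int64_t r = (int64_t)a - (int64_t)q * b;  /* remainder candidate; q may be off by one */
    if (r < 0)                                /* quotient was one too large */
        r += b;
    else if (r >= (int64_t)b)                 /* quotient was one too small */
        r -= b;
    return (uint32_t)r;
}
```

The correction step is what makes the reduced float precision acceptable. Within the assumed range the rounding error in the quotient stays well below one, so a single compare-and-adjust restores the exact remainder.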

LSChien · February 25, 2011, 7:14am

We have discussed this issue before; please check the earlier thread on The Official NVIDIA Forums.

The fixed-point modulo suggested by @Sylvain Collange is a good alternative.

In my experiments it was 2x faster than the traditional approach, which uses double precision to implement modulo.
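The C sketch below shows one common fixed-point formulation. It is not necessarily the exact scheme from the earlier thread. It precomputes the 0.32 fixed-point reciprocal ceil(2^32 / b) once per divisor. Each modulo then costs a high-half multiply, a multiply-subtract, and at most one correction. The names `fixmod_t`, `fixmod_init` and `fixmod` are hypothetical. The sketch assumes b >= 2 and a 32-bit unsigned dividend. In CUDA device code, the high 32 bits of the product would come from `__umulhi`.

```c
#include <stdint.h>

/* Hypothetical helper: a divisor with its precomputed fixed-point reciprocal. */
typedef struct {
    uint32_t b;    /* divisor, assumed >= 2 */
    uint32_t rcp;  /* ceil(2^32 / b), fits in 32 bits when b >= 2 */
} fixmod_t;

static fixmod_t fixmod_init(uint32_t b)
{
    fixmod_t m;
    m.b = b;
    m.rcp = (uint32_t)(((UINT64_C(1) << 32) + b - 1) / b);
    return m;
}

static uint32_t fixmod(fixmod_t m, uint32_t a)
{
    /* High 32 bits of a * rcp give the quotient or the quotient + 1,
       because the rounding-up error of rcp, times a < 2^32, adds less than 1.
       On the GPU: q = __umulhi(a, m.rcp). */
    uint32_t q = (uint32_t)(((uint64_t)a * m.rcp) >> 32);
    int64_t r = (int64_t)a - (int64_t)q * m.b;
    if (r < 0)      /* quotient overshot by one */
        r += m.b;
    return (uint32_t)r;
}
```

The precomputation only pays off when the same divisor is reused for many dividends, for example a fixed array dimension passed to a kernel. For a divisor that changes on every call, the division in `fixmod_init` would dominate.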
