Why is modulus so slow?

clamport · May 20, 2010, 4:22pm

From everything I have read, I know that modulus in GPGPU code is slow, but why? I assume it has to do with floating-point arithmetic not lending itself to modulus, but now I'm curious.
Thanks!
Chris

Simon_Green · May 20, 2010, 6:04pm

Integer division and modulo are relatively slow because there is no direct hardware support; they compile to multi-instruction sequences. Floating-point modulo is fast.
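To illustrate why a divide without a dedicated hardware unit turns into many instructions, here is a minimal C sketch of textbook restoring (shift-and-subtract) division, which produces one quotient bit per loop iteration. It is only an illustration of the idea: it is not the sequence the CUDA compiler actually emits, and the function name `udivmod_shift` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Restoring (shift-and-subtract) long division of n by d, one quotient
 * bit per iteration. Returns n / d and stores n % d in *rem. */
uint32_t udivmod_shift(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit of n */
        if (r >= d) {                      /* divisor fits: subtract, set bit */
            r -= d;
            q |= 1u << i;
        }
    }
    *rem = r;   /* what is left over is the remainder */
    return q;
}
```

Even this simple scheme needs a shift, an OR, a compare and a conditional subtract for each of the 32 bits, which is why division and modulo cost far more than a single add or multiply.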

SPWorley · May 20, 2010, 11:49pm

Integer modulo is also slow on CPUs for the same reason.

NCC-1701D · May 21, 2010, 4:07am

You can get an improvement by replacing the modulo operator with the explicit formula:

a%b == a - (b*(int)(a/b))
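A minimal C sketch of that formula for int operands (the function name `mod_explicit` is hypothetical). Because integer `/` in C99 and later already truncates toward zero, the `(int)` cast is a no-op for ints, and the result matches `%` for negative operands too. Note that it still performs the division itself, so whether it is any faster depends on the compiler and target.

```c
#include <assert.h>

/* Integer remainder from the identity a % b == a - b * (a / b).
 * For int operands, a / b already truncates toward zero (C99+),
 * so no cast is needed. Same preconditions as %: b != 0, and
 * not (a == INT_MIN && b == -1). */
int mod_explicit(int a, int b)
{
    assert(b != 0);
    int q = a / b;      /* truncated quotient: still the expensive step */
    return a - b * q;   /* then one multiply and one subtract */
}
```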

tera · May 21, 2010, 7:38am

Why would that be faster?

SPWorley · May 21, 2010, 8:33am

That’s likely how it’s already implemented in the microcode. Measured operator timings show divide at 10 clocks, mul at 4 clocks, add at 1 clock, and mod at 17 clocks, which adds up reasonably closely (10 + 4 + 1 = 15 clocks versus 17 for mod).

You could use that benchmark program to test the explicit version yourself. It’s likely identical in speed to the built-in %.

jack · May 21, 2010, 8:36am

Depending on what your divisor is, there are also bit-manipulation tricks for computing the modulus that might be significantly faster.
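One example of a divisor-dependent trick, chosen here for illustration: when the divisor is a known constant, the division can be replaced by a multiplication by a precomputed reciprocal followed by a shift, and the remainder is then recovered as x - d * q. The sketch below hard-codes divisor 3 for unsigned 32-bit values; the name `mod3_u32` is hypothetical. Compilers commonly apply this transformation on their own when the divisor is a compile-time constant.

```c
#include <stdint.h>

/* x % 3 for unsigned 32-bit x without a divide instruction.
 * 0xAAAAAAAB == ceil(2^33 / 3), and (x * 0xAAAAAAAB) >> 33 == x / 3
 * for every 32-bit x; the 64-bit product cannot overflow. */
uint32_t mod3_u32(uint32_t x)
{
    uint32_t q = (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33); /* x / 3 */
    return x - 3u * q;                                           /* x % 3 */
}
```

The magic constant is specific to the divisor; other divisors need their own multiplier and shift.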

haridy · May 21, 2010, 11:24am

Use this equivalence:
A%B -> A&(B-1)

To be honest, I'm not sure whether that applies only when B is a power of 2 or whether it's a general rule, but give it a try; it might help you.

tera · May 21, 2010, 11:39am

If you're not sure, you can just check it yourself:

1%3 = 1

1&2 = 0
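That counterexample generalizes. A small brute-force checker (the helper `mask_trick_holds` is hypothetical) tests whether A%B equals A&(B-1) for a given B over every non-negative A below some limit.

```c
#include <assert.h>

/* Returns 1 if a % b == (a & (b - 1)) for every unsigned a in [0, limit),
 * 0 as soon as a counterexample is found. Precondition: b >= 1. */
int mask_trick_holds(unsigned b, unsigned limit)
{
    assert(b >= 1);
    for (unsigned a = 0; a < limit; ++a) {
        if (a % b != (a & (b - 1u)))
            return 0;   /* e.g. b = 3, a = 1: 1 % 3 == 1 but 1 & 2 == 0 */
    }
    return 1;
}
```

Called with B from 1 to 16 and a limit of 1024, it succeeds only for B = 1, 2, 4, 8 and 16. For any other B, taking A as the lowest set bit of B already breaks the equivalence, because B-1 has that bit cleared.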

haridy · May 21, 2010, 2:41pm

Fine, B has to be a power of 2 for the trick to work.
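A minimal sketch of the trick restricted to the case where it is valid (the name `mod_pow2` is hypothetical). It asserts that B is a power of two and uses unsigned operands, because for a negative signed A, C's % keeps the sign of A (-5 % 4 == -1), while on two's-complement hardware -5 & 3 == 3, so the mask gives a different answer.

```c
#include <assert.h>

/* a % b for a power-of-two b, computed with a mask instead of a divide.
 * Unsigned operands on purpose: the mask does not reproduce %'s
 * sign behaviour for negative signed values. */
unsigned mod_pow2(unsigned a, unsigned b)
{
    assert(b != 0 && (b & (b - 1u)) == 0);   /* b must be a power of two */
    return a & (b - 1u);
}
```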
