But I didn’t notice any difference with __sad(x,y,z) (Sum of Absolute Differences). So is it useful to replace additions with it? Maybe even in loops?
for (int i=10; i>0; i--) => for (int i=10; i>0; i=__sad(i,1,0))
I am not a mathematician, so I don’t see much use for the other integer functions (maybe in counters?). Or is there an easy way to use __clz, __ffs, __popc, etc. in order to speed up code?
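To make the question a bit more concrete, here is a minimal host-side sketch of what the three bit intrinsics compute, plus two typical uses. It is plain C++ so it runs without a GPU. The names popc_ref, clz_ref, ffs_ref, is_power_of_two and floor_log2 are hypothetical helpers made up for this illustration, not part of CUDA. In a kernel one would call __popc, __clz and __ffs directly.

```cpp
#include <cstdint>

// Model of __popc: number of bits set to 1 in a 32-bit value.
int popc_ref(std::uint32_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;  // clears the lowest set bit
        ++count;
    }
    return count;
}

// Model of __clz: number of leading zero bits, 32 for an input of 0.
int clz_ref(std::uint32_t x) {
    int zeros = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++zeros;
    }
    return zeros;
}

// Model of __ffs: 1-based position of the lowest set bit, 0 if no bit is set.
int ffs_ref(std::uint32_t x) {
    if (x == 0) return 0;
    int pos = 1;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++pos;
    }
    return pos;
}

// Typical use of popc: a value is a power of two if exactly one bit is set.
bool is_power_of_two(std::uint32_t x) {
    return popc_ref(x) == 1;
}

// Typical use of clz: integer log2 without a loop on the device (x must be > 0).
int floor_log2(std::uint32_t x) {
    return 31 - clz_ref(x);
}
```

So these functions look less like counters and more like shortcuts for bit tricks (log2, power-of-two checks, finding the first set flag in a mask) that would otherwise need a loop.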
Replacing the addition with sad does not gain you anything in terms of speed,
but in general sad(x,y,z) seems to be a very useful instruction (if one can find a good application for it) because it performs 3 arithmetic operations at once: a subtraction, an absolute value and an addition.
The programming guide does not say anything about its speed, but according to my tests it looks like it is executed in 4 clock cycles per warp (special hardware?).
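One application that fits the "three operations at once" pattern is accumulating a sum of absolute differences between two rows of pixel values, as used when comparing image blocks. The sketch below is a host-side C++ model, not device code. The names sad_step and row_sad are hypothetical, and sad_step only mimics the documented result |x-y|+z. In a kernel the loop body would call __sad instead.

```cpp
#include <cstddef>

// Host-side model of one __sad(x, y, z) call: |x - y| + z.
unsigned int sad_step(int x, int y, unsigned int z) {
    // Widen to 64 bits so the subtraction cannot overflow.
    long long diff = static_cast<long long>(x) - static_cast<long long>(y);
    if (diff < 0) diff = -diff;
    return static_cast<unsigned int>(diff) + z;
}

// Sum of absolute differences between two arrays of length n.
// Each iteration feeds the running total back in as the third operand,
// so one sad call does the subtract, the abs and the accumulate.
unsigned int row_sad(const int* a, const int* b, std::size_t n) {
    unsigned int acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc = sad_step(a[i], b[i], acc);
    }
    return acc;
}
```

For a = {10, 20, 30, 40} and b = {12, 18, 30, 35} the differences are 2, 2, 0 and 5, so row_sad(a, b, 4) gives 9.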
Nvidia should have published this information, because a lot of things are missing in their guides, and sometimes things are very badly explained (I am about to print the timings.txt!). But I also have to admit that I was too lazy to write something to find it out myself :rolleyes: . So thanks!
I had a look at the PTX, and it seems that sad(x, 0, y) results in at least two operations, because a “0” has to be created first. Therefore an addition of two numbers, variables, etc. is likely to be slower with sad than with a normal addition. However, if you have 3 operands, it seems to be faster. But one should not forget that __sad(a,b,c) is not a+b+c, but |a-b|+c.
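To illustrate that last point, here is a small host-side C++ model of the formula |a-b|+c. The names sad_ref and countdown_iterations are hypothetical and only exist for this sketch. On the device one would call __sad itself. It also shows the countdown loop from the first post, which only terminates if the result is assigned back to i.

```cpp
// Host-side model of __sad(a, b, c): |a - b| + c.
unsigned int sad_ref(int a, int b, unsigned int c) {
    // Widen to 64 bits so the subtraction cannot overflow.
    long long diff = static_cast<long long>(a) - static_cast<long long>(b);
    if (diff < 0) diff = -diff;
    return static_cast<unsigned int>(diff) + c;
}

// Countdown written with sad instead of i--.
// i = |i - 1| + 0 equals i - 1 as long as i >= 1, which the loop condition guarantees.
int countdown_iterations(int start) {
    int iterations = 0;
    for (int i = start; i > 0; i = static_cast<int>(sad_ref(i, 1, 0u))) {
        ++iterations;
    }
    return iterations;
}
```

With this model sad_ref(5, 3, 2) is 4, while 5+3+2 is 10. Using 0 as the middle operand gives |x|+y, so sad_ref(-4, 0, 1) is 5 and not -3, which means sad(x, 0, y) only matches x+y for non-negative x.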