A question about 64-, 32- and 24-bit math...

Hi to all,

I have a little silly question concerning precision.

Assume I have a device with compute capability 1.3.
When I do a calculation on the GPU using double-precision functions (e.g. sin()) with both double and float values, I'm using 64-bit math.
If I use functions like sinf() with both double and float values, I'm using 32-bit math…
If I compile my software with --use_fast_math, or use functions like __sinf(), I'm using 24-bit math…
Is this right?

Are there significant differences in terms of performance (I'm not considering memory bandwidth) just from switching from float to double values?

Thanks to all,


I don’t know if it is accurate to call __sinf() “24-bit math”. Rather, __sinf() is just a less precise sinf() function implemented as a hardware instruction in the Special Function Units on the multiprocessor. In particular, it lacks the “range reduction” step of most sin() calculations, so its accuracy falls off as your argument moves outside the range of [-pi, pi]. The normal sin() and sinf() functions take many, many instructions to run to achieve the 2 ulp accuracy quoted in the CUDA Programming Guide.
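To make the range-reduction point concrete, here is a small host-side C++ sketch (not CUDA device code). toy_sin() is a hypothetical Taylor polynomial standing in for a fast sine that is only accurate near zero; it is not how the Special Function Units actually evaluate __sinf(), it just shares the missing range-reduction step. reduce_to_pi_range() is a naive reduction put in front of it, assuming that subtracting the nearest multiple of 2*pi in double precision is good enough for moderate arguments.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical fast sine: Taylor polynomial up to x^15, evaluated in float.
// Reasonably accurate on [-pi, pi], useless far outside it, because there is
// no range reduction. NOT the algorithm used by the hardware for __sinf().
float toy_sin(float x) {
    float x2 = x * x;
    float term = x;  // current Taylor term, starting with x^1 / 1!
    float sum = x;
    for (int n = 1; n <= 7; ++n) {
        // next term = previous term * (-x^2) / ((2n) * (2n + 1))
        term *= -x2 / static_cast<float>((2 * n) * (2 * n + 1));
        sum += term;
    }
    return sum;
}

// Naive range reduction: subtract the nearest multiple of 2*pi, in double.
// Real math libraries use more careful schemes (e.g. Cody-Waite or
// Payne-Hanek) so that very large arguments are still reduced accurately.
double reduce_to_pi_range(double x) {
    const double two_pi = 6.283185307179586;  // 2*pi rounded to double
    double k = std::nearbyint(x / two_pi);    // nearest whole number of periods
    return x - k * two_pi;                    // result lies roughly in [-pi, pi]
}

// The same toy polynomial, with the range-reduction step added in front.
float toy_sin_reduced(float x) {
    return toy_sin(static_cast<float>(reduce_to_pi_range(x)));
}
```

For example, toy_sin(100.0f) is off by many orders of magnitude, while toy_sin_reduced(100.0f) lands very close to sin(100).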

Yes, there is a large performance difference. On compute capability 1.3, a multiprocessor has 8 stream processors capable of single-precision arithmetic, but only one double-precision unit. As a result, single-precision throughput is at least 8 times higher than double-precision throughput, possibly more, since the special function units can also perform single-precision multiplies in parallel with the stream processors.

Thank you very much seibert…

I agree with you… What I mean is:

If I do something like this (just an example):

float x;
float y;
y = sinf(x);

I'm using all the stream processors. But what about:

double x;
double y;
y = sinf(x);

Just to keep more decimal digits… Am I still using all the stream processors? I'm not using the sin() function (this is where my doubt is).

Another silly question… If my software achieves only a low occupancy, let's say 12.5% for example, that is, one stream processor (SP) for every multiprocessor (MP)… is it, in principle, the same whether I use a single stream processor or the double precision unit?

I don’t know if my reasoning is sufficiently clear… :-)

All my troubles come from the fact that I'm dealing with really huge numbers… Sadly, you can't keep adding small numbers (like 1, for example) to a float variable forever, because a float holds only about 7 significant decimal digits…
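A minimal host-side C++ sketch of this effect (the function names are only illustrative). A float has a 24-bit significand, so integers are exact only up to 2^24 = 16777216; adding 1.0f to 16777216.0f rounds straight back to 16777216.0f, and the sum never grows again.

```cpp
#include <cassert>

// Add 1 to a float accumulator n times.
// Every integer up to 2^24 = 16777216 is exactly representable in float,
// but 16777216 + 1 rounds back to 16777216, so the sum stalls there.
float count_with_float(long n) {
    float sum = 0.0f;
    for (long i = 0; i < n; ++i) sum += 1.0f;
    return sum;
}

// Same loop with a double accumulator (53-bit significand):
// integers stay exact up to 2^53.
double count_with_double(long n) {
    double sum = 0.0;
    for (long i = 0; i < n; ++i) sum += 1.0;
    return sum;
}
```

count_with_float(20000000) stops at 16777216, while count_with_double(20000000) returns 20000000. A double hits the same wall, but only past 2^53 (about 9 × 10^15).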

Thank you,


To execute this, I assume that the compiler has to issue a double-to-single conversion instruction for x before the sinf() call and a single-to-double conversion instruction for the result of the sinf() call. I'm not sure where on the device those conversions are executed (single or double precision units…). It probably doesn't affect your run time much, because sinf() takes many, many instructions. You won't get any more precision than in your float x, float y example, though.
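The same conversions written out explicitly, as a host-side C++ sketch. sin_via_float() is a hypothetical helper; std::sin on a float argument calls the single-precision overload, the C++ counterpart of sinf(). This assumes the device compiler applies the usual C/C++ conversion rules, as the reply above does.

```cpp
#include <cassert>
#include <cmath>

// What "double x; double y; y = sinf(x);" amounts to:
double sin_via_float(double x) {
    float xf = static_cast<float>(x);  // double -> float: digits are dropped here
    float yf = std::sin(xf);           // float overload: single-precision sine
    return static_cast<double>(yf);    // float -> double: no digits come back
}
```

The returned double survives a round trip through float unchanged, which is another way of saying it carries only single-precision digits.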

Have you looked at Kahan Summation? It’s a nice math trick to preserve more precision when doing long float sums. It requires 4 single precision operations per addition, which is still faster than doing double precision. It doesn’t help for any other operation but addition, but I’ve used it to get back a few digits of precision when summing up millions of terms.
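A minimal single-precision Kahan summation sketch in host C++ (KahanAccumulator and the helper names are only illustrative). The four float operations per added term are the ones mentioned above. One caveat worth checking for your own toolchain: host compiler flags that allow reassociation, such as GCC/Clang -ffast-math, can simplify the compensation away.

```cpp
#include <cassert>

// Kahan (compensated) summation in single precision.
struct KahanAccumulator {
    float sum = 0.0f;  // running total
    float c = 0.0f;    // rounding error of the last addition (added minus intended)

    void add(float x) {
        float y = x - c;    // 1) correct the new term by the previous error
        float t = sum + y;  // 2) big + small: low-order bits of y may be lost
        c = (t - sum) - y;  // 3) and 4) measure what was actually lost
        sum = t;
    }
};

// Plain float accumulation, for comparison.
float naive_sum_of(long n, float value) {
    float sum = 0.0f;
    for (long i = 0; i < n; ++i) sum += value;
    return sum;
}

// The same sum through the Kahan accumulator.
float kahan_sum_of(long n, float value) {
    KahanAccumulator acc;
    for (long i = 0; i < n; ++i) acc.add(value);
    return acc.sum;
}
```

Adding 1.0f twenty million times, the naive float sum stalls at 16777216, while the compensated float sum reaches exactly 20000000.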

Thanks a lot, seibert!

You're full of useful information… :-)

I will have a look, thanks…

What about something like a double summation of sin() functions? I mean, something like:

f += sum(from i=0 to i=N) sum(from j=0 to j=N) [sin(something)]

In this case, does it make any difference whether "f" is double or float?

I'm dealing with numbers on the order of 10^15 or more…
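A small host-side C++ sketch of that double summation, using a hypothetical term() as a stand-in for sin(something); its argument is chosen only so that every term lies between about 0.1 and 1. What decides the precision of the running sum is the type of f: near 10^15, neighbouring floats are 2^26 (about 6.7 × 10^7) apart, so every term vanishes, while neighbouring doubles are 0.125 apart.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for sin(something): the argument runs over [0.1, 3.0),
// so every term is positive and lies in roughly [0.0998, 1].
double term(int i, int j, int n) {
    double k = static_cast<double>(i) * (n + 1) + j;  // 0 .. (n+1)^2 - 1
    return std::sin(0.1 + 2.9 * k / ((n + 1.0) * (n + 1.0)));
}

// f += sum over i = 0..n, j = 0..n of term(i, j, n), with accumulator type Acc.
// For Acc = float, each addition is done in double and then rounded to float.
template <typename Acc>
Acc accumulate_double_sum(Acc f, int n) {
    for (int i = 0; i <= n; ++i)
        for (int j = 0; j <= n; ++j)
            f += term(i, j, n);
    return f;
}
```

Starting from f = 10^15, the float version returns its starting value untouched, while the double version grows by roughly the sum of the terms. Even in double, each addition can round off up to 0.0625, so accumulating the inner sums in a separate variable that starts at zero and adding it to f once (or using Kahan summation) should keep more of the terms' digits.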

Thank you!!!