It seems that the floating-point representation in CUDA doesn’t follow the IEEE standard. Does anyone know how CUDA (or GPUs in general) represents floating-point numbers, i.e. how many bits are used for the exponent and how many for the fraction? Thanks very much!

Check out appendix A.2 in the programming guide, it’s IEEE-754 with a few deviations.

There are a few deviations from IEEE-754 in the mathematical operations (a few ulp less precision here and there), but the actual number **representation** is the same. You can memcpy floats between host and device without any issues whatsoever.

The representation is the same, except that denormalized numbers are treated as zero, AFAIK.

I read on a computer hardware site that GT200 has hardware support for denormals in a very fast/efficient way. I don’t know if it’s true, but the reviewers sounded very impressed.

I also remember reading that not all rounding modes are supported for all operations. There are several other slight deviations if you go deeper into that section of the programming guide. Even if it were fully IEEE-754 compliant, however, results wouldn’t always match a CPU’s, because of the wider intermediate representations the CPU uses.

Yes, I think I am hitting such issues in my financial algorithm code… There’s always a small deviation in the result of my trinomial European put code.

For example:

50.042450 - GPU result

50.032833 - CPU result

I even tested very small kernels which can’t have any kind of race condition, and I still found deviations with complex math.