The CUDA Programming Guide says that “division is implemented via the reciprocal in a non-standard-compliant way”. However, I really want division that follows the IEEE-754 standard. Do you have any suggestions? Thanks a lot!

P.S. I use a GTX280, so double-precision floating point is supported.

Although single precision division is not IEEE-754 compliant, double precision division is (see table B-2 in the Programming Guide). Not sure of a faster way to get IEEE-754 compliant single division, though.
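One possible workaround, offered as an assumption rather than something the Programming Guide prescribes: since the card supports double precision and double division is compliant, the single-precision operands could be converted to double, divided, and the quotient converted back to single. A double carries 53 significand bits against 24 for a single, which is commonly cited as enough for the second rounding to be harmless for division. The sketch below is a host-side Python emulation of that idea, not CUDA code; `to_f32`, `div_f32_via_f64`, and `is_nearest_f32` are hypothetical helper names, and the check only handles positive, normal-range values.

```python
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def div_f32_via_f64(a: float, b: float) -> float:
    """Divide two binary32 values in binary64, then round the quotient to binary32."""
    return to_f32(a / b)


def f32_neighbors(x: float):
    """Return the binary32 values just below and just above a positive normal binary32 x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    below = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return below, above


def is_nearest_f32(a: float, b: float, q: float) -> bool:
    """Check with exact rational arithmetic that no neighboring binary32 value is closer to a/b than q."""
    exact = Fraction(a) / Fraction(b)
    below, above = f32_neighbors(q)
    err = abs(exact - Fraction(q))
    return err <= abs(exact - Fraction(below)) and err <= abs(exact - Fraction(above))


if __name__ == "__main__":
    q = div_f32_via_f64(1.0, 3.0)
    print(q, is_nearest_f32(1.0, 3.0, q))
```

Whether the extra conversions and the double-precision divide are fast enough on a given GPU is a separate question that this sketch does not answer.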

Hi, I’d like to ask a follow-up. Besides basic arithmetic, the table shows that the ulp error of the other mathematical functions is not zero. Does this mean those functions have some error compared with the standard implementation on the CPU? I am not familiar with this area, so I would appreciate an answer. Thanks very much!

Dumb question, more about IEEE-754 than CUDA, but surprisingly a quick Google search didn’t answer my question.

IEEE-754 defines the floating point format very precisely. It also defines operations like add, subtract, multiply, divide. All the online references give exact details about the format, but NOT about the operation behavior requirements. I realize that division must follow several rules about NaN, sign preservation, ± infinity, and denormals.

But my actual question: is the division itself defined EXACTLY? Will any IEEE-754 floating point implementation divide two numbers and return the EXACT same result, bit for bit? Or is a small error in the division allowed?

I believe the answer is that + - * / and sqrt() must have 0 error, using proper rounding in the last bit, and therefore all IEEE-754 computations will return the exact same answers bit for bit on any hardware. But I haven’t found an exact confirmation of that fact from my googling.

(Indirect evidence this is true: CUDA’s double precision operations are IEEE-754 compliant and have 0 error.)

Not exactly. There is no “standard implementation” of transcendental functions, as they are not covered by IEEE-754. Depending on your compiler/hardware/operating system, you can get different results, although the rounding error is typically small (say 0.501 units in the last place, 0.5 being the best achievable). CUDA just has a larger error than most CPU implementations, but there is no fundamental difference. A few ulps is still a negligible error in most circumstances.
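To make "error in ulps" concrete, here is a minimal host-side sketch that measures how far the platform's `math.exp` lands from a 50-digit reference computed with Python's `decimal` module. It says nothing about CUDA's own numbers; `ulp_error_exp` is a hypothetical helper name, `math.ulp` needs Python 3.9 or later, and the ulp is taken at the computed value, which is an approximation near powers of two.

```python
import math
from decimal import Decimal, localcontext
from fractions import Fraction


def ulp_error_exp(x: float) -> float:
    """Distance between math.exp(x) and a 50-digit reference, in units in the last place."""
    with localcontext() as ctx:
        ctx.prec = 50
        # Decimal(x) converts the float exactly; exp() is then rounded to 50 digits.
        reference = Decimal(x).exp()
    computed = math.exp(x)
    diff = abs(Fraction(computed) - Fraction(reference))
    # Assumption: use the spacing of doubles at the computed value as "one ulp".
    return float(diff / Fraction(math.ulp(computed)))


if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0, 2.0, 10.0):
        print(x, ulp_error_exp(x))
```

The printed values depend on the math library in use, which is exactly the point made above: different platforms may legitimately differ in the last bit for such functions.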

Yes, definitely. It even defines rules for NaN encoding preservation (a NaN can have many different encodings), which is an area where CUDA deviates in single precision.
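As a small illustration of "many different encodings", the sketch below classifies single-precision bit patterns: any pattern with an all-ones exponent and a nonzero mantissa is a NaN, whatever the sign bit and the remaining mantissa bits (the payload) contain. The helper names and the example patterns are hypothetical, chosen only for illustration.

```python
import math
import struct


def is_nan_bits_f32(bits: int) -> bool:
    """A binary32 pattern is a NaN when the exponent is all ones and the mantissa is nonzero."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent == 0xFF and mantissa != 0


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit integer as a binary32 value (widened to a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Two sign values times every nonzero 23-bit mantissa.
NAN_ENCODINGS_F32 = 2 * (2**23 - 1)

# Three quiet NaNs that differ in sign and payload.
EXAMPLE_NANS = (0x7FC00000, 0x7FC00001, 0xFFC12345)

if __name__ == "__main__":
    for bits in EXAMPLE_NANS:
        print(hex(bits), is_nan_bits_f32(bits), math.isnan(f32_from_bits(bits)))
    print("distinct binary32 NaN encodings:", NAN_ENCODINGS_F32)
```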

Actually, you will generally not get 0 error compared to the exact result when computing in finite precision ;). What IEEE-754 requires is “correct rounding” according to deterministic (and somewhat arbitrary) rules. The max rounding error is still 0.5 units in the last place in round-to-nearest mode, and slightly less than 1 ulp for other rounding modes (assuming no overflow, etc).
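The 0.5 ulp bound for round-to-nearest can be checked on the host with exact rational arithmetic. This is a minimal sketch, assuming the platform's double-precision division is correctly rounded (as it is on typical IEEE-754 CPUs); `division_error_in_ulps` is a hypothetical helper name and `math.ulp` needs Python 3.9 or later.

```python
import math
from fractions import Fraction


def division_error_in_ulps(a: float, b: float) -> Fraction:
    """Exact distance between the computed a / b and the true quotient, in ulps of the result."""
    exact = Fraction(a) / Fraction(b)   # true quotient of the two doubles, no rounding
    computed = a / b                    # hardware division, rounded to a double
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))


if __name__ == "__main__":
    # 1/4 is representable, so the error is zero; 1/3 is not, so some error is unavoidable.
    print(division_error_in_ulps(1.0, 4.0))
    print(float(division_error_in_ulps(1.0, 3.0)))
```

A correctly rounded division never reports more than 1/2 here, and two compliant implementations using the same rounding mode should therefore agree bit for bit on the quotient.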