Double-single division

Hello all,

I was wondering whether anyone has successfully implemented emulated double-precision division on a CUDA device. I tried to follow the example set by dsfun90, but my device results still fail to match the results obtained with full double precision. I have also tested the algorithm on the CPU, using both device emulation and straight CPU computation with -ffloat-store. When I do this, the answers turn out exactly correct. Any thoughts are welcome.


I’ve only implemented the dsfun functions for addition and multiplication in CUDA.

Looking at the source for dsdivs, you might be running into problems with the lines:

t1 = dsa(1) / db


t2 = (t11 + t21) / db

According to Appendix B of the CUDA programming guide, division on the GPU only has an accuracy of 2 ulps, whereas addition and multiplication should match the IEEE-754 answer exactly. Unfortunately, I don’t know how best to rewrite the algorithm to fix this…
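For reference, here is a minimal C sketch of the kind of routine those two lines belong to: estimate the quotient with one single-precision division, compute the residual using an error-free product, then divide the residual once more for the correction term. It is not the dsfun90 source. The names `ds`, `rn_add`/`rn_sub`/`rn_mul`/`rn_div`, `split_float`, `two_prod`, `ds_div_s`, `ds_from_double` and `ds_to_double` are all made up for illustration, and the split constant 4097 = 2^12 + 1 is an assumption (the textbook choice for a 24-bit significand). The `volatile` stores are only there to force every intermediate to be rounded to single precision, similar in spirit to -ffloat-store on the host.

```c
/* Hypothetical double-single value: hi + lo, with |lo| much smaller than |hi|. */
typedef struct { float hi; float lo; } ds;

/* Each helper stores its result in a volatile float, so the compiler must
   round to single precision and cannot fuse a multiply with an add. */
static float rn_add(float a, float b) { volatile float r = a + b; return r; }
static float rn_sub(float a, float b) { volatile float r = a - b; return r; }
static float rn_mul(float a, float b) { volatile float r = a * b; return r; }
static float rn_div(float a, float b) { volatile float r = a / b; return r; }

/* Split a into a short head and a tail such that head + tail == a. */
static void split_float(float a, float *head, float *tail) {
    float t = rn_mul(4097.0f, a);            /* assumed constant: 2^12 + 1 */
    *head = rn_sub(t, rn_sub(t, a));
    *tail = rn_sub(a, *head);
}

/* Dekker product: p.hi = fl(a*b) and p.lo = a*b - p.hi,
   barring overflow and underflow. */
static ds two_prod(float a, float b) {
    float a1, a2, b1, b2;
    ds p;
    split_float(a, &a1, &a2);
    split_float(b, &b1, &b2);
    p.hi = rn_mul(a, b);
    p.lo = rn_sub(rn_mul(a1, b1), p.hi);
    p.lo = rn_add(p.lo, rn_mul(a1, b2));
    p.lo = rn_add(p.lo, rn_mul(a2, b1));
    p.lo = rn_add(p.lo, rn_mul(a2, b2));
    return p;
}

/* Divide the double-single a by the single-precision b. */
static ds ds_div_s(ds a, float b) {
    float q1 = rn_div(a.hi, b);              /* first quotient estimate */
    ds p = two_prod(q1, b);                  /* q1 * b, with its rounding error */

    /* Residual a - q1*b: two-sum on the heads, then fold in the tails. */
    float r_hi = rn_sub(a.hi, p.hi);
    float v    = rn_sub(r_hi, a.hi);
    float err  = rn_add(rn_sub(a.hi, rn_sub(r_hi, v)), rn_sub(-p.hi, v));
    float r_lo = rn_sub(rn_add(err, a.lo), p.lo);

    float q2 = rn_div(rn_add(r_hi, r_lo), b);    /* correction term */

    ds q;
    q.hi = rn_add(q1, q2);                   /* renormalise q1 + q2 */
    q.lo = rn_sub(q2, rn_sub(q.hi, q1));
    return q;
}

/* Host-side conversion helpers for comparing against double precision. */
static ds ds_from_double(double x) {
    ds r;
    r.hi = (float)x;
    r.lo = (float)(x - (double)r.hi);
    return r;
}
static double ds_to_double(ds a) { return (double)a.hi + (double)a.lo; }
```

On the host, `ds_to_double(ds_div_s(ds_from_double(x), b))` can be compared against `x / b` computed in double precision. Note that in this structure the second division only has to deliver the correction term, while the residual step depends on every multiply and add being rounded individually.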

Thanks, that is what I was leaning towards. I looked through the original paper that the dsfun algorithms came from. The number 8193 is supposed to be 2^(t-t/2)+1, where t is the number of binary digits. When I lowered the value of t by one, I got improvements in some calculations but higher error in others. So I suppose I will attempt to prepare all my calculations by doing the divisions on the CPU.
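To make the role of that constant concrete, here is a small C sketch of the splitting step on its own. The names `split_float` and `product_is_exact` are hypothetical, and the helper takes the constant as a parameter so different values can be tried. As a point of arithmetic, 8193 = 2^13 + 1, whereas reading the formula with t = 24 (the significand width of IEEE single precision, counting the implicit bit) and integer division gives 2^12 + 1 = 4097. With either constant, the head and tail should be short enough that their pairwise products are exactly representable in single precision; the two values differ only in where the number is cut.

```c
/* Hypothetical helper: split a using the constant c = 2^s + 1.
   The volatile stores force each step to be rounded to single precision. */
static void split_float(float a, float c, float *head, float *tail) {
    volatile float t = c * a;
    volatile float d = t - a;
    volatile float h = t - d;    /* a rounded to about (24 - s) significant bits */
    volatile float l = a - h;    /* remainder, so that h + l == a */
    *head = h;
    *tail = l;
}

/* Returns 1 if x*y is exactly representable as a float, else 0.
   The product of two floats always fits in a double without rounding. */
static int product_is_exact(float x, float y) {
    volatile float p = x * y;
    return (double)p == (double)x * (double)y;
}
```

Calling `split_float(x, 4097.0f, &h, &l)` and `split_float(x, 8193.0f, &h, &l)` for the same `x`, and then checking `product_is_exact` on the pieces, is a quick way to test whether a given constant behaves as expected on a particular platform.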