Double-single division

Hello all,

I was wondering if anyone has successfully implemented emulated double precision division on a CUDA device. I tried to follow the example set by dsfun90, but my device results still fail to match the results from full double precision. I have also tested the algorithm on the CPU, using both device emulation and straight CPU computation with -ffloat-store. In both cases the answers turn out exactly correct. Any thoughts are welcome.

Cheers,
-Matt

I’ve only implemented the dsfun functions for addition and multiplication in CUDA.

Looking at the source for dsdivs, you might be running into problems with the lines:

t1 = dsa(1) / db

and

t2 = (t11 + t21) / db

According to Appendix B of the CUDA programming guide, single-precision division on the GPU is only accurate to within 2 ulps, whereas addition and multiplication should match the IEEE-754 answer exactly. That would also explain why the CPU runs come out right while the device runs do not. Unfortunately, I don’t know how best to rewrite the algorithm to fix this…
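To make it easier to see where those two divisions sit, here is a rough C sketch of a double-single by double-single division in the dsfun90 style. It is my own transcription of the general scheme, not the dsfun90 source, and the names ds_t, ds_split, ds_div, ds_from_double, ds_to_double and ds_abs_error are made up for illustration. It assumes every float operation is rounded to single precision (SSE arithmetic, or x87 with -ffloat-store), which is the situation on the CPU where the results come out right.

```c
/* Hypothetical double-single type: the value is hi + lo, with |lo| far below |hi|. */
typedef struct { float hi, lo; } ds_t;

/* Illustrative helpers for moving between double and double-single. */
static ds_t ds_from_double(double x) {
    ds_t r;
    r.hi = (float)x;
    r.lo = (float)(x - (double)r.hi);
    return r;
}

static double ds_to_double(ds_t a) {
    return (double)a.hi + (double)a.lo;
}

/* Absolute difference between a double-single value and a double reference. */
static double ds_abs_error(ds_t q, double ref) {
    double d = ds_to_double(q) - ref;
    return d < 0.0 ? -d : d;
}

/* Split a into hi + lo using the constant 8193, so that products of halves are exact. */
static void ds_split(float a, float *hi, float *lo) {
    float con = 8193.0f * a;
    *hi = con - (con - a);
    *lo = a - *hi;
}

static ds_t ds_div(ds_t a, ds_t b) {
    float a1, a2, b1, b2, e;
    ds_t q;

    /* First single-precision division: approximate quotient. */
    float s1 = a.hi / b.hi;

    /* Exact product s1 * b.hi as c11 + c21 (Dekker multiplication). */
    ds_split(s1, &a1, &a2);
    ds_split(b.hi, &b1, &b2);
    float c11 = s1 * b.hi;
    float c21 = (((a1 * b1 - c11) + a1 * b2) + a2 * b1) + a2 * b2;

    /* Add s1 * b.lo (only its high word is needed) with a two-sum. */
    float c2 = s1 * b.lo;
    float t1 = c11 + c2;
    e = t1 - c11;
    float t2 = ((c2 - e) + (c11 - (t1 - e))) + c21;

    /* Normalize the product s1 * b into (t12, t22). */
    float t12 = t1 + t2;
    float t22 = t2 - (t12 - t1);

    /* Remainder a - s1 * b as (t11, t21). */
    float t11 = a.hi - t12;
    e = t11 - a.hi;
    float t21 = ((-t12 - e) + (a.hi - (t11 - e))) + a.lo - t22;

    /* Second single-precision division: correction term. */
    float s2 = (t11 + t21) / b.hi;

    /* Normalize the quotient s1 + s2. */
    q.hi = s1 + s2;
    q.lo = s2 - (q.hi - s1);
    return q;
}
```

On a CPU, ds_abs_error(ds_div(ds_from_double(1.0), ds_from_double(3.0)), 1.0 / 3.0) should come out far below single-precision epsilon. Only the two lines marked as divisions rely on the divide being correctly rounded; everything else is additions and multiplications.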

Thanks, that is what I was leaning towards. I looked through the original paper that the dsfun algorithms came from. The number 8193 is supposed to be 2^(t-t/2)+1, where t is the number of binary digits. When I lowered t by one, I did get improvements in some calculations but higher error in others. So I suppose I will attempt to prepare all my calculations by doing the divisions on the CPU.
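For anyone who wants to experiment with the splitting constant on the CPU, here is a small C sketch. It is not taken from dsfun90 or the paper; the names split_constant, split_with and split_is_exact, and the parameter s (the exponent, so the constant is 2^s + 1), are my own. Note that 8193 = 2^13 + 1 and 4097 = 2^12 + 1. The check assumes strict single-precision evaluation (SSE, or -ffloat-store on x87).

```c
/* Splitting constant 2^s + 1, e.g. 8193 for s = 13 (s is an illustrative parameter). */
static float split_constant(int s) {
    return (float)(1 << s) + 1.0f;
}

/* Split a into hi + lo using the constant 2^s + 1. */
static void split_with(float a, int s, float *hi, float *lo) {
    float con = split_constant(s) * a;
    *hi = con - (con - a);
    *lo = a - *hi;
}

/* Returns 1 if single precision computes x * y with no rounding error. */
static int product_is_exact(float x, float y) {
    float p = x * y;
    return (double)p == (double)x * (double)y;
}

/*
 * Returns 1 if both splits reproduce their inputs exactly and all four
 * partial products of the halves are exact in single precision, which is
 * what the Dekker-style multiplication relies on.
 */
static int split_is_exact(float a, float b, int s) {
    float a1, a2, b1, b2;
    split_with(a, s, &a1, &a2);
    split_with(b, s, &b1, &b2);

    int sums_ok = ((double)a1 + (double)a2 == (double)a)
               && ((double)b1 + (double)b2 == (double)b);
    int prods_ok = product_is_exact(a1, b1) && product_is_exact(a1, b2)
                && product_is_exact(a2, b1) && product_is_exact(a2, b2);
    return sums_ok && prods_ok;
}
```

Calling split_is_exact(3.14159265f, 2.71828183f, 13) and the same with s = 12 lets you compare the two constants on the CPU before changing anything in the device code.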

Cheers