Double precision

I’m running some double precision code in device emulation mode for development.

I’m getting some pretty mean error on division. Is this representative of the real device or just a weirdness in deviceemu? The CUDA manual says I should have pretty much none.

a = 3.5637338457619299
b = 9.3451182258927847

deviceemu a/b = 0.38134709000587463
gdb (on cpu) a/b = 0.38134711189504172

Ben

That emulation result looks suspiciously like single precision. Makes me wonder whether device emulation actually supports double precision…

And was the code compiled with -arch=sm_13?

YDD: No, I didn’t compile with -arch=sm_13. Will try that.

I need to get this crap in an SVN repository… Currently in the middle of the modification so I can’t test it just now :).

avidday: I was wondering the same thing, but all the multiplies come out fine :/.

Ben

May I ask a stupid question?
How do you debug the value? Through printf?

You can debug binaries built with -deviceemu in GDB like you would any other application.

So something like (comments in parentheses):

break gpu_function (break at entry to gpu function)
r (run)
list (multiple times perhaps, to see interesting code)
b 123 (break at interesting line of current file)
c (continue)
p variable (print variable values)

Something like that.

Ben

I have the same problem with double precision. I compile with sm_13 and I have a GTX280. On the device there are erroneous results, but in emulation there are no errors. Here’s an example of my output. The first column is the matrix index, the second is the GPU result, and the third is the CPU result.

[codebox]117 0.89269727841703583326449233936727978289127349853516 0.89269727841703572224218987685162574052810668945312

752 -0.00003725353962411026564893745671724900603294372559 -0.00003725353962424904352701560128480195999145507812

753 -0.52554355634900895566374856571201235055923461914062 -0.52554355634900873361914364068070426583290100097656[/codebox]

I don’t think you do.

The deviations in your results are down at about the 16th digit. IEEE double precision has a 53-bit significand (52 stored bits plus an implicit leading bit). That means you have roughly log10(2^53), or 15.955, digits of accuracy before you would expect to see deviations between results computed on different IEEE-754 compliant machines.

I would put your results in the “perfectly acceptable” category. The original poster’s results, however, are clearly something different.

Thanks for the answer. I wasn’t quite sure how much deviation can be considered acceptable.

-arch=sm_13 fixes the problem! The division is working properly now.

Thanks for the comments everyone!

Ben