I’m running some double precision code in device emulation mode for development.
I’m getting some pretty large error on division. Is this representative of the real device, or just a quirk of deviceemu? The CUDA manual says there should be essentially none.
a = 3.5637338457619299
b = 9.3451182258927847
deviceemu a/b = 0.38134709000587463
gdb (on cpu) a/b = 0.38134711189504172
You can debug binaries built with -deviceemu just like any other application under GDB.
So something like this (comments in parentheses):
break gpu_function (break at entry to gpu function)
r (run)
list (multiple times perhaps, to see interesting code)
b 123 (break at interesting line of current file)
c (continue)
p variable (print variable values)
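Putting those steps together, a typical session might look like this (the binary, source file, and function names are made up for illustration):

```
$ nvcc -deviceemu -g -o myapp myapp.cu    # build for emulation with debug symbols
$ gdb ./myapp
(gdb) break gpu_function
(gdb) run
(gdb) list
(gdb) break 123
(gdb) continue
(gdb) print variable
```

In emulation mode the kernel runs as ordinary host threads, which is why plain GDB works on it.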
I have the same problem with double precision. I compile with sm_13 and I have a GTX 280. On the device the results are wrong, but in emulation there are no errors. Here’s an example of my output: the first row is the matrix index, the second is the GPU result, and the third is the CPU result.
The deviation in your results is down at about the 16th digit. IEEE double precision uses 53 bits for the significand. That means you have roughly log10(2^53), or about 15.95, decimal digits of precision before you would expect to see deviations between results computed on different IEEE-754 compliant machines.
I would put your results in the “perfectly acceptable” category. The original poster’s results, however, are clearly something different.