Double precision differences: differences in precision of values when compared to Matlab

I’m trying to replicate some Matlab code in CUDA in order to increase performance. I am using a Tesla C2050 card on Windows 7 64-bit. The Matlab code does some simple calculations of frequency steps and simulation time and then computes the exponential function of a complex array. The problem I am getting is in the calculation of the adjusted simulation time. I have been able to trace the differences down to one calculation, the offset for the simulation time. This is simply 2 * range / speed_of_light, where range = 4.332077681840050e4 and speed_of_light = 3E8; both of these values are passed in to the CUDA routine. When I pass out the result of this calculation, I get a difference in the values. Can anyone explain why this occurs and how I can fix it? Are there any compiler options that I need to be aware of for higher precision?

Cuda result
Matlab result
hand Calculated


It looks very much like you are actually using floats rather than doubles to compute your figures.
Just for the sake of it, here is what your formula gives me on the CPU using either doubles or floats:
double: 0.0002888051787893367 float: 0.0002888051967602223

So, since I take from the title of your post that your code is actually using doubles, I guess you simply compiled it without specifying a target architecture. By default, that means you are compiling for compute capability 1.0, which doesn’t support double precision floating point arithmetic, so all your doubles are demoted to floats. Since your target hardware is a C2050, adding -arch=sm_20 to your compiler options should (hopefully) solve your problem. I’m using Linux, so I’m not sure how to add this on Windows, or even whether the compiler option is the same.


sm_20 doesn’t work. I think it’s because it’s a Tesla card, and sm_20 is for Fermi support.

I get this error from Matlab when I try to execute the Matlab code:

Error using iCheckPTXEntryAgainstCProto (line 370)
The number of inputs to the PTX code (0) is NOT the same as the number of inputs in the C prototype (17)

Error in C:\Program


(line 72)

I tried sm_13; the nvcc documentation says this supports double precision. Lo and behold, that fixed the problem. Thanks for the lead.

FYI, I was compiling in Matlab:

!nvcc -ptx -arch=sm_13 .cu

Do you mean that the core is the Tesla architecture? sm_13 also supports doubles.

Tesla C2050 is a Fermi device (don’t get confused by the irritating naming - Tesla is the name of both a GPU architecture and a product line). Compiling with nvcc -ptx -arch=sm_13 .cu works even though the architecture is wrong because it generates PTX code, which can be translated for the Fermi architecture at runtime.

If your card really is a Tesla C2050, using sm_20 should work as well (or better…).