Help with 'double precision'


I have this piece of code written for single precision. But I find that when I increase a few parameters, the CPU and GPU outputs differ a lot.
For example: the CPU output is around 63.2232 and the GPU output is around 61.8922.

This is quite a lot, I guess. I have checked my program for errors, but I have not found anything much!

(Edit: this later turned out to be a bug in my program.)

So, I believe this is a kind of precision issue!

I would like to compile the code for double precision and test it on double-precision hardware!

  1. I understand that I have to change every “float” to “double” in my code!
  2. I have to give NVCC some compiler options asking it to compile for the 1.3 architecture (or compute capability).
  3. I run CUDA 1.1. Do I need to upgrade to CUDA 2.0 Beta to even compile things for double precision?

I am hoping one of you guys can help by running this piece of code! I will post the executable when I get my compile working for double precision!

Thanks for any inputs guys!

Best Regards,

A few sample outputs are below. The TimeSteps is an important factor in the algorithm. Out of 1000 options priced, the following options exhibited a difference of at least “0.2” between CPU and GPU outputs… Is this normal?

This turned out to be a bug in my program! I have yet to find the root cause!

But I just ran some experiments (2 ways of doing the same thing yield 2 different results… which is clearly a bug) and found this out!!

But still, I would appreciate it if someone could answer on enabling double precision in compilation and the issues involved in porting single-precision code to double precision…

You really need to have CUDA 2.0 beta; earlier versions do not support the GT200. Otherwise your steps seem OK.

Thanks! By the way, I fixed the bug in the code! The errors are now of the order of 7E-2 at most… For normal cases it is around 1E-2!

I ran the L1 norm error check as done in the NVIDIA Binomial Sample and found it to be within limits!!

So far so good!

That's one curse of working on the GPU: you don't know if it's a precision error or a logical bug :-( Sometimes it is a boon… You always have someone to blame for your mistakes :-)

Of course, it might be neither a precision “problem” on the GPU nor a bug. It could just be that there is some chaos in the math behind your algorithm. Iterative algorithms that base the calculation of the next state on the previous state, and then iterate thousands or millions of times, can diverge HUGELY from the same calculation in which one value was evaluated just 0.0000000000000000000001 differently. With floating point numbers, simply the order in which a sum is evaluated, (a+b)+c versus a+(b+c), can make this kind of difference.

Not all iterative calculations have chaotic properties, though, so this doesn’t always apply.

Just some food for thought.

What you said captures the essence of this financial algorithm correctly!!

It's pretty much what is being done in this algorithm!

You calculate the stock option prices at time “T” in the future and then “back-calculate” step by step to find out the true price of the stock option today!!!

Floating point numbers and their non-determinism always piss me off… Sometimes, I hear 1000 could be represented as 999.999999… And the a+b and b+a stuff etc… Stuff like this adds to the confusion…

One more thing to shift the blame on… :-)

It looks like you’ve found out what you needed to find out, but I’d like to add, just in case:

Have you seen this stuff in the common makefile?:


I guess you’ll need -arch sm_13 or something for double precision.

Oh sure, thanks! That would be helpful too.

For the benefit of all –

There are 2 stages in which the code is compiled. First the code is compiled to PTX and stored in the object file. Then, when the kernel is launched, a run-time translation happens again!

NVCC has options to control both!!

Here they are: