Help with 'double precision'

Sarnath · July 29, 2008, 3:58am

Guys,

I have this piece of code written for single precision. But I find that when I increase a few parameters, the CPU and GPU outputs differ a lot.
For example, the CPU output is around 63.2232 and the GPU output is around 61.8922.

This is quite a lot, I guess. I have checked my program for errors, but I have not found anything much!

–edit–
This later turned out to be a bug in my program.
–edit–

So, I believe this is a kind of precision issue!

I would like to compile the code for double precision and test it on double-precision hardware!

I understand that I have to change all “floats” to “double” in my code!
I also have to give NVCC a compiler option asking it to compile for the 1.3 architecture (compute capability 1.3).
I run CUDA 1.1. Should I upgrade to CUDA 2.0 Beta to even compile things for double precision?

I am expecting some help from one of you guys in executing this piece of code! I will post the executable when I get my compile working for double-precision!

Thanks for any inputs guys!

Best Regards,
Sarnath

Sarnath · July 29, 2008, 5:46am

A few sample outputs are below. TimeSteps is an important factor in the algorithm. Out of 1000 options priced, the following options exhibited a difference of at least “0.2” between CPU and GPU outputs… Is this normal?

VERIFYING…

I 1, GPU-69.567924, CPU-69.207245, TimeSteps 3932

I 5, GPU-70.370872, CPU-70.166031, TimeSteps 3175

I 17, GPU-100.695786, CPU-100.403946, TimeSteps 3754

I 32, GPU-45.507301, CPU-45.246864, TimeSteps 2532

I 34, GPU-95.429131, CPU-95.157974, TimeSteps 3306

I 37, GPU-38.941895, CPU-38.657276, TimeSteps 2381

I 40, GPU-36.107521, CPU-35.873089, TimeSteps 3271

I 41, GPU-87.013672, CPU-86.790955, TimeSteps 3428

I 45, GPU-60.079376, CPU-59.825439, TimeSteps 3880

I 48, GPU-55.081902, CPU-54.838657, TimeSteps 3014

I 55, GPU-36.358498, CPU-36.126358, TimeSteps 2588

I 69, GPU-40.159321, CPU-39.825081, TimeSteps 3812

I 74, GPU-80.649918, CPU-80.383568, TimeSteps 2521

I 82, GPU-116.398773, CPU-116.193512, TimeSteps 3428

I 83, GPU-121.453621, CPU-121.238098, TimeSteps 2608

I 87, GPU-109.446098, CPU-109.209854, TimeSteps 1804

I 89, GPU-89.093376, CPU-88.887985, TimeSteps 3667

I 91, GPU-80.210693, CPU-79.977104, TimeSteps 3400

I 106, GPU-116.037453, CPU-115.761879, TimeSteps 2592

I 110, GPU-93.717445, CPU-93.494553, TimeSteps 2646

I 122, GPU-141.651199, CPU-141.378708, TimeSteps 3846

I 123, GPU-97.167770, CPU-96.926987, TimeSteps 3622

I 125, GPU-98.597450, CPU-98.382256, TimeSteps 2868

I 131, GPU-51.975296, CPU-51.659637, TimeSteps 2556

I 139, GPU-59.711536, CPU-59.505047, TimeSteps 2668

I 142, GPU-52.565277, CPU-52.338570, TimeSteps 3250

I 145, GPU-117.427628, CPU-117.078667, TimeSteps 3575

I 152, GPU-39.411385, CPU-39.157616, TimeSteps 3844

I 154, GPU-86.696617, CPU-86.426834, TimeSteps 2778

I 179, GPU-52.247574, CPU-51.867119, TimeSteps 3607

I 188, GPU-64.756355, CPU-64.540169, TimeSteps 3491

I 191, GPU-90.901154, CPU-90.655647, TimeSteps 2366

I 195, GPU-45.863991, CPU-45.595406, TimeSteps 3905

I 199, GPU-129.630981, CPU-129.403290, TimeSteps 2989

I 201, GPU-39.990051, CPU-39.759598, TimeSteps 2220

I 213, GPU-121.069366, CPU-120.747643, TimeSteps 3187

I 217, GPU-145.514282, CPU-145.265671, TimeSteps 2319

I 226, GPU-121.641212, CPU-121.309372, TimeSteps 3561

I 227, GPU-108.964218, CPU-108.725372, TimeSteps 2130

I 238, GPU-74.857147, CPU-74.558403, TimeSteps 3700

I 241, GPU-44.635242, CPU-44.315754, TimeSteps 3223

I 256, GPU-24.761784, CPU-24.560993, TimeSteps 2586

I 257, GPU-95.928993, CPU-95.680717, TimeSteps 2700

I 263, GPU-90.916061, CPU-90.687653, TimeSteps 3405

I 264, GPU-49.595531, CPU-49.360012, TimeSteps 2248

I 271, GPU-81.785515, CPU-81.582695, TimeSteps 3140

I 284, GPU-120.612991, CPU-120.307678, TimeSteps 3545

I 286, GPU-65.342842, CPU-65.119621, TimeSteps 1582

I 289, GPU-112.867081, CPU-112.544098, TimeSteps 3846

I 299, GPU-62.975578, CPU-62.628876, TimeSteps 2573

TS Average: 2325.226667

Sarnath · July 29, 2008, 7:55am

This just turns out to be a bug in my program! I have yet to find the root cause!

But I just ran some experiments (2 ways of doing same thing yields 2 different results… which is clearly a bug) and found this out!!

But still, I would appreciate it if someone could answer on enabling double precision in compilation and the issues involved in porting single-precision code to double precision…

E.D_Riedijk · July 29, 2008, 11:16am

You really need to have CUDA 2.0 beta; earlier versions do not support the GT200. Otherwise your steps seem OK.

Sarnath · July 29, 2008, 11:39am

Thanks! By the way, I fixed the bug in the code! The errors are now of the order of 7E-2 at most… For normal cases it is around 1E-2!

I ran the L1 norm error check as done in the NVIDIA binomial sample, and found it to be within limits!!
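
For anyone wondering what such a check looks like, here is a rough sketch. It assumes the check is a relative L1 error (sum of absolute differences divided by the sum of absolute reference values); the function name `relative_l1_error` is made up for illustration. The three sample rows are taken from the listing earlier in this thread, so they come from the run that still had the bug.

```python
def relative_l1_error(reference, computed):
    """Sum of |reference - computed| divided by the sum of |reference|."""
    sum_delta = sum(abs(r - c) for r, c in zip(reference, computed))
    sum_ref = sum(abs(r) for r in reference)
    return sum_delta / sum_ref


# Rows I 1, I 5 and I 17 from the listing above (pre-fix numbers).
cpu = [69.207245, 70.166031, 100.403946]
gpu = [69.567924, 70.370872, 100.695786]

l1 = relative_l1_error(cpu, gpu)
print(f"L1 norm: {l1:.3e}")
```

What counts as "within limits" depends on the tolerance you pick for your own problem; no particular threshold is implied here.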

So far so good!

That's one curse of working on the GPU: you don't know if it's a precision error or a logical bug :-( Sometimes it is a boon… You always have someone to blame for your mistakes :-)

MisterAnderson42 · July 29, 2008, 12:36pm

Of course, it might be neither a precision “problem” on the GPU nor a bug. It could just be that there is some chaos in the math behind your algorithm. Iterative algorithms that compute the next state from the previous state, and iterate thousands or millions of times, can diverge HUGELY from the same calculation in which a single value was evaluated just 0.0000000000000000000001 differently. With floating point numbers, simply the difference between calculating a+b or b+a can make this kind of difference.

Not all iterative calculations have chaotic properties, though, so this doesn’t always apply.
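
A small sketch of the order effect, emulating single precision in plain Python by rounding after every addition (the helpers `f32` and `sum_f32` are hypothetical names, not part of any CUDA API). Strictly speaking, IEEE 754 addition of two values is commutative; it is the grouping of three or more terms that changes the result, which is what a compiler or a parallel reduction can alter.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate left to right, rounding to single precision after each add."""
    total = f32(0.0)
    for v in values:
        total = f32(total + f32(v))
    return total


# Same three numbers, two evaluation orders.
forward = sum_f32([1.0e8, 1.0, -1.0e8])    # (1e8 + 1) - 1e8
reordered = sum_f32([1.0e8, -1.0e8, 1.0])  # (1e8 - 1e8) + 1

print(forward, reordered)
```

Near 1e8 the spacing between adjacent single-precision values is 8, so the 1.0 is absorbed in the first order and survives in the second.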

Just some food for thought.

Sarnath · July 29, 2008, 12:42pm

What you said captures the essence of this financial algorithm correctly!!

It's pretty much what is being done in this algorithm!

You calculate the stock option prices at time “T” in the future and then “back-calculate” step by step to find out what the true price of the stock option is today!!!
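
To make the "back-calculate" idea concrete, here is a minimal sketch, not the actual code from this thread. It assumes a European call on a Cox-Ross-Rubinstein tree, with illustrative parameters (spot 100, strike 100, rate 5%, volatility 20%, one year, 200 steps). The `rnd` argument is a hypothetical hook that rounds the main intermediates to single precision, so it only approximates what a float kernel would do.

```python
import math
import struct


def f32(x):
    """Round a double to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def binomial_call(s0, strike, rate, sigma, years, steps, rnd=lambda x: x):
    """European call on a CRR tree; rnd is applied after the main arithmetic steps."""
    dt = years / steps
    up = rnd(math.exp(sigma * math.sqrt(dt)))
    down = rnd(1.0 / up)
    growth = math.exp(rate * dt)
    p_up = rnd((growth - down) / (up - down))  # risk-neutral up probability
    p_down = rnd(1.0 - p_up)
    disc = rnd(1.0 / growth)                   # one-step discount factor

    # Payoffs at expiry: node j has seen j up-moves and steps - j down-moves.
    values = [rnd(max(s0 * up ** j * down ** (steps - j) - strike, 0.0))
              for j in range(steps + 1)]

    # Walk back one time step at a time until only today's price is left.
    for step in range(steps, 0, -1):
        values = [rnd(disc * rnd(rnd(p_up * values[j + 1]) + rnd(p_down * values[j])))
                  for j in range(step)]
    return values[0]


price_double = binomial_call(100.0, 100.0, 0.05, 0.2, 1.0, 200)
price_single = binomial_call(100.0, 100.0, 0.05, 0.2, 1.0, 200, rnd=f32)
print(price_double, price_single, abs(price_double - price_single))
```

Every backward step reuses the rounded values of the step before it, which is why the rounding error can grow with TimeSteps.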

Floating points and their non-determinism always piss me off… Sometimes, I hear 1000 could be represented as 999.999999… And the a+b and b+a stuff, etc… Stuff like this adds to the confusion…

One more thing to shift the blame on… :-)

kristleifur · July 29, 2008, 1:07pm

It looks like you’ve found out what you needed to find out, but I’d like to add, just in case:

Have you seen this stuff in the common makefile?:

NVCCFLAGS += $(SMVERSIONFLAGS) -arch sm_11

I guess you’ll need -arch sm_13 or something for double precision.

Sarnath · July 29, 2008, 1:14pm

Oh sure, thanks! That would be helpful too.

For the benefit of all –

There are 2 stages in which the code is compiled. First the code is compiled to PTX and stored in the object file. Then the PTX is translated into binary code for a real GPU, either at build time or by the runtime when no matching binary is embedded in the executable!

NVCC has options to control both!!

Here they are:

--gpu-name (-arch)

Specify the name of the nVidia GPU to compile for. This can either be a ‘real’ GPU, or a ‘virtual’ ptx architecture. Ptx code represents an intermediate format that can still be further compiled and optimized for, depending on the ptx version, a specific class of actual GPUs.

The architecture specified with this option is the architecture that is assumed by the compilation chain up to the ptx stage, while the architecture(s) specified with the -code option are assumed by the last, potentially runtime compilation stage.

Allowed values for this option: ‘compute_10’,‘compute_11’,‘sm_10’,‘sm_11’.

Default value: ‘sm_10’.

--gpu-code ,… (-code)

Specify the name of nVidia gpu to generate code for.

Unless option -export-dir is specified (see below), nvcc will embed a compiled code image in the executable for each specified ‘code’ architecture, which is a true binary load image for each ‘real’ architecture (such as a sm_13), and ptx code for each virtual architecture (such as compute_10). During runtime, such embedded ptx code will be dynamically compiled by the cuda runtime system if no binary load image is found for the ‘current’ GPU, and provided that the ptx level is compatible with this current GPU.

Architectures specified for options -arch and -code may be virtual as well as real, but the ‘code’ architectures must be compatible with the ‘arch’ architecture.

For instance, ‘arch’=compute_13 is not compatible with ‘code’=sm_10, because the earlier compilation stages will assume the availability of compute_13 features that are not present on sm_10.

This option defaults to the value of option ‘-arch’.

Allowed values for this option: ‘compute_10’,‘compute_11’,‘sm_10’,‘sm_11’.
