float / double issue

I have had some issues with the lack of precision when using floats, so I decided to try doubles instead. I have another system, written in Java, with which I confirm that the CUDA results are “correct”. I only verify a fraction of the material that goes through the GPU that way. The Java program and the CUDA program usually come to the same result, but because the Java program uses doubles and the CUDA program uses floats, there can sometimes be differences due to precision issues. (For other reasons I need to use doubles on the Java side.)

So I ran a test with floats as usual and printed the output (which is about 500,000 rows, so quite a lot). Then I switched to doubles in the CUDA code and ran the exact same test. To my surprise I got exactly the same result with doubles as with floats. This led me to believe that I have some issue with the compilation. I’m fairly sure that doubles can be demoted to floats, even if you have written double in the code, if you compile with a certain compiler flag or similar. The relevant lines of my build look like this:

BINNAME=CudaCallC

NVCCFLAGS=--compiler-options -fPIC

CUDADIR=/opt/cuda/sdk/C/common/inc

JAVADIR=$(shell readlink --canonicalize $(JAVA_HOME))

INCLUDEDIRS=-I$(CUDADIR) -I$(JAVADIR)/include -I$(JAVADIR)/include/linux

LINKPARAMS=-lxerces-c -lmysqlpp -L/opt/cuda/lib/ -lcudart -lprotobuf

COMPILEPARAMS=$(INCLUDEDIRS) $(LINKPARAMS)

nvcc -shared $(NVCCFLAGS) regression.cu $(COMPILEPARAMS) -o lib$(BINNAME).so -arch=sm_13 2>&1 | grep -v "assuming global memory space"

I have different cards available here, if that matters. The card I tested on now was a GTX 260, but I also have a GTX 480. By the way, is it generally better to compile with other settings for the GTX 480?

I’m running Linux, and on the machine with the GTX 260 I have CUDA driver and runtime version 3.0 and NVIDIA driver version 195.36.31, if that is important as well.

Anyone have any clue what might be the problem here?

Any input would be appreciated.

Is the run time the same? By the way, maybe the -gencode and -code options could help, not just -arch.

The runtime differed a bit, but not by much.

With floats: 32m43.603s

With doubles: 33m28.569s

I was using the computer for other things during the runs, so maybe the difference would be even smaller otherwise. I expected the difference to be bigger.

I’m not sure about the -gencode part though. Can I just add that as a compile parameter?

I am not on Linux, but you should use the -code or -gencode option; otherwise doubles are demoted to floats.

I found some info in a guide called “Fermi_Compatibility_Guide” where it said the following:

Note: the nvcc command-line option “-arch=sm_xx” is a shorthand equivalent to the following more explicit -gencode command-line options:

-gencode=arch=compute_xx,code=sm_xx

-gencode=arch=compute_xx,code=compute_xx

So it seems that -arch=sm_13 should be enough, if I’m not misunderstanding something here. But apparently it isn’t, because I still get the exact same output =( . I assume the name for double in CUDA is ‘double’ and not something else like float64 or anything fishy?
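If I expand that shorthand myself for sm_13, I guess the explicit version of my compile line would look something like this (just my reading of the guide, untested, with the rest of the line unchanged):

nvcc -shared $(NVCCFLAGS) regression.cu $(COMPILEPARAMS) -o lib$(BINNAME).so -gencode=arch=compute_13,code=sm_13 -gencode=arch=compute_13,code=compute_13 2>&1 | grep -v "assuming global memory space"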

This info is not clear.

Hmm, what do you mean by the information not being clear? Can you elaborate, please?

I read the nvcc compiler guide with the descriptions of the options, and it mentions the -code option.

I’m trying without success to use other compiler parameters. Could you give me an example line of how you would write it with -gencode or -code, please? I’m stuck here.

Another question: is there some way I can see what “version” the compiled code has, or similar, so I can be sure which version is actually being executed?
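The only thing I can come up with myself is a crude sanity check along these lines (just a sketch, not my real code), feeding the values in from the host so nothing gets folded away at compile time. With real doubles the small difference should survive; if the doubles are demoted to floats it should come out as 0:

#include <cstdio>

// in[0] = 1.0, in[1] = 1e-10. In real double precision,
// (in[0] + in[1]) - in[0] is about 1e-10; if the doubles were demoted
// to float, 1e-10 is below float resolution at 1.0 and the result is 0.
__global__ void doubleCheck(const double *in, double *out)
{
    out[0] = (in[0] + in[1]) - in[0];
}

int main()
{
    double h_in[2] = { 1.0, 1e-10 };
    double h_out = -1.0;
    double *d_in, *d_out;
    cudaMalloc((void**)&d_in, 2 * sizeof(double));
    cudaMalloc((void**)&d_out, sizeof(double));
    cudaMemcpy(d_in, h_in, 2 * sizeof(double), cudaMemcpyHostToDevice);
    doubleCheck<<<1, 1>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("result = %g (%s)\n", h_out,
           h_out > 0.0 ? "real doubles" : "demoted to float?");
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}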

I think I found my problem. Before, when I used floats, I wanted IEEE-compliant float operations, so I used __fdiv_rn for division and the corresponding intrinsics for +, - and *. I forgot to change those when I switched to doubles, and because those are float operations, conversions to float probably happened there. It seems to work now that I have changed them back to the ordinary +, -, *, / =)
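Roughly, the mistake looked like this (a simplified sketch, not my actual code):

// Before: __fdiv_rn takes float arguments, so the doubles were silently
// converted to float, divided in single precision, and widened back.
__device__ double ratioBefore(double a, double b)
{
    return __fdiv_rn(a, b);
}

// After: the plain operator stays in double precision.
__device__ double ratioAfter(double a, double b)
{
    return a / b;
}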

I have a follow-up question though: are there similar functions available for doubles, like __fdiv_rn for floats? Or are the standard operations (+, -, *, /) already IEEE-compliant when working with doubles, so to speak? I remember that when I used floats without __fdiv_rn, a number divided by itself was not guaranteed to be 1, for example, which causes problems for my application.
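To illustrate what I mean, this is roughly the single-precision situation that bit me on sm_1x (just a sketch):

__global__ void divideCheck(const float *x, float *out)
{
    // On sm_1x, plain single-precision '/' is an approximation (about 2 ulp),
    // so x / x is not guaranteed to be exactly 1.0f for every x.
    out[0] = x[0] / x[0];
    // __fdiv_rn is IEEE round-to-nearest division, so this gives exactly 1.0f
    // (for finite, non-zero, normal values).
    out[1] = __fdiv_rn(x[0], x[0]);
}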

Basic arithmetic operations (+, -, *, /, sqrt) on double-precision operands always use IEEE round-to-nearest-or-even rounding. There are also device functions that let you select the rounding mode explicitly:

sm_1x, sm_2x: __dadd_r{n,z,u,d}, __dmul_r{n,z,u,d}, __fma_r{n,z,u,d}
sm_2x: __ddiv_r{n,z,u,d}, __drcp_r{n,z,u,d}, __dsqrt_r{n,z,u,d}

These are listed in appendix C.2.2 of the CUDA C Programming Guide.
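For example (just a quick sketch with a made-up kernel; note that the __ddiv/__drcp/__dsqrt variants additionally need an sm_2x target such as your GTX 480, while the ones below also work on sm_13):

__global__ void roundedOps(const double *a, const double *b, double *out)
{
    // Same results as the plain operators, which already round to nearest-or-even:
    out[0] = __dadd_rn(a[0], b[0]);
    out[1] = __dmul_rn(a[0], b[0]);
    out[2] = __fma_rn(a[0], b[0], 1.0);  // a*b + 1.0 with a single rounding
    // Directed rounding, e.g. toward negative infinity (useful for interval arithmetic):
    out[3] = __dadd_rd(a[0], b[0]);
}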

Ah, OK, thanks! I will look closer at them. I guess the __dX_rn variants are what will work best for me then, since I used __fdiv_rn for floats. Unless the ‘n’ already means “round-to-nearest-or-even”; I can’t remember right now, it’s getting late here =)

Correct, the “n” in the rounding mode denotes “nearest” = “to-nearest-or-even”.