I have had some issues with the lack of precision when using floats, so I decided to try doubles instead. I have another system written in Java in which I confirm that the CUDA code calculated "correctly"; I only verify a fraction of the material that goes through the GPU. The Java program and the CUDA program usually come to the same result, but because the Java program uses doubles and the CUDA program uses floats, there can sometimes be differences due to precision. (For other reasons I need to use doubles on the Java side.)
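Just to illustrate the kind of differences I mean (this is a hypothetical example, not my actual code), accumulating the same values in float and in double can give visibly different sums:

    // precision_demo.cu -- hypothetical illustration, not my real kernel.
    // Summing the same constant many times drifts in single precision
    // but stays essentially exact in double precision.
    #include <cstdio>

    int main()
    {
        float  fsum = 0.0f;
        double dsum = 0.0;
        for (int i = 0; i < 500000; ++i) {
            fsum += 0.001f;   // rounding error accumulates in float
            dsum += 0.001;    // double stays very close to the exact 500.0
        }
        printf("float sum:  %.6f\n", fsum);   // typically a few units off from 500
        printf("double sum: %.6f\n", dsum);   // essentially 500.000000
        return 0;
    }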
So I ran a test with floats as usual and printed the output (around 500,000 rows, so quite a lot). Then I switched the CUDA code to doubles and ran exactly the same test. To my surprise I got exactly the same result with doubles as with floats. This led me to believe that I have some issue with the compilation. I'm fairly sure that doubles can be silently demoted to floats, even if you have written double in the code, if you compile with the wrong compiler flag or architecture setting. My compilation lines of interest look like this:
BINNAME=CudaCallC
NVCCFLAGS=--compiler-options -fPIC
CUDADIR=/opt/cuda/sdk/C/common/inc
JAVADIR=$(shell readlink --canonicalize $(JAVA_HOME))
INCLUDEDIRS=-I$(CUDADIR) -I$(JAVADIR)/include -I$(JAVADIR)/include/linux
LINKPARAMS=-lxerces-c -lmysqlpp -L/opt/cuda/lib/ -lcudart -lprotobuf
COMPILEPARAMS=$(INCLUDEDIRS) $(LINKPARAMS)

nvcc -shared $(NVCCFLAGS) regression.cu $(COMPILEPARAMS) -o lib$(BINNAME).so -arch=sm_13 2>&1 | grep -v "assuming global memory space"
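My understanding (and what I wanted to rule out) is that without -arch=sm_13 nvcc falls back to sm_10 and demotes doubles to floats, printing a warning along the lines of "Double is not supported. Demoting to float". This is a minimal test I was thinking of using to check that, assuming a hypothetical file name and kernel:

    // double_test.cu -- hypothetical minimal kernel, just to test the toolchain.
    __global__ void addDouble(double *out, const double *a, const double *b)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        out[i] = a[i] + b[i];   // a real double add only on sm_13 or newer
    }

    // Without an -arch flag (defaults to sm_10) I would expect the demotion warning:
    //   nvcc -c double_test.cu
    // With -arch=sm_13 the doubles should be kept:
    //   nvcc -c double_test.cu -arch=sm_13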
I have different cards available here, if that matters. The card I tested on now was a GTX 260, but I also have a GTX 480. By the way, is it generally better to compile with other settings for the GTX 480?
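In case it is relevant, this is roughly how I check what the runtime reports for the card (a sketch using the standard runtime API, querying device 0):

    // device_query.cu -- sketch for checking the compute capability at runtime.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   // device 0
        printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
        // Double precision requires compute capability 1.3 or higher
        // (the GTX 260 is 1.3, the GTX 480 is 2.0).
        return 0;
    }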
I'm running Linux, and on the machine with the GTX 260 I have CUDA driver and runtime version 3.0 and NVIDIA driver version 195.36.31, if that is important as well.
Anyone have any clue what might be the problem here?
Any input would be appreciated.