enable double precision for SDK I can't figure out where in the makefile the -arch sm_13 should

Howdy, all!

I have downloaded, compiled, and run the CUDA SDK. Now I’d like to run some of the examples in double precision. However, every time I put the -arch sm_13 flag where I think it should go in a makefile, the compiler complains. Has anyone out there had success compiling the SDK with DP? If so, can you please tell me exactly where I should put the -arch sm_13 flag?

Thanks,
Jeremiah

It should directly follow each invocation of nvcc in the Makefile, for example:

[codebox]nvcc -arch sm_13 -c -Xcompiler -m64,-pipe -I "…/CudaCommon" -I "src" -I "/home/jason.goffeney/Tools/Linux3rdParty/gdal/include" -I "/home/jason.goffeney/Tools/Linux3rdParty/glew/include" -I "/home/jason.goffeney/Tools/Linux3rdParty/sqlite3/include" -I "/usr/local/cuda/include" …/CudaCommon/kernels/itmFunctions/itm_kernels.cu -o ${OBJECTS_DIR}itm_kernels_cuda.obj[/codebox]

Thanks for your reply; however, the SDK’s Makefiles use a lot of aliasing and split up their flags into different lines of the Makefile. Do you know where the flag should go in the SDK Makefiles, specifically?

Try looking in common.mk, in the common directory under the SDK root, which every project Makefile includes. It contains a commented-out line, NVCCFLAGS += $(SMVERSIONFLAGS). If you uncomment that line and add SMVERSIONFLAGS = -arch sm_13 to the project Makefile, I think it will work.

That did it. Thanks!

This thread is helping me a lot, as I just wasted a morning trying to compare some float and double codes for a certain function.

Nevertheless, I have to wonder why this is necessary. This is the first time I have encountered a compiler that needs the makefile to be edited to allow double precision arithmetic. I have only wasted 2 hours on this but I suspect others may have wasted a lot more. Is there a good reason for this, and why is dp not enabled by default?

I had just changed float to double in a global typedef and was baffled until I saw this thread! My dp code is still not behaving properly, but the warnings have gone. Is there anything else I should look out for? I am working with CUDA 3.0 beta on Snow Leopard 10.6.2 and a GTX 285 Mac Edition.

Because most CUDA-capable GPUs in the wild can’t do double precision. Right now nvcc/nvopencc has to be able to compile for four different architectures. Soon it will be five. The architecture selection in nvcc isn’t any different from gcc requiring command line options to generate 64-bit code or SSE3 instructions, or any other architecture-specific features.

I also had to add some double precision versions of functions in cutil.h. In other words, the utility functions in the SDK lack dp.
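The post does not say which cutil.h functions needed double precision counterparts. As an illustration only, here is a minimal sketch of one plausible case, an element-wise comparison against a reference array with an absolute tolerance. The name compareDoubleEps and its signature are hypothetical and are not part of cutil.h.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper, NOT part of cutil.h: returns true when every element
// of `data` is within `epsilon` (absolute difference) of `reference`.
bool compareDoubleEps(const double* reference, const double* data,
                      std::size_t len, double epsilon)
{
    for (std::size_t i = 0; i < len; ++i) {
        double diff = std::fabs(reference[i] - data[i]);
        // The negated comparison also rejects NaN, because
        // (NaN <= epsilon) evaluates to false.
        if (!(diff <= epsilon)) {
            return false;
        }
    }
    return true;
}
```

With double data the tolerance can be set far tighter (for example 1e-12) than would be meaningful for a float-only comparison, which is the main reason to keep a separate double version rather than casting down to float.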

MacFan, I have had lots of trouble getting dp code to work correctly. In some cases, my kernels launch but don’t do the complete work. In other cases, I get ULFs (unspecified launch failures). I have had many eyes look over my code and no one can find anything wrong with it. I submitted a bug report a few weeks ago but haven’t heard back yet.

I spent the weekend completely confused and then realized that (a) I did not have the sm_13 make option properly included, and (b) I had not enforced the double type uniformly throughout the code. For (b) I have now moved the type definition to a single typedef statement that I can switch from float to double and that permeates the entire code. The code now compiles and the arithmetical gibberish has gone.
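The single-typedef approach described above might look roughly like the following. This is a sketch under assumptions, not the poster's code: the macro USE_DOUBLE, the type name real, and the two example functions are all hypothetical, and the code is plain host-side C++ so it can be compiled without a GPU.

```cpp
#include <cmath>

// Assumption: USE_DOUBLE is a hypothetical macro. Building with
// -DUSE_DOUBLE=0 switches every `real` in the code back to float.
#ifndef USE_DOUBLE
#define USE_DOUBLE 1
#endif

#if USE_DOUBLE
typedef double real;
#else
typedef float real;
#endif

// Written entirely in terms of `real`: no float literals (1.0f) and no
// float-only functions (expf), either of which would quietly drop part
// of the computation back to single precision.
real logistic(real x)
{
    return real(1) / (real(1) + std::exp(-x));
}

// The kind of leak the typedef alone does not catch: a float temporary
// rounds the intermediate result to about 7 significant digits.
real logisticLeaky(real x)
{
    float e = std::exp(-x);   // precision is lost on this line
    return real(1) / (real(1) + e);
}
```

Comparing logistic and logisticLeaky on the same input is a quick way to see how one stray float can spoil an otherwise double precision result, which matches the "not enforced uniformly" problem in point (b).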

I am working on algorithms for inverse CDFs for probability applications, and I am looking at the speed improvements to be had by eliminating IF statements where one branch is slower than the others and produces a stall. As of late yesterday I have a speedup of a factor of 2-3 in DP over a well-known method, but I still need to check that the answers are indeed correct and have the desired precision, ~10^-15.
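One simple way to sanity-check precision at the ~10^-15 level is a round trip: push u through the inverse CDF, then through the forward CDF, and measure how far the result lands from u. The sketch below uses the unit-rate exponential distribution purely as a stand-in, since the poster's actual algorithm is not shown in the thread. All function names are hypothetical.

```cpp
#include <cmath>

// Stand-in inverse CDF (exponential distribution, rate 1). This is an
// assumption for illustration, not the method discussed in the thread.
template <typename T>
T inverseCdf(T u) { return -std::log1p(-u); }

// Matching forward CDF, written with expm1 to avoid cancellation near 0.
template <typename T>
T cdf(T x) { return -std::expm1(-x); }

// Largest relative round-trip error |cdf(inverseCdf(u)) - u| / u over an
// evenly spaced grid of n points strictly inside (0, 1).
template <typename T>
double maxRoundTripError(int n)
{
    double worst = 0.0;
    for (int i = 1; i <= n; ++i) {
        T u = static_cast<T>(i) / static_cast<T>(n + 1);
        T back = cdf(inverseCdf(u));
        double err = std::fabs(static_cast<double>(back - u))
                     / static_cast<double>(u);
        if (err > worst) {
            worst = err;
        }
    }
    return worst;
}
```

Calling maxRoundTripError<double>(1000) and maxRoundTripError<float>(1000) side by side shows whether a build is really running in double precision. Note that a round trip only shows the two functions are consistent with each other; confirming correctness still needs an independent reference, ideally one computed in higher precision.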