Double precision on GTX 470

Hi,

I am trying to use GPUs for scientific computing (molecular dynamics) with a code I am developing myself. I have a GTX 470 to develop on and to prove things in principle while I try to get the funding together for a Tesla, or time on one. I am having some problems with double precision on my machine. The GTX 470 should be capable of double-precision calculation, but when I compile even the simplest code (copying to the GPU, writing to the data and copying back) the compiler says:

ptxas /tmp/tmpxft_000016a0_00000000-5_forcekernels.compute_10.ptx, line 85; warning : Double is not supported. Demoting to float

The makefile is really simple; I have pasted it below, and the simple code I am testing with is attached.

I am using CUDA 3.0 on Ubuntu 10.04. I don’t know why this isn’t working, and I would very much like to have a double-precision version working (even if it is limited by the card I have) before moving to a Tesla.

If anyone has any suggestions I would really appreciate it.

Thanks

Dean

################################################################################
#
# Build script for project
#
################################################################################

# Executable name
EXECUTABLE := onefile.prog

# CUDA source files (compiled with nvcc)
CUFILES := onefile.cu

# CUDA dependency files
CU_DEPS :=

# C/C++ source files (compiled with gcc/g++)
CCFILES :=

################################################################################
# Rules and targets

ROOTDIR = /home/dean/NVIDIA_GPU_Computing_SDK/C/common/
BINDIR = ./bin
include $(ROOTDIR)common.mk

# flag to enable double precision
NVCCFLAGS += -arch sm_13

# flag to report on use of registers, shared memory, etc
NVCCFLAGS += -Xptxas -v

onefile.cu (2.46 KB)

The build rule in common.mk builds for multiple architectures by default, including sm_20. So my guess is that the message comes from the build for sm_10 (note the compute_10 in the name of the .ptx file), and that when the program runs on your GTX 470 the sm_20 code is picked and double precision is still used.
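
One way to check this guess at run time is a tiny test program. The following is only a sketch and is not taken from the attached onefile.cu; the kernel name add_small and the 1e-10 increment are made up for illustration. The idea is that 1.0 + 1e-10 differs from 1.0 in double precision but not in single precision, so a result of exactly 1.0 suggests the arithmetic was done in float. If the demoted sm_10 code were the one that ran, the output could also simply be garbage, so treat any result other than a clean 1e-10 as suspicious.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: adds a small increment to 1.0.
// eps is passed as an argument so the sum is computed on the device
// rather than folded into a constant by the compiler.
__global__ void add_small(double *out, double eps)
{
    out[0] = 1.0 + eps;
}

int main()
{
    // Report the compute capability of device 0 (a GTX 470 should report 2.0).
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("could not query device 0\n");
        return 1;
    }
    printf("device: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);

    double *d_out = 0;
    double h_out = 0.0;

    cudaMalloc((void **)&d_out, sizeof(double));
    add_small<<<1, 1>>>(d_out, 1e-10);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(double),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // In double precision the difference is about 1e-10; in float it is 0.
    printf("(1.0 + 1e-10) - 1.0 = %g\n", h_out - 1.0);
    printf("%s\n", h_out != 1.0 ? "double precision appears to be in use"
                                : "result looks like single precision");
    return 0;
}
```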

I don’t know how your -arch sm_13 flag interacts with the -gencode options used in the build rule, but I guess it is useless at best. nvcc should be invoked with either -gencode or -arch options, but not both.
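
To rule out the makefile altogether, you could also call nvcc by hand and target only the GTX 470, which is a compute capability 2.0 device. This is a sketch that assumes onefile.cu does not need any of the SDK's helper headers or libraries; if it does, the matching -I and -L options would have to be added.

```shell
# Build only for compute capability 2.0, using -gencode and no -arch flag
nvcc -gencode arch=compute_20,code=sm_20 -Xptxas -v -o onefile.prog onefile.cu

# Or the same target using -arch instead of -gencode
nvcc -arch=sm_20 -Xptxas -v -o onefile.prog onefile.cu
```

If the "Demoting to float" warning no longer appears with either command, it was coming from the sm_10 part of the default multi-architecture build.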