double precision on the GTX 280

I am using CUDA with gcc 4.3 on FC9. I wrote my first CUDA program a couple of days ago. It’s the simplest thing that one can think of: it allocates a floating point array on the GTX 280, initializes the array to 1.1, multiplies it by 2.0, and brings the result back to the host where the result is compared with the CPU version. Now, all is well when I use single precision (type specifier “float”) but all hell breaks loose when I use “double”. It compiles and runs but all I get out of the card is zeros. I modified my kernel to account for the change in type and I believe I changed everything consistently.

Any guesses at what the problem may be? Do I need to change something when using double instead of float? Is the size of double different on the GTX 280 than on the CPU? I don’t have access to my kernel code right now, but I will post it later.

Are you using the right NVCC flag “-arch sm_13” to enable double precision?
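For reference, without that flag nvcc targets an older compute capability and silently demotes double to float, which can produce garbage results. A minimal sketch of the kind of program described above (the kernel name, array size, and launch configuration here are made up for illustration, not taken from the original post):

```cuda
// Hypothetical sketch: double precision needs compute capability 1.3.
// Compile with:  nvcc -arch sm_13 -o dbltest dbltest.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(double *a, double s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;
}

int main()
{
    const int n = 1024;
    double h[n], *d;
    for (int i = 0; i < n; ++i) h[i] = 1.1;

    cudaMalloc((void **)&d, n * sizeof(double));
    cudaMemcpy(d, h, n * sizeof(double), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(d, 2.0, n);

    cudaMemcpy(h, d, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d);

    // On an sm_13 build this should print 2.2; a single-precision
    // demotion would show up as a rounded-off value instead.
    printf("h[0] = %f\n", h[0]);
    return 0;
}
```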

Thanks for the responses. Things work now.

I was allocating the entire device memory for the “float” data type. When I changed the data type to double, each element doubled in size, so I was exceeding the device memory limit (1 GB). Serves me right for not bothering to check the result of cudaMalloc.
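For anyone else who hits this, here is a sketch of the check I should have had in place (the sizes and names are hypothetical; the point is just that a buffer which fits as float can fail as double):

```cuda
// Hypothetical sketch: always check the result of cudaMalloc.
// sizeof(double) == 2 * sizeof(float), so the same element count
// needs twice the device memory.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // 128M elements: 512 MB as float, but 1 GB as double.
    const size_t n = 128u * 1024u * 1024u;
    double *d = NULL;

    cudaError_t err = cudaMalloc((void **)&d, n * sizeof(double));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    cudaFree(d);
    return 0;
}
```

Had I checked the return code, I would have seen an out-of-memory error instead of silently reading back zeros.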

I did not know about the sm_13 option. Since I was basing my efforts on the “scalarprod” example that NVIDIA provides, I was using their Makefile, and I did not know how to pass sm_13 through it. It took substantial effort to dig into the Makefile, echo the compilation flags to the screen, and then pass them to nvcc by hand.