I recently obtained a GTX 260 and wanted to test double precision support. I figured a simple way to do it would be to switch all the float variables to double in the matrixMul example from the SDK.
I did that, and it compiles and runs without errors, but the results are completely wrong. Worse, commenting out the kernel call gives exactly the same numbers, so it seems the kernel either isn’t doing anything or is writing to the wrong part of memory.
I am confused about how double precision is supported. I’m running the latest version of CUDA (2.0 beta2, on Linux), which I thought would be all that was needed.
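To rule out the card itself, I also ran a quick device query, along the lines of the SDK’s deviceQuery sample (sketch; device 0 assumed):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0
    // GTX 260 should report compute capability 1.3
    printf("%s: compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);
    return 0;
}
```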
Just to make it clear, here are a couple of snippets from the code (it is literally just the matrixMul example with the float variables changed to double):
unsigned int mem_size_C = sizeof(double) * size_C;

// allocate device memory for the result
double* d_C;
CUDA_SAFE_CALL(cudaMalloc((void**) &d_C, mem_size_C));
__global__ void matrixMul( double* C, double* A, double* B, int wA, int wB)
__shared__ double As[BLOCK_SIZE][BLOCK_SIZE];
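Since the kernel seems to have no effect, I’m planning to add an explicit check after the launch, something like this (a sketch using cudaGetLastError and cudaThreadSynchronize; grid, threads, d_A, d_B, WA, WB are the variables from the matrixMul sample):

```cuda
// launch the kernel as in the sample
matrixMul<<< grid, threads >>>(d_C, d_A, d_B, WA, WB);

// check for errors at launch time
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess)
    printf("launch failed: %s\n", cudaGetErrorString(err));

// wait for the kernel and check for execution-time errors
err = cudaThreadSynchronize();
if (err != cudaSuccess)
    printf("kernel failed: %s\n", cudaGetErrorString(err));
```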