I have a kernel that works perfectly on multiple cards in single precision (GTX-280 and Tesla C1060). If I switch to double precision, literally by replacing ‘float’ with ‘double’, the kernel runs but gives all-zero results. This happens on multiple cards and under both Linux and Windows (both 64-bit). I’m using CUDA 2.1 and compiling on the GTX-280. I added ‘-arch sm_13’ to the nvcc command line but it appears to make no difference to the final output.
Have other people seen anything similar? Is there something extra that needs to be done to get double precision to work? The kernel is simple and I can post the code if it is deemed helpful.
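In the meantime, here is a stripped-down sketch of the general shape of the kernel (the names and the operation are made up for illustration, not my actual code):

// illustrative only -- kernel and variable names are made up, not the real code
// built roughly as:
//   nvcc -arch=sm_13 -Xptxas -v -c kernel.cu
// (as far as I know, without -arch=sm_13 the compiler demotes double to float and prints a warning)

__global__ void k_scale(double *out, const double *in, double factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * factor;   // all-double arithmetic; the float version just swaps the types
}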
Already tried that before posting (‘-arch sm_13’ and ‘-arch=sm_13’) and it makes no difference to the output. Since the only change between the two versions is replacing ‘float’ with ‘double’ and then recompiling (with -arch=sm_13), I’m thinking there is something else on the compiler side, like another switch that needs to be set.
Are you calling cudaGetLastError() immediately after launching the kernel (before synchronizing or doing a memcpy) and checking that it returns cudaSuccess?
That isn’t the only change, though. Switching the floating-point representation to double increases the number of registers the kernel uses and the amount of shared memory it requires for a given buffer size. It is entirely possible that your kernel is failing to launch because the double precision version, run with your single precision execution parameters, is exceeding a resource limit. You might need to reduce either the shared memory per block or the threads per block.
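If you want to rule that out, something like this quick (untested) sketch will print the per-block limits of your card so you can compare them against your launch configuration and the ptxas resource report:

// rough, untested sketch -- prints the per-block resource limits of device 0
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    printf("registers per block  : %d\n", prop.regsPerBlock);
    printf("shared mem per block : %u bytes\n", (unsigned)prop.sharedMemPerBlock);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("total constant mem   : %u bytes\n", (unsigned)prop.totalConstMem);
    return 0;
}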
I added -Xptxas -v to the nvcc command line and rebuilt the kernel:
ptxas info : Compiling entry function ‘_Z8k_filterPdPhi’
ptxas info : Used 8 registers, 36+32 bytes smem, 41000 bytes cmem[0], 4 bytes cmem[1]
The single precision version uses 6 registers, the same amount of smem, and 21000 bytes of cmem[0]. My single precision version runs 256 threads per block; if I drop the double version to 64 threads per block I still get the same all-zero output. I am checking for errors after every CUDA call and none are reported.
This seems rather strange to me. At 8 registers × 256 threads that is only 2048 registers per block, and 36+32 bytes of smem is nowhere near the 16 KB limit, so I don’t see a resource limit being exceeded (and if one were, shouldn’t an error be reported?).
// first check that the kernel launched at all
kernelCall<<<whatever>>>(whatever here too);
cudaError_t result = cudaGetLastError();
if (result != cudaSuccess)
    printf("oh no everything is ruined forever--the kernel didn't even launch\n");

// then check that it actually ran to completion
result = cudaThreadSynchronize();
if (result != cudaSuccess)
    printf("oh no now the kernel itself broke in the middle of execution\n");
If I keep the constant memory as ‘float’, the kernel now works properly with the rest of the data being double. With ‘float’ I’m using 21000 bytes of constant memory; with ‘double’ it is 41000 bytes, which is still well under the 64 KB limit, so I’m not sure why there would be a difference.
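For reference, the constant-memory piece looks roughly like this when switched over to double (names and sizes are made up for illustration, not my actual code); the only change besides the type is that the byte count in the host-side copy doubles:

// illustrative sketch only -- names and sizes are made up
#include <cuda_runtime.h>

#define N_COEFFS 5000                     // ~20000 bytes as float, ~40000 bytes as double

__constant__ double d_coeffs[N_COEFFS];   // was: __constant__ float d_coeffs[N_COEFFS];

void upload_coeffs(const double *h_coeffs)
{
    // the byte count has to track the element type, so it doubles going float -> double
    cudaMemcpyToSymbol(d_coeffs, h_coeffs, N_COEFFS * sizeof(double));
}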