Hi, I have a program that runs fine in both single and double precision under CUDA 5.5. I recently updated to CUDA 6.0. The single-precision version still runs fine, but the double-precision version now fails with a “misaligned address” error. I reduced it to the small test program below:
#define tidx threadIdx.x
extern __shared__ char SVec[];
__global__ void GPU_Kernel()
{
__shared__ real *sp_smax;
if(tidx==0) sp_smax = (real *)SVec;
__syncthreads();
sp_smax[tidx] = zero; // The problem occurs when this line is added
__syncthreads();
}
If ‘real’ is defined as float, ‘zero’ is defined as 0.0f, so the program is single precision. Compiled with CUDA 6.0, it runs without problems.
However, if ‘real’ is defined as double, ‘zero’ is defined as 0.0, so the program is double precision. Compiled with CUDA 6.0, it gives the “misaligned address” error. After testing, I found the error occurs only when the line “sp_smax[tidx] = zero” is present. Has anyone experienced a similar issue? Any hints about what is wrong?
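One variant that may be worth testing is sketched below. It rests on an assumption that has not been confirmed for this case: a buffer declared as char only guarantees 1-byte alignment, while a double store needs an 8-byte-aligned address, so the start of SVec may not be suitable for doubles. The sketch requests 8-byte alignment with __align__(8) and uses a per-thread pointer instead of a __shared__ pointer. The host function launch_and_check and the launch size are hypothetical names added only for illustration.

```cuda
#include <cuda_runtime.h>

typedef double real;            // switch to float for the single-precision build
#define zero ((real)0)
#define tidx threadIdx.x

// Assumption: ask for 8-byte alignment on the dynamic shared buffer,
// since a plain char array only guarantees 1-byte alignment.
extern __shared__ __align__(8) char SVec[];

__global__ void GPU_Kernel()
{
    // Per-thread pointer held in a register, so no static __shared__
    // variable is allocated alongside the dynamic buffer.
    real *sp_smax = reinterpret_cast<real *>(SVec);
    sp_smax[tidx] = zero;
    __syncthreads();
}

// Hypothetical helper: launches one block and returns the first error seen.
cudaError_t launch_and_check(int threads)
{
    // Third launch parameter is the dynamic shared memory size in bytes.
    GPU_Kernel<<<1, threads, threads * sizeof(real)>>>();
    cudaError_t err = cudaGetLastError();   // launch-time errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // execution-time errors
}
```

Calling launch_and_check(64) from main and printing cudaGetErrorString on the result would show whether the error still appears. If this variant runs cleanly while the original does not, that would point to alignment of the shared buffer rather than to the store itself.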
By the way, my GPU is a Quadro 4000, so the compute capability is 2.0.
Thanks for any help.