Misaligned address error appears in a double-precision program after upgrading to CUDA 6.0

Hi, I have a program that runs fine in both single- and double-precision floating point with CUDA ver 5.5. I recently updated CUDA to ver 6.0, and the single-precision version still runs fine, but the double-precision version now fails with a “misaligned address” error. As a simple test, I ran the program given below:

#define tidx threadIdx.x

// 'real' is defined elsewhere as float or double, 'zero' as 0.0f or 0.0
extern __shared__ char SVec[];

__global__ void GPU_Kernel()
{
    __shared__ real *sp_smax;
    if (tidx == 0) sp_smax = (real *)SVec;   // thread 0 points into the dynamic shared buffer
    __syncthreads();
    sp_smax[tidx] = zero;                    // the problem occurs when this line is added
    __syncthreads();
}
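Since the kernel relies on dynamically allocated shared memory, the buffer size has to be passed as the third launch parameter. A minimal launch for this kernel looks like the following sketch (128 threads and a single block are arbitrary values for this test, not the real configuration):

// Host-side launch sketch: one 'real' per thread of dynamic shared memory.
int threads = 128;
size_t smemBytes = threads * sizeof(real);     // bytes of dynamic shared memory
GPU_Kernel<<<1, threads, smemBytes>>>();       // third parameter = shared-memory size
cudaDeviceSynchronize();                       // surfaces any runtime error from the kernel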

In the program, if ‘real’ is defined as float and ‘zero’ as 0.0f, the program is single precision and runs without problems after being compiled with CUDA 6.0.

However, if ‘real’ is defined as double and ‘zero’ as 0.0, the program is double precision, and after being compiled with CUDA 6.0 it gives the “misaligned address” error. After testing, we found that the error occurs only when the line “sp_smax[tidx] = zero;” is added to this program. Has anyone experienced a similar issue? Any hints about what is wrong?

By the way, my GPU device is Quadro 4000, and the compute capability is therefore 2.0.

Thanks for any help.

Your example works on my Quadro 4000, both in SP and DP.
With nvcc release 6.0, V6.0.1 and Driver Version: 331.44.

Can you post your full working (failing) code?
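In the meantime, one thing you could try is declaring the dynamic shared buffer directly as ‘real’ instead of char and dropping the cast, so every store is naturally aligned for 8-byte doubles. A quick sketch (the kernel and array names here are just placeholders, and ‘real’/‘zero’ are assumed to be defined as in your test):

#define tidx threadIdx.x

extern __shared__ real SVec_real[];   // dynamic shared buffer declared as 'real', not char

__global__ void GPU_Kernel_NoCast()   // placeholder name for the reworked kernel
{
    SVec_real[tidx] = zero;           // same store as before, without the (real *) cast
    __syncthreads();
}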

I have just updated CUDA to ver 6.5 RC and the problem is gone, but I could not resolve it in any way with CUDA 6.0. This is weird, but thanks, hadschi118, for your help.