Help on Error: Unaligned Memory access not supported

Hello,
In my kernel, I have a code segment as follows:

extern __shared__ char array[];
//__global__ void mvBD(ValueType** A, ValueType* x, ValueType* y, IndexType matBlkSize, IndexType matNumBlk) //#2
__global__ void mvBD(ValueType** A, ValueType* x, ValueType* y, IndexType matNumBlk) //#3
{
const IndexType matBlkSize=3;
unsigned int matBlkIdx=blockDim.x*blockIdx.x+threadIdx.x;
unsigned int offset=matBlkIdx*matBlkSize, offset2=blockDim.x*blockIdx.x*matBlkSize;
//the following two lines are for allocating shared memory dynamically.
ValueType** As=(ValueType**)array;
ValueType* ys=(ValueType*)(array+matBlkSize*matBlkSize*sizeof(ValueType*)+threadIdx.x*matBlkSize*sizeof(ValueType));

ValueType xs[3];

if(threadIdx.x<matBlkSize*matBlkSize){
As[threadIdx.x]=A[threadIdx.x];
}

__syncthreads();
if(matBlkIdx<matNumBlk){
for(int i=0;i<matBlkSize;i++){
ys[i]=0.0f;
for(int j=0;j<matBlkSize;j++)
ys[i]+=As[i*matBlkSize+j][matBlkIdx]*xs[j]; //#1
}
}
__syncthreads();

return;
}
Line #1 gives me a compile error: Unaligned Memory Access not supported. But if I make matBlkSize a function parameter and pass its value in from outside (signature #2), the kernel runs correctly. I do not understand why the two versions behave differently. Can anyone explain it to me? Thank you.

You’re on a 64-bit platform and IndexType is a typedef to a 32-bit type, right?

array is of type char, so there is no guarantee it will be aligned for larger types, including pointers. I believe data in shared memory should always be aligned, although I can’t find a clear confirmation of this in the Programming Guide.

Also, kernel parameters are passed through shared memory.

So my guess is that adding one extra 32-bit parameter reestablishes alignment of shared memory to 64-bit, so the code works by accident.

Solutions would be to either align the array pointer yourself with pointer arithmetic or, better, to declare it as an array of some 64-bit/pointer type and then make sure you only cast it to a type of smaller or equal size.
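For illustration, here is a minimal sketch of both suggestions combined. It assumes ValueType is float (the typedef is not shown in the question), and the names checkLayout, alignUp and kBlkSize are made up for this example. The shared buffer is declared with a pointer element type so that its base is pointer-aligned, and the offset of the second region is rounded up with pointer arithmetic before the cast. The kernel only checks the resulting addresses; it does not do the matrix-vector product.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

typedef float ValueType;        // assumption: the real typedef is not shown
constexpr int kBlkSize = 3;     // plays the role of matBlkSize in the question

// Round a byte offset up to the next multiple of `alignment` (a power of two).
__host__ __device__ inline size_t alignUp(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

__global__ void checkLayout(int* misaligned)
{
    // Declaring the dynamic shared buffer with the widest type stored in it
    // (a pointer) means the base is suitably aligned for pointers.
    extern __shared__ ValueType* smem[];

    // Region 1: kBlkSize*kBlkSize pointers at the start of the buffer.
    ValueType** As = smem;

    // Region 2: per-thread ValueType slots, placed after region 1 at an
    // offset rounded up to the alignment of ValueType.
    char* base = reinterpret_cast<char*>(smem);
    size_t ysOffset = alignUp(kBlkSize * kBlkSize * sizeof(ValueType*),
                              alignof(ValueType));
    ValueType* ys = reinterpret_cast<ValueType*>(base + ysOffset)
                    + threadIdx.x * kBlkSize;

    // Flag any pointer whose address is not a multiple of its type's alignment.
    bool bad = (reinterpret_cast<uintptr_t>(As) % alignof(ValueType*)) != 0 ||
               (reinterpret_cast<uintptr_t>(ys) % alignof(ValueType)) != 0;
    if (bad) atomicExch(misaligned, 1);
}

int main()
{
    const int threads = 32;
    // Size of region 1 (rounded up) plus region 2 for all threads.
    size_t bytes = alignUp(kBlkSize * kBlkSize * sizeof(ValueType*),
                           alignof(ValueType))
                 + threads * kBlkSize * sizeof(ValueType);

    int hostFlag = 0;
    int* devFlag = nullptr;
    cudaMalloc(&devFlag, sizeof(int));      // error checks omitted for brevity
    cudaMemset(devFlag, 0, sizeof(int));

    checkLayout<<<1, threads, bytes>>>(devFlag);
    cudaMemcpy(&hostFlag, devFlag, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(devFlag);

    printf("misaligned = %d\n", hostFlag);
    return hostFlag;
}
```

In the original kernel the same idea would mean changing the declaration of array to a pointer-typed extern __shared__ array and computing ys from an aligned byte offset, rather than casting a char buffer directly.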