Hello,
In my kernel, I have a code segment as follows:
extern __shared__ char array[];
//__global__ void mvBD(ValueType** A, ValueType* x, ValueType* y, IndexType matBlkSize, IndexType matNumBlk) //#2
__global__ void mvBD(ValueType** A, ValueType* x, ValueType* y, IndexType matNumBlk) //#3
{
const IndexType matBlkSize=3;
unsigned int matBlkIdx=blockDim.x*blockIdx.x+threadIdx.x;
unsigned int offset=matBlkIdx*matBlkSize, offset2=blockDim.x*blockIdx.x*matBlkSize;
//the following two lines are for allocating shared memory dynamically.
ValueType** As=(ValueType**)array;
ValueType* ys=(ValueType*)(array+matBlkSize*matBlkSize*sizeof(ValueType*)+threadIdx.x*matBlkSize*sizeof(ValueType));
ValueType xs[3];
if(tx<matBlkSize*matBlkSize){
As[tx]=A[tx];
}
…
__syncthreads();
if(matBlkIdx<matNumBlk){
for(i=0;i<matBlkSize;i++){
ys[i]=0.0f;
for(j=0;j<matBlkSize;j++)
ys[i]+=As[i*matBlkSize+j][matBlkIdx]*xs[j]; //#1
}
}
__syncthreads();
…
return;
}
Line #1 gives me a compile error: "Unaligned Memory Access not supported". However, if I make matBlkSize a function parameter and pass its value in from outside (the commented-out signature #2), the kernel compiles and runs correctly. I do not understand why these two approaches behave differently. Can anyone explain it to me? Thank you.