I finally got some advice. Thanks, Quoc Vinh.
Here are a few more questions, if you don’t mind.
Now I think I understand what you mentioned:
“The scope of shared memory is block scope. It mean that only threads within block can access to shared memory of there own block.”
And now I understand why the array multiplication example in the book, “Programming Massively Parallel Processors” by David B. Kirk & Wen-mei W. Hwu, defines a temporary 2D array for shared memory like
__shared__ float Mds[TILE_WIDTH][TILE_WIDTH]; << exact block size.
because the following was defined in the host code
dim3 dimBlock(TILE_WIDTH,TILE_WIDTH); << defined in the host code
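To check my understanding of that pattern, here is a minimal sketch of a tiled multiplication. It is not the book's exact code. The names tiledMatMul and runTiledMatMulCheck, the tile size of 4, and the 8x8 test matrices are my own assumptions, and the width is assumed to be a multiple of TILE_WIDTH. CUDA error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE_WIDTH 4  // assumed tile size, chosen only for illustration

// Hypothetical kernel: P = M * N for square width x width matrices.
// Assumes width is a multiple of TILE_WIDTH.
__global__ void tiledMatMul(const float* M, const float* N, float* P, int width)
{
    // One element per thread of the block, so the tile shape matches dimBlock.
    __shared__ float Mds[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Nds[TILE_WIDTH][TILE_WIDTH];

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * TILE_WIDTH + ty;
    int col = blockIdx.x * TILE_WIDTH + tx;

    float sum = 0.0f;
    for (int m = 0; m < width / TILE_WIDTH; ++m) {
        // Each thread loads one element of each tile into its block's shared memory.
        Mds[ty][tx] = M[row * width + (m * TILE_WIDTH + tx)];
        Nds[ty][tx] = N[(m * TILE_WIDTH + ty) * width + col];
        __syncthreads();  // wait until the whole tile is loaded

        for (int k = 0; k < TILE_WIDTH; ++k)
            sum += Mds[ty][k] * Nds[k][tx];
        __syncthreads();  // wait before the tile is overwritten
    }
    P[row * width + col] = sum;
}

// Runs the kernel on small test data and returns the number of
// elements that differ from a CPU reference (0 means they agree).
int runTiledMatMulCheck()
{
    const int width = 8;
    const int n = width * width;
    std::vector<float> hM(n), hN(n), hP(n, 0.0f);
    for (int idx = 0; idx < n; ++idx) {
        hM[idx] = static_cast<float>(idx % 5);
        hN[idx] = static_cast<float>(idx % 3);
    }

    float *dM, *dN, *dP;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dM, bytes);
    cudaMalloc(&dN, bytes);
    cudaMalloc(&dP, bytes);
    cudaMemcpy(dM, hM.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dN, hN.data(), bytes, cudaMemcpyHostToDevice);

    dim3 dimBlock(TILE_WIDTH, TILE_WIDTH);
    dim3 dimGrid(width / TILE_WIDTH, width / TILE_WIDTH);
    tiledMatMul<<<dimGrid, dimBlock>>>(dM, dN, dP, width);

    cudaMemcpy(hP.data(), dP, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dM);
    cudaFree(dN);
    cudaFree(dP);

    int mismatches = 0;
    for (int r = 0; r < width; ++r)
        for (int c = 0; c < width; ++c) {
            float ref = 0.0f;
            for (int k = 0; k < width; ++k)
                ref += hM[r * width + k] * hN[k * width + c];
            if (hP[r * width + c] != ref) ++mismatches;
        }
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", runTiledMatMulCheck());
    return 0;
}
```

The point I take from this is that the shared tile is indexed only with threadIdx, while blockIdx is used only for global memory.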
Back to my case: what if I define the following for the iA array in my host code:
dim3 dimBlock(3,3,33);
and
int i = threadIdx.x + blockDim.x * blockIdx.x;
int j = threadIdx.y + blockDim.y * blockIdx.y;
int k = threadIdx.z;
to index the 1D vector iA in terms of i, j, k, such as iA(i+j+k) << a conversion of a 3D array into 1D
In this case, how should I define the shared memory?
Since I already defined dimBlock(3,3,33), I thought I should define __shared__ int smiA[3][3][33];
Or is the form unimportant as long as the total is 3x3x33 = 297 elements, like __shared__ int smiA[9][33] or __shared__ int smiA[297]?
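To make the question concrete, here is a minimal sketch that stages the same data through a 3D shared array and a flat 297-element shared array and compares the results. Everything beyond dimBlock(3,3,33) and the i, j, k definitions is an assumption of mine: the names stageThroughShared and runSharedLayoutCheck, the 6x6x33 test volume, the x-fastest layout of iA, and the mirrored read in x (added only so that threads read elements written by other threads). CUDA error checking is left out.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BX 3
#define BY 3
#define BZ 33

// Hypothetical kernel: every block copies its 3x3x33 chunk of iA into shared
// memory twice (3D array and flat array), then reads back the x-mirrored element.
__global__ void stageThroughShared(const int* iA, int* out3D, int* outFlat,
                                   int nx, int ny)
{
    __shared__ int sm3D[BZ][BY][BX];        // one bracket pair per dimension
    __shared__ int smFlat[BX * BY * BZ];    // same 297 elements, flattened

    int tx = threadIdx.x, ty = threadIdx.y, tz = threadIdx.z;
    int i = tx + blockDim.x * blockIdx.x;
    int j = ty + blockDim.y * blockIdx.y;
    int k = tz;

    // Assumed global layout: x fastest, then y, then z.
    int g = i + nx * (j + ny * k);

    // Shared memory is per block, so it is indexed with threadIdx only.
    sm3D[tz][ty][tx] = iA[g];
    smFlat[tx + BX * (ty + BY * tz)] = iA[g];
    __syncthreads();  // needed because each thread reads another thread's element

    int mx = BX - 1 - tx;  // mirrored x position inside the block
    out3D[g]   = sm3D[tz][ty][mx];
    outFlat[g] = smFlat[mx + BX * (ty + BY * tz)];
}

// Returns the number of elements where either output differs from the
// CPU-computed expectation (0 means both shared layouts behave the same).
int runSharedLayoutCheck()
{
    const int nx = 2 * BX, ny = 2 * BY, nz = BZ;
    const int n = nx * ny * nz;
    std::vector<int> hIn(n), h3D(n, -1), hFlat(n, -1);
    for (int g = 0; g < n; ++g) hIn[g] = g;

    int *dIn, *d3D, *dFlat;
    size_t bytes = n * sizeof(int);
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&d3D, bytes);
    cudaMalloc(&dFlat, bytes);
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);

    dim3 dimBlock(BX, BY, BZ);          // 297 threads per block
    dim3 dimGrid(nx / BX, ny / BY);     // 2 x 2 blocks
    stageThroughShared<<<dimGrid, dimBlock>>>(dIn, d3D, dFlat, nx, ny);

    cudaMemcpy(h3D.data(), d3D, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hFlat.data(), dFlat, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(d3D);
    cudaFree(dFlat);

    int mismatches = 0;
    for (int k = 0; k < nz; ++k)
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i) {
                int g = i + nx * (j + ny * k);
                int iMirror = (i / BX) * BX + (BX - 1 - i % BX);
                int expected = hIn[iMirror + nx * (j + ny * k)];
                if (h3D[g] != expected || hFlat[g] != expected) ++mismatches;
            }
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", runSharedLayoutCheck());
    return 0;
}
```

In this sketch both declarations reserve the same 297 ints per block, and the only difference is how the index is written. The comma form smiA[3,3,33] would not compile, since C/CUDA arrays need one bracket pair per dimension.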
Thanks again, and thanks in advance for your reply and valuable comments!