I have run into a problem with the following case: suppose I have two stacks of complex images stored in two 3D matrices A and B, whose sizes are row*col*n_A and row*col*n_B respectively.

What I need to do is the following (described in Matlab code):

n = 1;
for i = 1:n_A
    for j = 1:n_B
        C(:,:,n) = A(:,:,i) .* B(:,:,j);
        n = n + 1;
    end
end
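To make the target concrete, here is a rough sketch of how I imagine this loop mapping to a kernel if both stacks simply sat in contiguous linear device memory (no textures). The names pairwiseProduct and launchPairwiseProduct are my own placeholders, and I am assuming single-precision complex data (cuFloatComplex) with each slice stored as row*col contiguous elements.

```cuda
#include <cassert>
#include <cstddef>
#include <cuComplex.h>
#include <cuda_runtime.h>

// One thread per pixel; blockIdx.y walks over the output slices.
// 0-based output slice n corresponds to the pair (i, j) = (n / n_B, n % n_B),
// which is the same order the Matlab loop produces (j varies fastest).
__global__ void pairwiseProduct(const cuFloatComplex* A, const cuFloatComplex* B,
                                cuFloatComplex* C, int slicePixels, int n_B, int nOut)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;   // pixel index inside a slice
    if (p >= slicePixels) return;

    // Stride over slices so nOut may exceed the gridDim.y limit.
    for (int n = blockIdx.y; n < nOut; n += gridDim.y) {
        int i = n / n_B;
        int j = n % n_B;
        C[(size_t)n * slicePixels + p] =
            cuCmulf(A[(size_t)i * slicePixels + p], B[(size_t)j * slicePixels + p]);
    }
}

// Hypothetical host-side launcher; dA, dB and dC are device pointers.
void launchPairwiseProduct(const cuFloatComplex* dA, const cuFloatComplex* dB,
                           cuFloatComplex* dC, int row, int col, int n_A, int n_B)
{
    assert(row > 0 && col > 0 && n_A > 0 && n_B > 0);
    int slicePixels = row * col;
    int nOut = n_A * n_B;
    dim3 block(256);
    dim3 grid((slicePixels + block.x - 1) / block.x, nOut < 65535 ? nOut : 65535);
    pairwiseProduct<<<grid, block>>>(dA, dB, dC, slicePixels, n_B, nOut);
}
```

Because the product is element-wise, the row/column ordering inside a slice does not matter here as long as A, B and C use the same layout.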

I want to implement this in CUDA. First, I need to copy A and B to device memory. One way is to use two 3D textures to store A and B, but considering the size limit of a 3D texture (2048*2048*2048, which means n_A and n_B cannot exceed 2048), I would like to use another approach. Can I allocate a set of 2D CUDA arrays or 2D textures to store A, and at the same time store the addresses of all those 2D arrays in a pointer array on the device (and likewise for B)? Then, in the loop, I would access each array in stack A and stack B via the pointers stored in the two pointer arrays.

Does this make sense?
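To illustrate what I mean by a "pointer array" of 2D textures, here is a minimal sketch. It assumes the texture object API (cudaTextureObject_t, which as far as I understand needs compute capability 3.0 or later), since a texture object is a plain handle that can be stored in a device-side table. The helper names makeSliceTexture, uploadHandles and mulSlices are hypothetical, each complex pixel is assumed to be a float2 (real, imaginary), and error checking is omitted for brevity.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Copy one host slice into its own 2D CUDA array and wrap it in a texture object.
cudaTextureObject_t makeSliceTexture(const float2* hostSlice, int width, int height,
                                     cudaArray_t* outArray)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float2>();
    cudaMallocArray(outArray, &desc, width, height);
    cudaMemcpy2DToArray(*outArray, 0, 0, hostSlice, width * sizeof(float2),
                        width * sizeof(float2), height, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = *outArray;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;        // exact texel fetch, no interpolation
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Copy the table of handles to device memory so a kernel can index into it.
cudaTextureObject_t* uploadHandles(const std::vector<cudaTextureObject_t>& handles)
{
    cudaTextureObject_t* d = nullptr;
    cudaMalloc(&d, handles.size() * sizeof(cudaTextureObject_t));
    cudaMemcpy(d, handles.data(), handles.size() * sizeof(cudaTextureObject_t),
               cudaMemcpyHostToDevice);
    return d;
}

// blockIdx.z selects the 0-based output slice n = i * n_B + j.
__global__ void mulSlices(const cudaTextureObject_t* texA, const cudaTextureObject_t* texB,
                          float2* C, int width, int height, int n_B)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int n = blockIdx.z;
    if (x >= width || y >= height) return;

    // +0.5f addresses the texel centre with unnormalized coordinates.
    float2 a = tex2D<float2>(texA[n / n_B], x + 0.5f, y + 0.5f);
    float2 b = tex2D<float2>(texB[n % n_B], x + 0.5f, y + 0.5f);
    C[((size_t)n * height + y) * width + x] =
        make_float2(a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x);
}
```

In this sketch the kernel would be launched with a 3D grid whose z extent is n_A*n_B, so that product is bounded by the grid's z-dimension limit. For data coming from Matlab, width would correspond to row (the contiguous dimension) and height to col.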