CUDA approach to handling two stacks of 2D matrices

I have run into a problem in the following case: suppose I have two stacks of complex images stored in two 3D matrices A and B, whose sizes are row*col*n_A and row*col*n_B respectively.
What I need to compute is the following (described in Matlab code):
n = 1;
for i = 1:n_A
    for j = 1:n_B
        C(:,:,n) = A(:,:,i) .* B(:,:,j);
        n = n + 1;
    end
end
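
For reference, the loop above maps naturally onto one GPU thread per output element. The following is only a minimal sketch under stated assumptions: single-precision complex data (cuFloatComplex from cuComplex.h), each stack stored as one contiguous buffer, slice after slice, and the names pairwiseProduct and pairwiseProductHost, which are made up for illustration and are not part of any library.

```cuda
#include <cuda_runtime.h>
#include <cuComplex.h>

// One thread per output element. `slice` = row * col elements per image;
// every image in A, B and C is assumed to use the same element order.
__global__ void pairwiseProduct(const cuFloatComplex* A, const cuFloatComplex* B,
                                cuFloatComplex* C, size_t slice, int nB)
{
    size_t p = blockIdx.x * (size_t)blockDim.x + threadIdx.x;  // element inside a slice
    if (p >= slice) return;
    int j = blockIdx.y;               // slice of B (0-based)
    int i = blockIdx.z;               // slice of A (0-based)
    size_t n = (size_t)i * nB + j;    // 0-based version of MATLAB's running counter n
    C[n * slice + p] = cuCmulf(A[i * slice + p], B[j * slice + p]);
}

// Hypothetical host helper: uploads both stacks, runs the kernel, downloads C.
// Error handling is deliberately minimal for the sake of the sketch.
cudaError_t pairwiseProductHost(const cuFloatComplex* hA, const cuFloatComplex* hB,
                                cuFloatComplex* hC, size_t slice, int nA, int nB)
{
    size_t bytesA = slice * nA * sizeof(cuFloatComplex);
    size_t bytesB = slice * nB * sizeof(cuFloatComplex);
    size_t bytesC = slice * nA * nB * sizeof(cuFloatComplex);

    cuFloatComplex *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytesA);
    cudaMalloc(&dB, bytesB);
    cudaMalloc(&dC, bytesC);
    cudaMemcpy(dA, hA, bytesA, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytesB, cudaMemcpyHostToDevice);

    // grid.y and grid.z are each limited to 65535 blocks, so this launch
    // shape assumes nA and nB stay below that limit.
    dim3 block(256);
    dim3 grid((unsigned)((slice + block.x - 1) / block.x), (unsigned)nB, (unsigned)nA);
    pairwiseProduct<<<grid, block>>>(dA, dB, dC, slice, nB);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hC, dC, bytesC, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Note that C holds n_A*n_B images, so for large stacks the output, not the input, is likely to be what exhausts device memory; in that case the (i, j) pairs would have to be processed in batches.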

I want to implement this in CUDA. First, I need to copy A and B to device memory. One way is to use two 3D textures to store A and B, but given the size limit of a 3D texture (2048*2048*2048, which means n_A and n_B cannot exceed 2048), I would like to use another way. Can I allocate a set of 2D CUDA arrays or 2D textures to store A, and at the same time store the addresses of all those 2D arrays in a pointer array on the device (and do the same for B)? Then, in the loop, I would access each array in stack A and stack B via the pointers stored in the two pointer arrays.
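
To make the pointer-array idea concrete, here is a rough sketch of it. It swaps in plain linear device buffers (one cudaMalloc per image) instead of 2D CUDA arrays or textures, since an element-wise product needs no filtering or 2D caching. The data type (cuFloatComplex) and all function names here (pairwiseProductPtr, uploadStack, pairwiseProductPtrHost) are assumptions made for illustration only.

```cuda
#include <cuda_runtime.h>
#include <cuComplex.h>
#include <vector>

// A and B are device-resident tables of device pointers, one entry per image.
__global__ void pairwiseProductPtr(const cuFloatComplex* const* A,
                                   const cuFloatComplex* const* B,
                                   cuFloatComplex* C, size_t slice, int nB)
{
    size_t p = blockIdx.x * (size_t)blockDim.x + threadIdx.x;  // element inside an image
    if (p >= slice) return;
    int j = blockIdx.y;   // image index in stack B (0-based)
    int i = blockIdx.z;   // image index in stack A (0-based)
    C[((size_t)i * nB + j) * slice + p] = cuCmulf(A[i][p], B[j][p]);
}

// Hypothetical helper: copies each host image into its own device buffer,
// records the buffers in `owned`, and returns a device copy of that pointer table.
cuFloatComplex** uploadStack(const cuFloatComplex* host, size_t slice, int n,
                             std::vector<cuFloatComplex*>& owned)
{
    for (int k = 0; k < n; ++k) {
        cuFloatComplex* d = nullptr;
        cudaMalloc(&d, slice * sizeof(cuFloatComplex));
        cudaMemcpy(d, host + k * slice, slice * sizeof(cuFloatComplex),
                   cudaMemcpyHostToDevice);
        owned.push_back(d);
    }
    cuFloatComplex** table = nullptr;
    cudaMalloc(&table, n * sizeof(cuFloatComplex*));
    cudaMemcpy(table, owned.data(), n * sizeof(cuFloatComplex*),
               cudaMemcpyHostToDevice);
    return table;
}

// Hypothetical driver with minimal error handling; C is kept as one
// contiguous buffer holding nA * nB images.
cudaError_t pairwiseProductPtrHost(const cuFloatComplex* hA, const cuFloatComplex* hB,
                                   cuFloatComplex* hC, size_t slice, int nA, int nB)
{
    std::vector<cuFloatComplex*> imagesA, imagesB;
    cuFloatComplex** tableA = uploadStack(hA, slice, nA, imagesA);
    cuFloatComplex** tableB = uploadStack(hB, slice, nB, imagesB);

    size_t bytesC = slice * nA * nB * sizeof(cuFloatComplex);
    cuFloatComplex* dC = nullptr;
    cudaMalloc(&dC, bytesC);

    dim3 block(256);
    dim3 grid((unsigned)((slice + block.x - 1) / block.x), (unsigned)nB, (unsigned)nA);
    pairwiseProductPtr<<<grid, block>>>(tableA, tableB, dC, slice, nB);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hC, dC, bytesC, cudaMemcpyDeviceToHost);

    for (cuFloatComplex* p : imagesA) cudaFree(p);
    for (cuFloatComplex* p : imagesB) cudaFree(p);
    cudaFree(tableA);
    cudaFree(tableB);
    cudaFree(dC);
    return err;
}
```

If texture fetches are really wanted, the same table layout should carry over to an array of cudaTextureObject_t handles on devices of compute capability 3.0 or higher, but that variant is not shown here.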

Does this make sense?