cuFFT batching over two dimensions of a 3D matrix.

Dear all,
This is my first post on the forums, so please let me know if I have broken any forum etiquette.

I have an MxNxD array whose elements are ordered in a row-major fashion with d varying fastest, so element (m,n,d) is at linear index (M*n + m)*D + d. What I need to do is perform D two-dimensional FFTs, one over each MxN matrix.

I have consulted the documentation at http://docs.nvidia.com/cuda/cufft/index.html#advanced-data-layout but am still unclear about how the embed, dist and stride parameters map onto my data layout and the task at hand.
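To make the question concrete, here is a minimal host-side sketch of my current reading. It uses no cuFFT calls and only checks index arithmetic. It assumes the 2D offset formula given in the advanced data layout section, input[b * idist + (x * inembed[1] + y) * istride], and compares it against my own indexing. The names Layout2D, guess_layout, cufft_offset and layout_matches are my own hypothetical helpers, not part of cuFFT, and the parameter values are a guess that I would like confirmed.

```c
#include <assert.h>
#include <stddef.h>

/* Index used in this post: element (m, n, d) of the M x N x D array. */
static size_t post_index(size_t m, size_t n, size_t d, size_t M, size_t D) {
    return (M * n + m) * D + d;
}

/* Hypothetical container for the layout arguments of a batched 2D plan. */
typedef struct {
    int n[2];       /* transform size, slowest-varying dimension first */
    int inembed[2]; /* storage dimensions of one 2D signal */
    int istride;    /* distance between consecutive elements of one signal */
    int idist;      /* distance between the first elements of two signals */
    int batch;      /* number of 2D transforms */
} Layout2D;

/* My guess: n is the slowest index, m the next, and d is interleaved. */
static Layout2D guess_layout(int M, int N, int D) {
    Layout2D p;
    assert(M > 0 && N > 0 && D > 0);
    p.n[0] = N;       p.n[1] = M;
    p.inembed[0] = N; p.inembed[1] = M;
    p.istride = D;    /* neighbouring (m, n) samples of one channel are D apart */
    p.idist = 1;      /* channel d starts at offset d */
    p.batch = D;
    return p;
}

/* Offset formula as quoted from the advanced data layout section (2D case). */
static size_t cufft_offset(const Layout2D *p, int b, int x, int y) {
    return (size_t)b * (size_t)p->idist
         + ((size_t)x * (size_t)p->inembed[1] + (size_t)y) * (size_t)p->istride;
}

/* Returns 1 if the guessed layout addresses the same elements as post_index. */
static int layout_matches(int M, int N, int D) {
    Layout2D p = guess_layout(M, N, D);
    for (int d = 0; d < p.batch; ++d)
        for (int n = 0; n < p.n[0]; ++n)
            for (int m = 0; m < p.n[1]; ++m)
                if (cufft_offset(&p, d, n, m) !=
                    post_index((size_t)m, (size_t)n, (size_t)d,
                               (size_t)M, (size_t)D))
                    return 0;
    return 1;
}
```

If this reading is right, I would pass rank = 2 together with these n, inembed, istride, idist and batch values to cufftPlanMany, using the same values for the output parameters in an in-place transform. I have not tested this on a GPU, and I do not know how a stride of D affects memory coalescing.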

In addition, I am somewhat concerned about the efficiency of my data layout and whether reads will be coalesced for the batched FFTs. For D=3, my problem is equivalent to performing an FFT over each channel of an RGB image.

Any insight would be greatly appreciated.

Best regards