Batch transforms in cuFFT-Regent

Hi,

I am trying to implement FFT transforms in Regent, a language for implicit task-based parallelism, by relying on cuFFT.

I’ve had success implementing 1D, 2D, and 3D transforms, both R2C and C2C, and am now trying to implement batched transforms. However, I have a few questions about the implementation:

Our idea is that the user will pass in, say, a 256x256x7 ‘region’, meaning that they want a batch of 7 256x256 2D transforms.

My understanding is that I want to use cufftPlanMany with the advanced data layout. This makes sense to me at a high level, but I’m a little unsure how to interpret some of these parameters.

My input region uses complex64s, whose real and imaginary parts are doubles. Thus, each element is 16 bytes (2 x 8 bytes), which I took to be our stride; offset_1 below is 16.

My preliminary construction of the call to cufftPlanMany looks as follows:
var ok = cufft_c.cufftPlanMany(&p.cufft_p, dim, &n[0], &int, offset_1, offset_3, &int, offset_1, offset_3, cufft_c.CUFFT_Z2Z, 7)

For idist / odist, I believe this should be 16 * 256 * 256, which is offset_3.
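For reference, the byte arithmetic above can be written out in C. One caveat worth knowing at this point: cuFFT counts istride/idist (and ostride/odist) in elements of the transform's data type (`cufftDoubleComplex` for Z2Z), not in bytes, so these byte offsets need to be divided by the element size. In this sketch `complex64_t` is a hypothetical stand-in for Regent's complex64 (two doubles) and `bytes_to_elements` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Regent's complex64: two doubles, i.e. the
   same 16-byte layout as cuFFT's cufftDoubleComplex (used by Z2Z). */
typedef struct {
    double re;
    double im;
} complex64_t;

/* Convert a byte offset into the element count that cuFFT expects for
   istride/idist/ostride/odist. The offset must be a whole number of
   elements, which the assert checks. */
static int bytes_to_elements(size_t byte_offset) {
    assert(byte_offset % sizeof(complex64_t) == 0);
    return (int)(byte_offset / sizeof(complex64_t));
}
```

With this, `bytes_to_elements(16)` gives 1 and `bytes_to_elements(16 * 256 * 256)` gives 65536 (256 * 256), the values used for istride and idist in the reply below.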

This is where I’m confused. I still need to fill in the following parameters:

  1. Rank: is this the rank of the input matrix, which is 3, or the rank of the transform, which is 2?
  2. n: similarly, should I be passing an array with elements 256, 256, 7, or just 256, 256?
  3. inembed/onembed: I think this is what I’m most confused about. How is this different from the n array? What should I be passing in here?
  4. Batch: I assume this is 7.

What should the correct call look like?

In addition, how does the ‘execute’ step work for batched transforms? Do I just pass the plan created above to the Exec functions, exactly as in the non-batched case?

My code is in src/fft.rg on the main branch of arjunkunna/regent-fft-arjun on GitHub. The relevant lines for batched transforms are 390-400.

Thank you so much in advance for the help - it’s much appreciated!
Arjun

Hi,

It looks complicated, but in fact it is not. Here is my description of the parameters:

  • plan[In] – the plan handle you want to make the plan for.
  • rank[In] – How many dimensions a single transform has. In your case it is 2.
  • n[In] – Array of size rank, describing the size of each dimension. In your case it will be {256, 256}.
  • inembed[In] – Again an array of size rank. Here you pass the actual storage dimensions of the input array, i.e. each dimension size plus any padding. In your case it will be {256, 256} again, because your input is not padded.
  • istride[In] – Stride between two consecutive elements in the innermost (fastest-varying) dimension, counted in elements, not bytes. It is 1 in your case.
  • idist[In] – Distance, in elements, between the first elements of two consecutive input batches. It will be 256 * 256 for you.
  • onembed[In] – Again an array of size rank. Here you pass the actual storage dimensions of the output array, i.e. each dimension size plus any padding. In your case it will be {256, 256} again, because your output is not padded.
  • ostride[In] – Stride between two consecutive elements in the innermost (fastest-varying) dimension, counted in elements, not bytes. It is 1 in your case.
  • odist[In] – Distance, in elements, between the first elements of two consecutive output batches. It will be 256 * 256 for you.
  • type[In] – The transform data type. Here you pass Z2Z, or D2Z for real input (a D2Z output holds only 256 x 129 complex values per transform, so onembed/odist change accordingly).
  • batch[In] – How many transforms to compute. Here you pass 7.
  • *workSize[Inout] – Pointer to the size(s), in bytes, of the work areas. This argument exists only in cufftMakePlanMany; cufftPlanMany has no workSize parameter. If you do not want to share a workspace between plans, you can ignore this value.
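To make the list above concrete, here is a plain-C sketch (no cuFFT calls, so it compiles anywhere) that collects these arguments for a packed batch of 2D complex transforms and evaluates the 2D advanced-layout addressing rule from the cuFFT documentation, `b * idist + (x * inembed[1] + y) * istride`. The struct `plan_many_args` and both helper functions are hypothetical names for illustration. The rule assumes cuFFT's row-major order, with the last entry of `n` varying fastest; how a Regent region's dimensions map onto that order is not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type holding the advanced-layout arguments of
   cufftPlanMany(plan, rank, n, inembed, istride, idist,
                 onembed, ostride, odist, type, batch)
   for a batch of 2D transforms. Plain C only; nothing calls cuFFT. */
typedef struct {
    int rank;
    int n[2];
    int inembed[2];
    int istride;
    int idist;
    int onembed[2];
    int ostride;
    int odist;
    int batch;
} plan_many_args;

/* Arguments for `batch` tightly packed nx-by-ny complex transforms:
   no padding, unit stride, consecutive transforms nx * ny apart. */
static plan_many_args packed_2d_batch(int nx, int ny, int batch) {
    plan_many_args a;
    a.rank = 2;
    a.n[0] = nx;        a.n[1] = ny;
    a.inembed[0] = nx;  a.inembed[1] = ny;
    a.onembed[0] = nx;  a.onembed[1] = ny;
    a.istride = 1;      a.ostride = 1;
    a.idist = nx * ny;  a.odist = nx * ny;
    a.batch = batch;
    return a;
}

/* Element index of entry (x, y) of batch b under the 2D advanced
   layout: b * idist + (x * inembed[1] + y) * istride. */
static size_t input_index_2d(const plan_many_args *a, int b, int x, int y) {
    assert(b >= 0 && b < a->batch);
    assert(x >= 0 && x < a->n[0] && y >= 0 && y < a->n[1]);
    return (size_t)b * a->idist
         + ((size_t)x * a->inembed[1] + y) * a->istride;
}
```

For `packed_2d_batch(256, 256, 7)`, entry (x, y) = (255, 255) of the last batch (b = 6) lands at index 458751, i.e. 7 * 65536 - 1, so the seven transforms exactly tile the 256x256x7 region. In real code the fields would be handed to `cufftPlanMany(&plan, a.rank, a.n, a.inembed, a.istride, a.idist, a.onembed, a.ostride, a.odist, CUFFT_Z2Z, a.batch)`.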

Here is also a shortcut: if your input/output array has the same dimensions as n, you can pass a null pointer for inembed/onembed. cuFFT then uses the basic (tightly packed) layout based on n and ignores istride/idist and ostride/odist. So in your case, you don’t need to pass any of them.
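The null-pointer shortcut can be sanity-checked with a small C sketch. It compares the 2D advanced-layout index with the basic (packed) layout that cuFFT falls back to when inembed/onembed are NULL. All function names here are hypothetical, and the check only covers the input-side addressing.

```c
#include <assert.h>
#include <stddef.h>

/* Basic layout (inembed/onembed == NULL): transforms packed back to
   back, rows contiguous, unit stride. */
static size_t basic_index_2d(int nx, int ny, int b, int x, int y) {
    return ((size_t)b * nx + x) * ny + y;
}

/* Advanced layout, 2D: b * dist + (x * embed1 + y) * stride, where
   embed1 plays the role of inembed[1] (the padded row length). */
static size_t advanced_index_2d(int embed1, int stride, int dist,
                                int b, int x, int y) {
    return (size_t)b * dist + ((size_t)x * embed1 + y) * stride;
}

/* Returns 1 if the advanced parameters address every element exactly
   where the basic layout would, i.e. passing NULL would be equivalent. */
static int same_as_basic(int nx, int ny, int batch,
                         int embed1, int stride, int dist) {
    assert(nx > 0 && ny > 0 && batch > 0);
    for (int b = 0; b < batch; ++b)
        for (int x = 0; x < nx; ++x)
            for (int y = 0; y < ny; ++y)
                if (advanced_index_2d(embed1, stride, dist, b, x, y) !=
                    basic_index_2d(nx, ny, b, x, y))
                    return 0;
    return 1;
}
```

`same_as_basic(256, 256, 7, 256, 1, 256 * 256)` returns 1, so NULL is equivalent for the case in this thread. With padded rows, e.g. `same_as_basic(4, 4, 2, 6, 1, 24)`, it returns 0, which is when the explicit embed arrays are needed.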

Hope it helps.

David

Thank you so much for the help once again! I believe I got this to work with your tips. Greatly appreciate it!