Hi, I am trying to implement an FFT in Regent, a language for implicit task-based parallelism, by relying on cuFFT.

I’ve had success implementing 1D, 2D, and 3D transforms for both R2C and C2C, and am currently trying to implement batched transforms. However, I have a few questions about the implementation:

Our idea is that the user will pass in, say, a 256x256x7 ‘region’, meaning that they want 7 batches of a 256x256 2D transform.

My understanding is that I want to use cufftPlanMany with the advanced data layout. This makes sense to me at a high level, but I’m a little unsure how to interpret some of these parameters.

My input region uses complex64s, whose real and imaginary parts are doubles. Thus, each element is 16 bytes (2 x 8 bytes), which I am using as the stride: offset_1 is 16 in the call below.
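One thing I want to double-check here: as far as I can tell from the advanced data layout section of the cuFFT documentation, istride and idist index into a typed array, so they would be counted in elements rather than bytes. If that reading is right, my byte offsets would need converting first. A minimal sketch of that conversion, where `complex_z` and `bytes_to_elements` are hypothetical names of my own and not part of cuFFT or Regent:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same layout as a double-precision
   complex value (two doubles, 16 bytes on common platforms). */
typedef struct {
    double re;
    double im;
} complex_z;

/* Convert a byte offset into a count of complex_z elements.
   Assumes the offset is a whole multiple of the element size. */
static size_t bytes_to_elements(size_t byte_offset) {
    assert(byte_offset % sizeof(complex_z) == 0);
    return byte_offset / sizeof(complex_z);
}
```

Under that assumption, a 16-byte stride would become a stride of 1 element, and 16*256*256 bytes would become 256*256 elements.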

My preliminary construction of the call to cufftPlanMany looks as follows:

var ok = cufft_c.cufftPlanMany(&p.cufft_p, dim, &n[0], &int, offset_1, offset_3, &int, offset_1, offset_3, cufft_c.CUFFT_Z2Z, 7)

For idist / odist, I believe this should be 16*256*256, which is offset_3.

This is where I’m confused. I still need to fill in the following parameters:

- rank: Is this the rank of the input array, which is 3, or the rank of the transform, which is 2?
- n: Similarly, should I pass an array with elements 256, 256, 7, or just 256, 256?
- inembed/onembed: This is what I’m most confused about. How do these differ from the ‘n’ array, and what should I pass in here?
- batch: I assume this is 7.
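To make the question concrete, here is my current reading of the documented 2D advanced-layout formula, `input[b * idist + (x * inembed[1] + y) * istride]`, written out as a sketch. It assumes rank is the rank of the transform (2), n holds only the transform sizes, and all offsets are in elements. The struct and function names are hypothetical, and I have not confirmed which of the two layouts matches how my Regent region is actually stored in memory:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the layout arguments I would pass to
   cufftPlanMany (input side only; the output side would mirror it). */
typedef struct {
    int rank;        /* rank of the transform, not of the region */
    int n[2];        /* transform sizes; n[0] is the slowest-varying dimension */
    int inembed[2];  /* storage dimensions of the enclosing array */
    int istride;     /* distance between consecutive elements, in elements */
    int idist;       /* distance between consecutive batches, in elements */
    int batch;
} plan_many_args;

/* Assumption A: the batch index varies slowest, so each nx-by-ny
   slab is contiguous and the slabs follow one another. */
static plan_many_args args_batch_slowest(int nx, int ny, int batch) {
    assert(nx > 0 && ny > 0 && batch > 0);
    plan_many_args a;
    a.rank = 2;
    a.n[0] = nx;       a.n[1] = ny;
    a.inembed[0] = nx; a.inembed[1] = ny;
    a.istride = 1;
    a.idist = nx * ny;
    a.batch = batch;
    return a;
}

/* Assumption B: the batch index varies fastest, so the batches are
   interleaved element by element. */
static plan_many_args args_batch_fastest(int nx, int ny, int batch) {
    assert(nx > 0 && ny > 0 && batch > 0);
    plan_many_args a;
    a.rank = 2;
    a.n[0] = nx;       a.n[1] = ny;
    a.inembed[0] = nx; a.inembed[1] = ny;
    a.istride = batch;
    a.idist = 1;
    a.batch = batch;
    return a;
}

/* Element offset of entry (x, y) of batch b, following the 2D formula
   input[b * idist + (x * inembed[1] + y) * istride]. */
static size_t element_offset(const plan_many_args *a, int b, int x, int y) {
    return (size_t)b * (size_t)a->idist
         + ((size_t)x * (size_t)a->inembed[1] + (size_t)y) * (size_t)a->istride;
}
```

For the 256x256x7 case, assumption A gives istride = 1 and idist = 65536, while assumption B gives istride = 7 and idist = 1. In both cases the last element of the last batch lands at offset 256*256*7 - 1, which is a quick sanity check that the layout covers the whole region.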

What should the correct call look like?

In addition, how does the ‘execute’ step work for batched transforms? Do I just pass the plan created above to the Exec functions, exactly as in the non-batched case?

My code is located here: regent-fft-arjun/src/fft.rg at main · arjunkunna/regent-fft-arjun · GitHub. The relevant lines for batched transforms are at lines 390-400.

Thank you *so much* in advance for the help - it’s much appreciated!

Arjun