Hi,

I have run into trouble using the cufftPlanMany function to compute a 2D FFT. I know there is a simpler function for this, but I want to use cufftPlanMany so that I can do batched execution.

I am testing the function with a 4x4 signal (four rows and four columns) and with batch values of 1, 2, 4 and 8. When the batch value is greater than 1, I copy the first signal into the others. So, if batch equals 4, then s_0 = s_1 = s_2 = s_3, where s_i is the signal at position i.
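A minimal sketch of the batch replication described above, assuming each signal is stored on the host as rows * cols consecutive doubles before being copied to the device. The helper `replicate_first_signal` is hypothetical; it is not part of cuFFT or of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill batch slots 1..num_batch-1 with copies of
 * slot 0, so that s_0 = s_1 = ... = s_{num_batch-1}.
 * Each signal occupies rows * cols consecutive doubles (plane-major). */
static void replicate_first_signal(double *signals, size_t num_batch,
                                   size_t rows, size_t cols)
{
    const size_t signal_len = rows * cols; /* elements per signal */
    assert(signals != NULL);
    for (size_t z = 1; z < num_batch; ++z) {
        /* copy plane 0 into plane z (the two regions never overlap) */
        memcpy(signals + z * signal_len, signals,
               signal_len * sizeof *signals);
    }
}
```

With batch 4 and a 4x4 signal, the first 16 values are copied into the three following blocks of 16.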

I store the values in row-major order within each signal and in plane-major order across the batch. That is, elements of the same row are consecutive in memory, and two consecutive rows of the same signal are consecutive in memory. For example, signal[z][y][x] denotes the element of signal number z in the batch (plane z, the outermost dimension), in row y and column x (the x axis is the innermost dimension).
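The layout described above corresponds to the following offset computation. This is a sketch in which `element_offset` is a hypothetical helper: signal z starts rows * cols elements after signal z - 1, and row y starts cols elements after row y - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element signal[z][y][x] in a buffer that stores the batch
 * plane-major (signal z after signal z-1) and each signal row-major
 * (row y after row y-1, columns x consecutive within a row). */
static size_t element_offset(size_t z, size_t y, size_t x,
                             size_t rows, size_t cols)
{
    assert(y < rows && x < cols);
    return z * rows * cols /* skip z whole signals  */
         + y * cols        /* skip y whole rows     */
         + x;              /* column within the row */
}
```

For the 4x4 case, element_offset(2, 3, 3, 4, 4) is 2 * 16 + 3 * 4 + 3 = 47.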

I am using a real-to-complex transform, so if the original signal is original_signal[NUM_BATCH], the forward FFT signal is forward_signal[NUM_BATCH] and the backward FFT is [NUM_BATCH].
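For a 2D real-to-complex transform, cuFFT stores only the non-redundant half of the spectrum along the innermost dimension, so each complex output signal has cols / 2 + 1 columns instead of cols. A small sketch of the per-signal element counts (the helper names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Number of real values in one rows x cols input signal. */
static size_t real_elems(size_t rows, size_t cols)
{
    return rows * cols;
}

/* Number of complex values in one R2C/D2Z output signal: the innermost
 * dimension shrinks to cols / 2 + 1; the remaining columns follow from
 * the Hermitian symmetry of the spectrum of a real signal. */
static size_t complex_elems(size_t rows, size_t cols)
{
    assert(cols > 0);
    return rows * (cols / 2 + 1);
}
```

For the 4x4 test signal this gives 16 real inputs and 4 x 3 = 12 complex outputs per signal.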

The problem is that the forward FFT is different for each input signal (remember that when there is more than one signal, they are all equal), yet when I do the backward FFT, all of them are equal to the original signal.

Below are some code fragments showing how I call the functions (the variable declarations follow):

```
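check_return_value( cufftPlanMany( &fftHandle,
                                   RANK,           /* number of dimensions (2) */
                                   points_per_dim, /* n: size of each dimension */
                                   inembed,        /* input storage dimensions */
                                   istride,        /* distance between consecutive input elements */
                                   idist,          /* distance between input signals in the batch */
                                   onembed,        /* output storage dimensions */
                                   ostride,        /* distance between consecutive output elements */
                                   odist,          /* distance between output signals in the batch */
                                   type,           /* transform type (D2Z here) */
                                   batch ) );      /* number of signals */
check_return_value( cufftExecD2Z( fftHandle,
                                  (cufftDoubleReal*) d_idata,
                                  (cufftDoubleComplex*) d_odata ) );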
```
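A sketch of how the advanced data layout of cufftPlanMany maps a (batch, row, column) position to a linear offset, following the 2D formula given in the cuFFT documentation: `b * dist + (row * embed[1] + col) * stride`. The function name `advanced_offset` is hypothetical; it can be used to compare the highest offset a plan will touch with the size of the buffer passed to cufftExecD2Z.

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of element (row, col) of batch b in a 2D advanced-layout
 * buffer: b * dist + (row * embed[1] + col) * stride.
 * embed is inembed or onembed; dist/stride are idist/istride or
 * odist/ostride. */
static size_t advanced_offset(size_t b, size_t row, size_t col,
                              const int embed[2], int stride, int dist)
{
    assert(col < (size_t)embed[1]);
    return b * (size_t)dist
         + (row * (size_t)embed[1] + col) * (size_t)stride;
}
```

For example, with the output parameters printed below (onembed = {4, 4}, ostride = 1, odist = 16) and batch 8, the last complex output element of a D2Z transform (batch 7, row 3, column SIZE / 2 = 2) lands at offset 7 * 16 + 3 * 4 + 2 = 126, which can be compared with the NUM_FFT * SIZE * (SIZE/2 + 1) = 96 elements allocated for d_odata.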

d_idata is declared as

```
cufftDoubleReal* d_idata;
```

and it is allocated with

```
cudaMalloc( (void**) &d_idata, NUM_FFT * SIZE * SIZE * sizeof( cufftDoubleReal ) );
```

d_odata is declared as

```
cufftDoubleComplex* d_odata;
```

and allocated with

```
cudaMalloc( (void**) &d_odata, NUM_FFT * (SIZE) * (SIZE/2 + 1) * sizeof( cufftDoubleComplex ) );
```

NUM_FFT equals batch (8), and SIZE is the number of points in each dimension (4).
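The sizes requested by the two cudaMalloc calls can be reproduced with a short sketch. Here `complex_double` is a hypothetical stand-in for cufftDoubleComplex (a pair of doubles) so the code compiles without the CUDA headers, and the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cufftDoubleComplex: real and imaginary parts as doubles. */
typedef struct { double x, y; } complex_double;

/* Bytes for d_idata: NUM_FFT real signals of SIZE x SIZE doubles. */
static size_t input_bytes(size_t num_fft, size_t size)
{
    return num_fft * size * size * sizeof(double);
}

/* Bytes for d_odata: NUM_FFT complex signals of SIZE x (SIZE/2 + 1). */
static size_t output_bytes(size_t num_fft, size_t size)
{
    return num_fft * size * (size / 2 + 1) * sizeof(complex_double);
}
```

With NUM_FFT = 8 and SIZE = 4 this gives 128 doubles (1024 bytes with 8-byte doubles) for d_idata and 96 complex values (1536 bytes with 16-byte complex elements) for d_odata.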

I have a function that prints the values passed as parameters just before the calls. This is its output:

```
NRANK      : 2
n[0]       : 4
n[1]       : 4
inembed[0] : 4
inembed[1] : 4
istride    : 1
idist      : 16
onembed[0] : 4
onembed[1] : 4
ostride    : 1
odist      : 16
batch      : 8
```

None of the calls returns an error.

I am using a GTX258 card on Ubuntu 11.10 with the latest driver (304.54). The SDK version is 5.0.

I don’t know what the problem is. Can anyone help me?

Thanks in advance,

Javier.