cufftExecD2Z returns CUFFT_INVALID_VALUE when giving an offset to a pointer

Hello everyone,
I have a problem with a piece of code where I am trying to execute some 1D FFTs. Specifically, the following code

constexpr int size = 3;
constexpr int offset = 1;

int n [] = {16};
const int howmany = 3;

const size_t in_stride  = n[0] * size;
const size_t out_stride = n[0];

cufftHandle p;
if (cufftPlanMany(&p, 1, n, 
                               nullptr, size, in_stride, 
                               nullptr,     1, out_stride,
                               CUFFT_D2Z, howmany)
     != CUFFT_SUCCESS) {
   std::cerr << "CUFFT unable to create CUFFT_D2Z plan" << std::endl;
   exit(1);
}

cufftDoubleReal *d_in;
cufftDoubleComplex *d_out;

cudaMalloc((void**)&d_in, sizeof(cufftDoubleReal)*n[0]*howmany*size);
cudaMalloc((void**)&d_in, sizeof(cufftDoubleComplex)*n[0]*howmany);

// Execute the FFT
const cufftResult err = cufftExecD2Z(p, d_in+offset, d_out);

if (err != CUFFT_SUCCESS) {
   std::cerr << "CUFFT unable to cufftExecD2Z with error " << err << std::endl;
   exit(1);
 }

cudaFree(d_in);
cudaFree(d_out);

fails with CUFFT unable to cufftExecD2Z with error 4 unless I set the offset variable to 0.
Do you know any reason why I should not be able to offset the d_in device pointer in this scenario?
How can I get more information about what is triggering the CUFFT_INVALID_VALUE error?

It’s a requirement that the pointers be aligned properly. From the cuFFT documentation:

Pointers to idata and odata are both required to be aligned to cufftComplex data type in single-precision transforms and cufftDoubleComplex data type in double-precision transforms.

There are two cufftDoubleReal in one cufftDoubleComplex, so you should be able to use an offset that is zero or a positive whole-number multiple of 2.
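The multiple-of-2 rule can be checked with plain pointer arithmetic. The following is a minimal host-side sketch, not part of cuFFT: the names kComplexAlign, is_aligned and offset_keeps_alignment are hypothetical helpers, and the required alignment is assumed to be the size of two doubles, as described above. It also assumes the base pointer is already suitably aligned.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: the required alignment equals the size of one double-precision
// complex value, i.e. two doubles (16 bytes on common platforms).
constexpr std::size_t kComplexAlign = 2 * sizeof(double);

// True if p sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// True if advancing an already-aligned double* by `offset` elements
// keeps it on a kComplexAlign boundary.
constexpr bool offset_keeps_alignment(std::size_t offset) {
    return (offset * sizeof(double)) % kComplexAlign == 0;
}
```

With the values from the posted code, offset_keeps_alignment(1) is false and offset_keeps_alignment(2) is true. The same is_aligned check could be applied to d_in + offset before calling cufftExecD2Z to catch the problem early.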

Aside: your posted code appears to have a typo or bug in it:

cudaMalloc((void**)&d_in, sizeof(cufftDoubleReal)*n[0]*howmany*size);
cudaMalloc((void**)&d_in, sizeof(cufftDoubleComplex)*n[0]*howmany);
                    ^^^^

When I compile it, the compiler warns me about this.

Thanks!

Considering that cufftDoubleComplex is 8-byte aligned, I thought I just needed to preserve the alignment of the input with each element of the output and not with the entire structure.

Aside: your posted code appears to have a typo or bug in it:

You are right. It should be a d_out. I forgot to change the name while doing the copy and paste.

I’m not sure why you would say that.

In cufft.h, cufftDoubleComplex is a typedef based on cuDoubleComplex. In cuComplex.h, cuDoubleComplex is a typedef for double2. In vector_types.h double2 is a typedef for the struct of the same name, and the definition of that struct is:

struct  __device_builtin__ __builtin_align__(16) double2 { double x,y; };

That struct is intended for 16 byte alignment. That carries all the way through the typedef chain.
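The effect of an alignment attribute like this can be reproduced in plain C++ without any CUDA headers. This is only an illustrative sketch: PlainPair and AlignedPair are hypothetical stand-ins, not the CUDA definitions, and alignas(16) is used here as the standard C++ counterpart of the attribute shown above.

```cpp
#include <cstddef>

// Hypothetical stand-in: two doubles with no alignment attribute.
// Its alignment follows from its members (that of double).
struct PlainPair { double x, y; };

// Hypothetical stand-in: same layout, but explicitly over-aligned,
// analogous to how double2 is declared.
struct alignas(16) AlignedPair { double x, y; };

// Both hold exactly two doubles, so their sizes match.
static_assert(sizeof(PlainPair) == sizeof(AlignedPair), "same payload size");

// The attribute, not the members, determines the 16-byte alignment.
static_assert(alignof(AlignedPair) == 16, "explicit 16-byte alignment");
```

On a typical 64-bit platform, alignof(PlainPair) reports the alignment of double, which is smaller than 16, so a pointer that is valid for an array of doubles is not automatically valid for the over-aligned type.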

I was printing alignof(cufftDoubleComplex) and the output was 8. Maybe I was doing something wrong. Thanks again for the explanation.

I get a value of 16 when I use alignof(cufftDoubleComplex):

# cat t7.cu
#include <cufft.h>
#include <iostream>

int main(){

  std::cout << alignof(cufftDoubleComplex) << std::endl;
}
# nvcc -o t7 t7.cu
# ./t7
16
#

And similarly with g++:

# cat t7.cpp
#include <cufft.h>
#include <iostream>

int main(){

  std::cout << alignof(cufftDoubleComplex) << std::endl;
}
# g++ -I/usr/local/cuda/include t7.cpp -o t7c
# ./t7c
16
#

CUDA 12.2.1
