cuFFT 3.1 and data alignment with CUDA FFT library problem

We have run into an issue moving from the cuFFT 3.0 library to cuFFT 3.1.
It now seems that the cufftExecC2C call requires that the input data pointer be 256 byte (32 cufftComplex elements) aligned.
This didn’t appear to be the case for 3.0, and I can find nothing in the documentation to suggest this is a requirement.

My application, that previously ran fine, now gives the following error:
cudaSafeCall() Runtime API error in file <D:/Bld/rel/gpgpu/toolkit/r3.1/cufft/src/accel/interface/sp1D.cu>, line 118 : invalid argument.

I my application, we are adding an offset to a pointer returned from a cudaMalloc() call and using this as the input data address in cufftExecC2C. The offset is always a multiple of 8 bytes.

So the big question is, is this a bug in cuFFT 3.1 or a new ‘feature’? (Or maybe finger trouble my end)
Any help would be appreciated. Thanks.

Tetters, I’ve run into similar trouble going from 3.0 to 3.1.
Here is the error the code gives:
cutilCheckMsg() CUTIL CUDA error: dpRadix0016B_kernel<FFT_INVERSE>(main) execution failed
in file </home/buildmeister/build/rel/gpgpu/toolkit/r3.1/cufft/src/accel/interface/dpRadix0016B.cu>, line 45 : invalid argument.

As in your case, my input address to cufftExecC2C is generated by offsetting from a pointer…

Tetters, I’ve run into similar trouble going from 3.0 to 3.1.
Here is the error the code gives:
cutilCheckMsg() CUTIL CUDA error: dpRadix0016B_kernel<FFT_INVERSE>(main) execution failed
in file </home/buildmeister/build/rel/gpgpu/toolkit/r3.1/cufft/src/accel/interface/dpRadix0016B.cu>, line 45 : invalid argument.

As in your case, my input address to cufftExecC2C is generated by offsetting from a pointer…

I have a similar problem as llin, except mine mine I am coding de novo with 3.1. I haven’t tried is with lesser versions

I get:

cutilCheckMsg() CUTIL CUDA error: spRadix0016B_kernel<FFT_FORWARD>(main) execution failed

in file </home/buildmeister/build/rel/gpgpu/toolkit/r3.1/cufft/src/accel/interface/spRadix0016B.cu>, line 35 : invalid argument.

Pointer is 32 byte aligned, and the code works when not using mapped memory (which is working fine). The memory being transformed is not mapped. 3d, 512x512x128, C2C.

I have a similar problem as llin, except mine mine I am coding de novo with 3.1. I haven’t tried is with lesser versions

I get:

cutilCheckMsg() CUTIL CUDA error: spRadix0016B_kernel<FFT_FORWARD>(main) execution failed

in file </home/buildmeister/build/rel/gpgpu/toolkit/r3.1/cufft/src/accel/interface/spRadix0016B.cu>, line 35 : invalid argument.

Pointer is 32 byte aligned, and the code works when not using mapped memory (which is working fine). The memory being transformed is not mapped. 3d, 512x512x128, C2C.

Thanks for the confirmation! I glad its not just me.

Thanks for the confirmation! I glad its not just me.

Thanks for reporting this issue. Several others have noted it as well. The issue stems from the use of texture fetch in some of the kernels inside CUFFT; it was less likely to be encountered in versions of CUFFT 3.0 and earlier because fewer of the CUFFT kernels used texture fetch.

We’ve already implemented two partial fixes for this issue for the next version of CUFFT:

  • When this condition occurs, the “cutilCheckMsg() CUTIL CUDA error” is no longer reported, and your application will not crash.
  • We also improved the error reporting for this so that there is a dedicated error code to indicate that this has occurred.

The present workaround from the application’s perspective is to check whether the data being passed in to CUFFT is aligned appropriately, and if not, you have to memcpy the data to a region that is appropriately aligned. This is obviously less than ideal, so we’re working on a more complete fix for this issue for a future version of CUFFT whereby the kernels inside CUFFT would be able to use the appropriate offsets for their texture fetch to make all of this be transparent to the application.

Thanks,
Cliff

Thanks for reporting this issue. Several others have noted it as well. The issue stems from the use of texture fetch in some of the kernels inside CUFFT; it was less likely to be encountered in versions of CUFFT 3.0 and earlier because fewer of the CUFFT kernels used texture fetch.

We’ve already implemented two partial fixes for this issue for the next version of CUFFT:

  • When this condition occurs, the “cutilCheckMsg() CUTIL CUDA error” is no longer reported, and your application will not crash.
  • We also improved the error reporting for this so that there is a dedicated error code to indicate that this has occurred.

The present workaround from the application’s perspective is to check whether the data being passed in to CUFFT is aligned appropriately, and if not, you have to memcpy the data to a region that is appropriately aligned. This is obviously less than ideal, so we’re working on a more complete fix for this issue for a future version of CUFFT whereby the kernels inside CUFFT would be able to use the appropriate offsets for their texture fetch to make all of this be transparent to the application.

Thanks,
Cliff

Did CUFFT change from CUDA 2.3 to CUDA 3.0? I have some code that uses 3D FFT that worked fine in CUDA 2.3 but seems to give strange results with CUDA 3.0.

Did CUFFT change from CUDA 2.3 to CUDA 3.0? I have some code that uses 3D FFT that worked fine in CUDA 2.3 but seems to give strange results with CUDA 3.0.

Certainly… the CUDA software team is continually working to improve all of the libraries in the CUDA Toolkit, including CUFFT.

A change in behavior between CUFFT 2.3 and 3.0 is unlikely to be unrelated to the issues discussed in this thread. Perhaps you could start a new thread for this and describe the issue you’re seeing in more detail. Also, it would be useful to know what results you see with CUFFT 3.1.

Thanks,

Cliff

Certainly… the CUDA software team is continually working to improve all of the libraries in the CUDA Toolkit, including CUFFT.

A change in behavior between CUFFT 2.3 and 3.0 is unlikely to be unrelated to the issues discussed in this thread. Perhaps you could start a new thread for this and describe the issue you’re seeing in more detail. Also, it would be useful to know what results you see with CUFFT 3.1.

Thanks,

Cliff

Thanks Cliff, for giving us the official word on this. A couple of quick follow-up questions:

  1. When is the next version of CUDA/CUFFT due out?
  2. There are projects such as the one reported here: http://portal.acm.org/citation.cfm?id=1810127, which report FFT implementations on GPUs faster than CUFFT. Will future versions of CUFFT show further improvements in speed?
    Many Thanks…

Thanks Cliff, for giving us the official word on this. A couple of quick follow-up questions:

  1. When is the next version of CUDA/CUFFT due out?
  2. There are projects such as the one reported here: http://portal.acm.org/citation.cfm?id=1810127, which report FFT implementations on GPUs faster than CUFFT. Will future versions of CUFFT show further improvements in speed?
    Many Thanks…

I’m not aware of the schedule for this having been announced as of yet.

The paper you cited seems to compare against CUFFT 2.2 and/or 2.3 and was apparently written before Fermi-based GPUs were available. There have definitely been additional performance improvements in the 3.x releases of CUFFT above and beyond what was available at the time of the paper (in fact, one such optimization is what led to the stricter alignment requirement that started this thread, unfortunately).

–Cliff

I’m not aware of the schedule for this having been announced as of yet.

The paper you cited seems to compare against CUFFT 2.2 and/or 2.3 and was apparently written before Fermi-based GPUs were available. There have definitely been additional performance improvements in the 3.x releases of CUFFT above and beyond what was available at the time of the paper (in fact, one such optimization is what led to the stricter alignment requirement that started this thread, unfortunately).

–Cliff