cufft instability

I got my image processing program to work just fine but when I try to change things I run into instability issues with cufft.

What I do is create cufft plans 1 & 2 then do a cufft exec.

Here are some symptoms:

  • Before the method that does a few filtering tasks I have a log statement. It prints the number of the image being processed to the console and the log file. When I comment out the print/log statement, the cufftPlan1d call that follows fails.

  • I turn the plans into class (member) variables instead of local variables within a method. OK fine. No problem. Then I try to run the cufftPlan1d in the constructor. When I do that the cufftExec in a method fails.

  • When I create cufftPlan1d with a 1024 transform size then work on an image roughly 1000x800 in size, no problem. When I create cufftPlan1d with a 2048 transform size then work on an image roughly 2000x1600 in size, the cufftExec fails (out of memory error).

Initially I thought I was dealing with a memory leak. However I checked and double checked and it appears that is not the cause. Any ideas what could be causing this instability?

specs:

Windows 7 with 40GB RAM
Visual Studio 2010
Cuda 4.0
Tesla GPU with 3GB RAM

If you use a 1D plan with a transform size of 2048, cufftExec reads exactly 2048 consecutive elements of data (not “roughly 2000”). So of course it will fail if there is not enough data, or if the rows are padded because the buffer was allocated with cudaMallocPitch.
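To make the size requirement concrete, here is a minimal host-side sketch of the check. The helper names `elementsRequired` and `bufferFitsPlan` are made up for illustration and are not part of cuFFT. It assumes the input is one contiguous buffer and that a batched 1D plan reads the transform size times the batch count in elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a cuFFT function): number of contiguous elements
// a 1D plan of size nx with the given batch count reads from its input.
std::size_t elementsRequired(std::size_t nx, std::size_t batch) {
    assert(nx > 0 && batch > 0);
    return nx * batch;
}

// Hypothetical helper: true if a buffer holding `allocated` elements is
// large enough for a plan of size nx with the given batch count.
bool bufferFitsPlan(std::size_t allocated, std::size_t nx, std::size_t batch) {
    return allocated >= elementsRequired(nx, batch);
}
```

For example, `bufferFitsPlan(2000, 2048, 1)` is false for a single 2000-element row, while the same row padded to 2048 elements passes the check.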

So why does a 1024 transform work with 1000x800 images? Luck? Could that also be the reason for the other instability I mentioned? So you’re saying that the transform size should be an exact divisor of the image size? From what I understand, the transform size of an FFT is supposed to be a power of 2, hence 1024. So I suppose I need to allocate more memory than the image itself requires, so that the FFT functions have enough data to work on. Correct?

Yes, if you allocated the 1000x800 image with cudaMallocPitch, then the pitch could correspond to 1024 elements per row, so there is enough data for a 1D cuFFT of size 1024. The transform size does not need to be an exact divisor of the image size, but it should be no larger than the image size so that there are enough data elements. And if you use a 2D cuFFT, keep in mind that it doesn’t know about pitched data and treats the buffer as raw contiguous data, so if the data pitch does not equal the cuFFT transform width, you may get an error about incorrectly aligned data.
Also, the transform size does not have to be a power of 2; a power of 2 just gives the best speed. So you can either set the transform size to the image size or, as you said, transform a padded image.
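If you go the padding route, the arithmetic can be sketched on the host like this. Both helpers (`nextPowerOfTwo`, `pitchMatchesWidth`) are hypothetical names for illustration, not cuFFT or CUDA runtime functions. The sketch assumes the pitch is given in bytes, as cudaMallocPitch returns it, and that you pass in the element size yourself (8 bytes for a single-precision complex value).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest power of two that is >= n (n must be > 0).
std::size_t nextPowerOfTwo(std::size_t n) {
    assert(n > 0);
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Hypothetical helper: true if rows are contiguous from the FFT's point of
// view, i.e. the pitch in bytes equals transform width * element size.
bool pitchMatchesWidth(std::size_t pitchBytes, std::size_t width,
                       std::size_t elemSize) {
    return pitchBytes == width * elemSize;
}
```

With these, a 1000-wide row pads to `nextPowerOfTwo(1000)` = 1024 and a 2000-wide row to 2048. `pitchMatchesWidth(8192, 1000, 8)` is false, which is the case where a 2D transform of width 1000 would misread a pitched buffer.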