Hello,
I'm hoping someone can point me in the right direction here. I have three code samples: one using FFTW3 and two using cuFFT. My FFTW example uses the real-to-complex functions to perform the FFT. My cuFFT equivalent of that does not work, but if I manually fill a complex array, the complex-to-complex version works. Here are the two cuFFT samples:
float *ptr holds my test case, a 2D image of size w x h stored row by row. I apply the FFT along the width, so there is one 1D transform per row. filter is a heap-allocated array of w coefficients.
[codebox]
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// ptr (w*h floats), w, h and filter (w coefficients) are set up earlier.
// One batch of h complex-to-complex transforms of length w, one per row.
cufftHandle plan;
cufftPlanMany (&plan, 1, &w, NULL, 1, 0, NULL, 1, 0, CUFFT_C2C, h);

cufftComplex *devin;
cudaMalloc ((void**)&devin, sizeof (cufftComplex)*w*h);
cufftComplex *devout;
cudaMalloc ((void**)&devout, sizeof (cufftComplex)*w*h);

// Copy the real image into the real parts of a complex host buffer.
cufftComplex *hostd = new cufftComplex[w*h];
for (int i = 0; i < h; i++)
{
    for (int j = 0; j < w; j++)
    {
        hostd[i*w + j].x = ptr[i*w + j];
        hostd[i*w + j].y = 0.f;
    }
}
cudaMemcpy (devin, hostd, sizeof (cufftComplex)*w*h, cudaMemcpyHostToDevice);

printf ("-= Performing CUDA FFT forward =-\n");
cufftExecC2C (plan, devin, devout, CUFFT_FORWARD);
cudaMemcpy (hostd, devout, sizeof (cufftComplex)*w*h, cudaMemcpyDeviceToHost);

// Scale frequency bin j of every row by filter[j].
for (int i = 0; i < h; i++)
{
    for (int j = 0; j < w; j++)
    {
        hostd[i*w + j].x *= filter[j];
        hostd[i*w + j].y *= filter[j];
    }
}
delete[] filter;
cudaMemcpy (devin, hostd, sizeof (cufftComplex)*w*h, cudaMemcpyHostToDevice);

printf ("-= Performing CUDA FFT inverse =-\n");
cufftExecC2C (plan, devin, devout, CUFFT_INVERSE);
cudaMemcpy (hostd, devout, sizeof (cufftComplex)*w*h, cudaMemcpyDeviceToHost);

// Keep the real part and divide by the transform length w.
for (int i = 0; i < h; i++)
{
    for (int j = 0; j < w; j++)
    {
        ptr[i*w + j] = hostd[i*w + j].x/w;
    }
}

delete[] hostd;
cufftDestroy (plan);
cudaFree (devout);
cudaFree (devin);
[/codebox]
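For reference, the division by w in the last loop is what brings the C2C round trip back to the original scale: cuFFT (like FFTW) computes unnormalized transforms, so a forward transform followed by an inverse returns every sample multiplied by the transform length. The CPU-only sketch below illustrates that on a single row. It uses a naive O(w^2) DFT as a stand-in for cuFFT, and the names naive_dft and round_trip_unnormalized are hypothetical helpers, not part of any library.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(w^2) complex DFT of one row, standing in for an unnormalized FFT
// such as cufftExecC2C. sign = -1 for forward, +1 for inverse; neither
// direction divides by w.
std::vector<std::complex<float>> naive_dft(const std::vector<std::complex<float>>& in, int sign)
{
    const int w = static_cast<int>(in.size());
    assert(w > 0);
    const double pi = std::acos(-1.0);
    std::vector<std::complex<float>> out(w);
    for (int k = 0; k < w; k++)
    {
        std::complex<double> acc(0.0, 0.0);
        for (int n = 0; n < w; n++)
            acc += std::complex<double>(in[n]) * std::polar(1.0, sign * 2.0 * pi * k * n / w);
        out[k] = std::complex<float>(acc);
    }
    return out;
}

// Forward transform followed by inverse transform with no scaling.
// Each returned sample equals w times the corresponding input sample.
std::vector<float> round_trip_unnormalized(const std::vector<float>& row)
{
    std::vector<std::complex<float>> c(row.begin(), row.end());
    std::vector<std::complex<float>> back = naive_dft(naive_dft(c, -1), +1);
    std::vector<float> result(row.size());
    for (std::size_t j = 0; j < row.size(); j++)
        result[j] = back[j].real();
    return result;
}
```

For a row of 4 samples, each value comes back multiplied by 4; dividing by w, as the final loop of the C2C sample does, recovers the input.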
The second sample uses the same input values, except that I create two plans: one R2C for the forward transform and one C2R for the inverse. This produces incorrect results.
[codebox]
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Forward real-to-complex and inverse complex-to-real plans, h rows of length w.
cufftHandle plan1, plan2;
cufftPlanMany (&plan1, 1, &w, NULL, 1, 0, NULL, 1, 0, CUFFT_R2C, h);
cufftPlanMany (&plan2, 1, &w, NULL, 1, 0, NULL, 1, 0, CUFFT_C2R, h);

float *devin;
cudaMalloc ((void**)&devin, sizeof (float)*w*h);
cufftComplex *devout;
cudaMalloc ((void**)&devout, sizeof (cufftComplex)*w*h);

// The real image is copied to the device directly; no complex staging buffer.
cudaMemcpy (devin, ptr, sizeof (float)*w*h, cudaMemcpyHostToDevice);

printf ("-= Performing CUDA FFT forward =-\n");
cufftExecR2C (plan1, devin, devout);
cufftComplex *hostd = new cufftComplex[w*h];
cudaMemcpy (hostd, devout, sizeof (cufftComplex)*w*h, cudaMemcpyDeviceToHost);

// Same filter loop as the C2C version (rows indexed with stride w).
for (int i = 0; i < h; i++)
{
    for (int j = 0; j < w; j++)
    {
        hostd[i*w + j].x *= filter[j];
        hostd[i*w + j].y *= filter[j];
    }
}
delete[] filter;
cudaMemcpy (devout, hostd, sizeof (cufftComplex)*w*h, cudaMemcpyHostToDevice);

printf ("-= Performing CUDA FFT inverse =-\n");
cufftExecC2R (plan2, devout, devin);
cudaMemcpy (ptr, devin, sizeof (float)*w*h, cudaMemcpyDeviceToHost);

delete[] hostd;
cufftDestroy (plan2);
cufftDestroy (plan1);
cudaFree (devout);
cudaFree (devin);
[/codebox]
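One detail relevant to the R2C/C2R sample: for a row of w real samples, an R2C transform (in cuFFT as in FFTW's r2c) stores only the w/2 + 1 non-redundant complex bins, and with NULL embed arguments cufftPlanMany uses the basic layout, so as far as I understand the cuFFT documentation the output rows are packed back to back and row i starts at i*(w/2 + 1), not i*w. C2R expects its input in that same packed form and, like C2C, returns unnormalized values. The CPU-only sketch below uses naive DFTs as stand-ins for cuFFT to show that layout and where a per-column filter lands in it. naive_r2c, naive_c2r and apply_filter_packed are hypothetical names, and the sketch assumes the filter is symmetric in frequency, so its first w/2 + 1 entries describe the whole spectrum.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Naive real-to-complex DFT of one row of w samples. Like an R2C transform,
// it returns only the w/2 + 1 non-redundant bins; the others are conjugates.
std::vector<std::complex<double>> naive_r2c(const std::vector<double>& row)
{
    const int w = static_cast<int>(row.size());
    assert(w > 0);
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> half(w / 2 + 1);
    for (int k = 0; k <= w / 2; k++)
        for (int n = 0; n < w; n++)
            half[k] += row[n] * std::polar(1.0, -2.0 * pi * k * n / w);
    return half;
}

// Naive complex-to-real inverse of length w from w/2 + 1 bins. Unnormalized:
// a forward/inverse round trip returns w times the input.
std::vector<double> naive_c2r(const std::vector<std::complex<double>>& half, int w)
{
    assert(w > 0 && static_cast<int>(half.size()) == w / 2 + 1);
    const double pi = std::acos(-1.0);
    std::vector<double> row(w, 0.0);
    for (int n = 0; n < w; n++)
    {
        std::complex<double> acc(0.0, 0.0);
        for (int k = 0; k < w; k++)
        {
            // Bins above w/2 are rebuilt from Hermitian symmetry: X[k] = conj(X[w-k]).
            std::complex<double> X = (k <= w / 2) ? half[k] : std::conj(half[w - k]);
            acc += X * std::polar(1.0, 2.0 * pi * k * n / w);
        }
        row[n] = acc.real();
    }
    return row;
}

// Scales an h-row R2C result by a per-column filter. Each row occupies
// w/2 + 1 consecutive complex values, so row i starts at i * (w/2 + 1).
void apply_filter_packed(std::vector<std::complex<double>>& spec, int w, int h,
                         const std::vector<double>& filter)
{
    const int wc = w / 2 + 1;
    assert(static_cast<int>(spec.size()) >= wc * h);
    assert(static_cast<int>(filter.size()) >= wc);
    for (int i = 0; i < h; i++)
        for (int j = 0; j < wc; j++)
            spec[i * wc + j] *= filter[j];
}
```

With w = 6, each row contributes only 4 complex values, so a loop that walks w values per row with stride w reads bins from the wrong rows and runs past the meaningful data.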
I can provide the FFTW equivalent if it's relevant. The first version (C2C) produces the same-looking result, but the values are normalized, which I think comes from dividing by the width when copying back to ptr. The FFTW version does not perform this normalization. The second cuFFT version (R2C and C2R) does not work: as far as I can tell, it returns the image unchanged, even though the filter should change how the image looks dramatically. Thanks for any assistance!
-brad
-edit: Corrected the memcpy after the filter is applied so that it copies from host to device.