So I’m trying to write a program, part of which involves calculating 16K 128-point FFTs on a bunch of data. Here’s how I’m creating my plan:
// Setup FFT plan
cufftResult status = cufftPlan1d(&output_fft, num_channels, CUFFT_R2C, PTS_PER_CHAN);
if (status != CUFFT_SUCCESS)
    printf("Error creating forward FFT plan!\n");
Here num_channels = 128 and PTS_PER_CHAN = 16K, so the plan is for 128-point transforms with a batch of 16K.
This executes fine, without any errors. Now, here’s the function where I’m using it:
// ----------------------------------------------------
// Processes an input buffer of data
void process_input(char* inp_buffer, Complex* out_buffer) {
    // Copy input buffer to device
    cudaMemcpy(dev_inp_buffer, inp_buffer, sizeof(char)*io_buff_size, cudaMemcpyHostToDevice);
    // Run filter on input buffer
    run_filter<<<1, num_legs>>>(dev_inp_buffer, dev_filt_buffer, dev_sig_buffer, dev_out_sf, taps_per_leg, num_legs);
    // Calculate FFT on the output
    cufftResult status = cufftExecR2C(output_fft, (cufftReal*)dev_out_sf, (cufftComplex*)dev_out_cf);
    // Copy output back to host
    cudaMemcpy(out_buffer, dev_out_cf, sizeof(Complex)*io_buff_size, cudaMemcpyDeviceToHost);
}
inp_buffer and out_buffer are both host arrays created using cudaMallocHost; every other buffer prefixed with dev was created with cudaMalloc. But this execution of the FFT fails with CUFFT_EXEC_FAILED, and I’m at a loss to explain why. I’ve got other stuff using FFTs that seems to run fine. Thoughts?
Edit:
I should note that I’m on RHEL 4 with no X server running, and I’m using V2.0 of the toolkit.