cuFFT inverse real-to-complex?

Hi, I've got a polyphase filterbank (PFB) whose output is, let's say, 32 parallel channels of real-valued samples. I need an inverse FFT across these 32 channels (a 32-point IFFT) to get time-domain streams.

Under CUDA 7.0 and its bundled cuFFT library, the main options seem to be:

cufftExecC2C(fftplan, (cufftComplex*)d_pfbout_c32, (cufftComplex*)d_pfbout_c32_td, CUFFT_INVERSE);

cufftExecR2C(fftplan, (cufftReal*)d_pfbout_f32, (cufftComplex*)d_pfbout_c32_td);

but the latter, R2C, is for whatever reason “implicitly a forward transform”. How can I get an inverse? Is there an internal hack to swap out the underlying twiddle factors? I’m using cufftPlanMany(). I already tried letting the PFB write out “complex” data, i.e. pairs of (real, 0.0f), for a C2C IFFT, but in terms of memory bandwidth, writing real data (half the writes) followed by an R2C IFFT would probably be faster overall.


Oddly, having the PFB write into a pre-zeroed output array with stride 2 increases the PFB throughput in samples/second by some 20%. So perhaps this sparser output writing followed by a C2C IFFT is not that bad after all. Still curious, though, how to get an R2C IFFT…
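
One identity that might make a dedicated inverse R2C unnecessary: for real-valued input x, the unnormalized inverse DFT is the complex conjugate of the forward DFT, because sum x[j] exp(+2*pi*i*j*k/N) = conj(sum x[j] exp(-2*pi*i*j*k/N)). In principle, then, the forward R2C output (bins 0..N/2) could simply be conjugated, with the remaining bins following from Hermitian symmetry. Below is a minimal host-side C++ sketch that checks this with a naive DFT instead of cuFFT. The names `dft` and `inverse_from_forward_r2c` are made up for illustration, and none of this has been benchmarked against the C2C route.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cf = std::complex<float>;

// Naive O(N^2) DFT, unnormalized (cuFFT does not normalize either).
// sign = -1 gives the forward transform, sign = +1 the inverse.
std::vector<cf> dft(const std::vector<cf>& x, int sign) {
    const std::size_t n = x.size();
    assert(n > 0);
    const double two_pi = 6.283185307179586;
    std::vector<cf> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double ang = sign * two_pi * double((k * j) % n) / double(n);
            acc += std::complex<double>(x[j]) * std::polar(1.0, ang);
        }
        out[k] = cf(acc);
    }
    return out;
}

// Hypothetical helper: obtain the inverse DFT of REAL input using only a
// forward transform. Only bins 0..N/2 of the forward result are read, which
// mimics the N/2+1 complex bins that an R2C transform returns.
std::vector<cf> inverse_from_forward_r2c(const std::vector<float>& x) {
    const std::size_t n = x.size();
    const std::vector<cf> xc(x.begin(), x.end());   // (real, 0) pairs
    const std::vector<cf> fwd = dft(xc, -1);        // stand-in for R2C
    std::vector<cf> out(n);
    for (std::size_t k = 0; k <= n / 2; ++k)
        out[k] = std::conj(fwd[k]);                 // inverse = conj(forward)
    for (std::size_t k = n / 2 + 1; k < n; ++k)
        out[k] = std::conj(out[n - k]);             // Hermitian symmetry
    return out;
}
```

For N = 32 this would mean 17 complex bins per transform coming out of R2C, with bins 17..31 being mirrored conjugates of bins 15..1. Whether the conjugation is best folded into the consumer of the IFFT output or done in a separate small kernel is an open question here.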