FFT of two real signals using one C2C transform in cuFFTDx

I have tried to compute the FFT of two real-valued signals using a single C2C transform in cuFFTDx, and found that it is not straightforward. Has anybody written a program that does this with a cuFFTDx C2C transform? If so, would you share your implementation or idea? Many thanks in advance.

Related to the above question, one idea is to use the information left in shared memory after the FFT call, as shown below:

FFT().execute(thread_data, shared_mem)

However, the cuFFTDx documentation does not describe what is stored in shared memory. So here is my question: what kind of information is stored in shared memory after the FFT function call?

Shared memory is used by cuFFTDx as a scratchpad to exchange data between threads. Before and after the FFT function call, you are free to use it as you see fit.

We will improve R2C/C2R in a future release to make it more efficient (no ETA).

Best regards,
Lukasz Ligowski
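
For readers looking for the idea asked about in the question: the usual way to get two real FFTs out of one complex transform is to pack the signals as z[n] = x[n] + i*y[n], run one C2C transform to get Z, and then separate the spectra with X[k] = (Z[k] + conj(Z[(N-k) mod N])) / 2 and Y[k] = -i * (Z[k] - conj(Z[(N-k) mod N])) / 2. The following is only a minimal host-side sketch of that math in plain C++, not cuFFTDx code. The names `naive_dft` and `fft_two_real` are hypothetical, and the naive O(N^2) DFT merely stands in for the single C2C call. In an actual cuFFTDx kernel, Z[k] and Z[N-k] will generally be held by different threads, so the unpacking step would presumably need its own exchange (for example through shared memory) after the FFT call; that part is not shown here.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) forward DFT. Hypothetical stand-in for the one C2C transform.
std::vector<cplx> naive_dft(const std::vector<cplx>& in) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce k*j modulo n to keep the twiddle angle small.
            const double angle = -2.0 * pi * static_cast<double>((k * j) % n)
                                 / static_cast<double>(n);
            sum += in[j] * cplx(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Computes the spectra of two real signals with a single complex transform.
// Assumes x and y have the same length.
std::pair<std::vector<cplx>, std::vector<cplx>>
fft_two_real(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();

    // Pack: x goes into the real part, y into the imaginary part.
    std::vector<cplx> z(n);
    for (std::size_t j = 0; j < n; ++j) {
        z[j] = cplx(x[j], y[j]);
    }

    // One complex-to-complex transform of the packed signal.
    const std::vector<cplx> Z = naive_dft(z);

    // Unpack using the conjugate symmetry of real-input spectra.
    std::vector<cplx> X(n), Y(n);
    for (std::size_t k = 0; k < n; ++k) {
        const cplx zc = std::conj(Z[(n - k) % n]);
        X[k] = 0.5 * (Z[k] + zc);
        Y[k] = cplx(0.0, -0.5) * (Z[k] - zc);
    }
    return {X, Y};
}
```

Calling `fft_two_real(x, y)` returns the pair (X, Y), which should match the results of transforming x and y separately up to rounding error. Since both spectra are conjugate-symmetric, only the bins k = 0 .. N/2 need to be kept in practice.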