Change cufftXtSubFormat on cudaLibXtDesc after processing cufftXtExecDescriptorC2C

Good day!

I am trying to compute a forward Fourier transform several times on two GPUs, and I get a runtime error.
This is what I do:

Initialization block:

cufftResult result_;
// create plan
int nGPUs_ = 2;
size_t *worksize = (size_t *)malloc(sizeof(size_t) * nGPUs_);  // one work-area size per GPU
result_ = cufftMakePlan1d(plan_input_, n_size_, CUFFT_C2C, 1, worksize);
if (result_ != CUFFT_SUCCESS) { printf("MakePlan failed\n"); exit(EXIT_FAILURE); }
// cufftXtMalloc() - Malloc data on multiple GPUs
result_ = cufftXtMalloc(plan_input_, (cudaLibXtDesc **)&device_channel_1_signal_, CUFFT_XT_FORMAT_INPLACE);
if (result_ != CUFFT_SUCCESS) { printf("*XtMalloc failed\n"); exit(EXIT_FAILURE); }
// cufftXtMemcpy() - Copy data from host to multiple GPUs
result_ = cufftXtMemcpy(plan_input_, device_channel_1_signal_, host_channel_1_signal, CUFFT_COPY_HOST_TO_DEVICE);
if (result_ != CUFFT_SUCCESS) { printf("*XtMemcpy failed\n"); exit(EXIT_FAILURE); }
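// cufftXtMalloc() - Malloc a second descriptor for the tuned signal
result_ = cufftXtMalloc(plan_input_, (cudaLibXtDesc **)&device_channel_1_tuned_signal_, CUFFT_XT_FORMAT_INPLACE);
if (result_ != CUFFT_SUCCESS) { printf("*XtMalloc failed\n"); exit(EXIT_FAILURE); }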

A function that is called repeatedly:

// TuneSignal() - Writes the tuned signal into the descriptor "device_channel_1_tuned_signal_"
TuneSignal(device_channel_1_signal_, device_channel_1_tuned_signal_, nGPUs_, rad_fre, rad_shift);
// cufftXtExecDescriptorC2C() - Execute FFT on data on multiple GPUs
result_ = cufftXtExecDescriptorC2C(plan_input_, device_channel_1_temp_result_fft_, device_channel_1_tuned_signal_, CUFFT_FORWARD);
if (result_ != CUFFT_SUCCESS) { printf("*XtExecC2C failed\n"); exit(EXIT_FAILURE); }

On the first call, TuneSignal() returns the correct result. After cufftXtExecDescriptorC2C() has run, TuneSignal() no longer returns the correct result, because the format of the descriptor device_channel_1_tuned_signal_ has changed from CUFFT_XT_FORMAT_INPLACE to CUFFT_XT_FORMAT_INPLACE_SHUFFLED. The second call to cufftXtExecDescriptorC2C() then also fails with the error "CUFFT_INVALID_TYPE", because the format has changed.
Please tell me how to organize my code so that I can compute the Fourier transform repeatedly while creating the plan and allocating memory only once.