The calculation result is incorrect after I use unified memory to transfer data between the GPU and CPU.

I am using CUDA 9.1 on a Jetson TX2 board, whose GPU has compute capability 6.2 (sm_62). I use unified memory to manage the data: I run an FFT on the GPU and then operate on the same data on the CPU. Because the data is in unified memory, I don't need to cudaMemcpy it from the CPU to the GPU or back. I run the FFT directly on the unified buffer and then work with the result on the CPU. However, since I switched to unified memory, the calculation result is incorrect. Before that, the same processing steps gave the correct result.
I wonder whether concurrent access to unified memory causes the problem, since I use multiple threads on the CPU side, or whether the data in unified memory is getting corrupted. Please help me with it.

Here is my basic code:
void thread_n()   // runs in each CPU thread
{
checkCudaErrors(cudaMallocManaged((void **)&unified_mem, mem_size, cudaMemAttachHost));
checkCudaErrors(cudaDeviceSynchronize());
// Copy the input data into unified_mem on the host side.
checkCudaErrors(cufftExecC2C(ifft_plan, (cufftComplex *)unified_mem, (cufftComplex *)unified_mem, CUFFT_INVERSE));
checkCudaErrors(cudaDeviceSynchronize());
// Then use the data behind the unified_mem pointer for the calculation on the host side.
}

I checked the result and found that the final calculation result is incorrect after the whole process.
If I don’t use unified memory, the same processing steps give the correct result.

One more question:
If I use a stream and attach it to an FFT plan and an IFFT plan, should I create and destroy the stream in every loop iteration? I ask because, once a thread (in which the FFT and IFFT run) is created and starts running, only the FFT and IFFT results of its first loop iteration are correct. In all subsequent iterations, the FFT and IFFT results are incorrect.

I wouldn’t.

Sounds like a bug in your code. It’s difficult to diagnose based on a one-sentence description. When people ask for help with code that is not working, I usually recommend posting a complete example. If I can’t copy, paste, compile, run it, and see the issue, I usually don’t even bother with it. If you want other people to help you, I suggest you make it as easy as possible for them to do so.
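A reproducer of the kind asked for here is easier to judge when it checks its own output. One option, not something proposed in the thread, is to compare the GPU result against a naive host-side inverse DFT. cuFFT does not normalize: CUFFT_INVERSE computes x[j] = sum over k of X[k] * exp(+2*pi*i*j*k/n) without a 1/n factor, so the reference below omits that factor too. The function names are made up for illustration, and the O(n^2) loop is only meant for small test sizes.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(n^2) unnormalized inverse DFT, matching cuFFT's CUFFT_INVERSE convention:
// x[j] = sum_k X[k] * exp(+2*pi*i*j*k/n), with no 1/n factor.
std::vector<std::complex<double>> reference_inverse_dft(const std::vector<std::complex<double>> &X)
{
    const std::size_t n = X.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double>> x(n);
    for (std::size_t j = 0; j < n; ++j) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t k = 0; k < n; ++k) {
            // Reduce j*k modulo n first to keep the angle small and accurate.
            const double angle = two_pi * static_cast<double>((j * k) % n) / static_cast<double>(n);
            acc += X[k] * std::polar(1.0, angle);
        }
        x[j] = acc;
    }
    return x;
}

// Largest absolute difference between a single-precision GPU result and the reference.
double max_abs_error(const std::vector<std::complex<float>> &gpu,
                     const std::vector<std::complex<double>> &ref)
{
    double worst = 0.0;
    for (std::size_t i = 0; i < gpu.size() && i < ref.size(); ++i) {
        const std::complex<double> g(gpu[i].real(), gpu[i].imag());
        worst = std::max(worst, std::abs(g - ref[i]));
    }
    return worst;
}
```

Copying the cuFFT output into a std::vector<std::complex<float>> and calling max_abs_error(gpu, reference_inverse_dft(input)) produces a single number. Comparing that number between the unified-memory version and the cudaMemcpy version shows whether they actually diverge.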