I am using CUDA 9.1 on a Jetson TX2 board, whose GPU is SM=6.1. I use unified memory to manage my data: I run an FFT on the GPU and then operate on the same data again on the CPU. Because I use unified memory, I don't need to memcpy the data from CPU to GPU or from GPU to CPU; I just run the FFT on the data in unified memory on the GPU and then process it directly on the CPU. But I found that the calculation result is not correct after I switched to unified memory. Before I used unified memory, the result was correct with the same processing steps.
I wonder whether concurrent access to unified memory is causing the problem, because I am using multiple threads on the CPU side, or whether the data in unified memory is getting corrupted. Please help me with this.
Here is my basic code:
checkCudaErrors(cudaMallocManaged((void **)&unified_mem, mem_size, cudaMemAttachHost));
Then I copy the input data into unified_mem on the host side.
checkCudaErrors(cufftExecC2C(ifft_plan, (cufftComplex *)unified_mem, (cufftComplex *)unified_mem, CUFFT_INVERSE));
Then I use the data that unified_mem points to for further calculation on the host side.
When I check the result at the end of the whole process, the final calculation result is not correct.
If I don't use unified memory, the result of the same processing steps is correct.
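For reference, here is a minimal sketch of the same flow with two things made explicit that may be relevant here. First, cufftExecC2C only queues the transform, so the host should wait (for example with cudaDeviceSynchronize) before reading the result. Second, as far as I understand the cudaMallocManaged documentation, memory allocated with cudaMemAttachHost should not be accessed from a device that lacks concurrent managed access unless it is first attached to a stream with cudaStreamAttachMemAsync, so the sketch uses cudaMemAttachGlobal instead. The size n, the impulse test input, and the CHECK_CUDA / CHECK_CUFFT macros are my own assumptions for illustration and are not part of the original program.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical error-checking helpers (stand-ins for checkCudaErrors).
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(e_), __FILE__, __LINE__);      \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

#define CHECK_CUFFT(call)                                             \
    do {                                                              \
        cufftResult r_ = (call);                                      \
        if (r_ != CUFFT_SUCCESS) {                                    \
            fprintf(stderr, "cuFFT error %d at %s:%d\n",              \
                    (int)r_, __FILE__, __LINE__);                     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const int n = 1024;  // assumed transform length
    const size_t mem_size = n * sizeof(cufftComplex);

    // Managed allocation visible to both host and device.
    cufftComplex *unified_mem = nullptr;
    CHECK_CUDA(cudaMallocManaged((void **)&unified_mem, mem_size,
                                 cudaMemAttachGlobal));

    // Host-side fill: a unit impulse at index 0. Its unnormalized
    // inverse DFT is 1 + 0i at every output index.
    for (int i = 0; i < n; ++i) {
        unified_mem[i].x = (i == 0) ? 1.0f : 0.0f;
        unified_mem[i].y = 0.0f;
    }

    cufftHandle ifft_plan;
    CHECK_CUFFT(cufftPlan1d(&ifft_plan, n, CUFFT_C2C, 1));

    // In-place inverse transform; this call returns before the GPU is done.
    CHECK_CUFFT(cufftExecC2C(ifft_plan, unified_mem, unified_mem,
                             CUFFT_INVERSE));

    // Wait for the GPU before any host thread touches unified_mem again.
    CHECK_CUDA(cudaDeviceSynchronize());

    // Host-side check against the expected constant output.
    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        float err = fabsf(unified_mem[i].x - 1.0f) + fabsf(unified_mem[i].y);
        if (err > max_err) max_err = err;
    }
    printf("max deviation from expected output: %g\n", max_err);

    CHECK_CUFFT(cufftDestroy(ifft_plan));
    CHECK_CUDA(cudaFree(unified_mem));
    return 0;
}
```

In a multithreaded program the same rule would apply to every CPU thread: on a device without concurrent managed access, no thread should read or write the managed buffer between the cufftExecC2C call and the synchronization point. Whether this is the actual cause of the wrong results above is only a guess.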