The calculation result is not correct after I use unified memory to transfer data between the GPU and CPU

stevendmv123 · March 23, 2018, 6:58pm

I am using CUDA 9.1 on a Jetson TX2 board, whose GPU is SM=6.1. I use unified memory to manage the data: I run an FFT on the GPU and then operate on the same data on the CPU. Because the memory is unified, I don't need to memcpy the data from the CPU to the GPU or back; I run the FFT on the GPU in unified memory and then work on the results directly on the CPU. However, the calculation result has been incorrect since I switched to unified memory. Before I used unified memory, the same process gave the correct result.
I wonder whether concurrency in unified memory is causing the problem, since I use multiple threads on the CPU side, or whether the data in unified memory is getting corrupted. Please help me with this.

Here is my basic code:
cpu thread_n()
{
    checkCudaErrors(cudaMallocManaged((void **)&unified_mem, mem_size, cudaMemAttachHost));
    checkCudaErrors(cudaDeviceSynchronize());
    // Copy the input data into unified_mem on the host side.
    checkCudaErrors(cufftExecC2C(ifft_plan, (cufftComplex *)unified_mem, (cufftComplex *)unified_mem, CUFFT_INVERSE));
    checkCudaErrors(cudaDeviceSynchronize());
    // Then use the data behind the unified_mem pointer for further calculation on the host side.
}

I checked the output and found that the final calculation result is incorrect after the whole process.
If I don't use unified memory, the same sequence of steps gives the correct result.
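
One thing that may be worth checking in the snippet above is the `cudaMemAttachHost` flag. The CUDA runtime documentation says that memory allocated with this flag should not be accessed from a device that reports zero for `cudaDevAttrConcurrentManagedAccess` until it has been attached to a stream with `cudaStreamAttachMemAsync`. The following is only a minimal sketch of that pattern, not a diagnosis of the original program: the helper name `managed_fft_roundtrip_error`, the test signal, and the two error-checking macros are made up for illustration, and a single CPU thread with a single stream is assumed.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Illustrative error-checking macros (not from the original post).
#define CUDA_OK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error '%s' at line %d\n", cudaGetErrorString(e_), __LINE__); \
    exit(EXIT_FAILURE); } } while (0)
#define CUFFT_OK(call) do { if ((call) != CUFFT_SUCCESS) { \
    fprintf(stderr, "cuFFT error at line %d\n", __LINE__); exit(EXIT_FAILURE); } } while (0)

// Hypothetical helper: forward + inverse FFT in place on managed memory.
// Returns the largest absolute difference between the original signal and
// the round-trip result after dividing by n (cuFFT transforms are unnormalized).
float managed_fft_roundtrip_error(int n)
{
    cudaStream_t stream;
    CUDA_OK(cudaStreamCreate(&stream));

    cufftComplex *buf = nullptr;
    // Host-attached allocation: the CPU may use it immediately.
    CUDA_OK(cudaMallocManaged((void **)&buf, n * sizeof(cufftComplex), cudaMemAttachHost));

    // Fill the buffer on the host and keep a copy for the comparison.
    std::vector<cufftComplex> original(n);
    for (int i = 0; i < n; ++i) {
        original[i].x = sinf(0.1f * i);
        original[i].y = 0.0f;
        buf[i] = original[i];
    }

    // Attach the buffer to the stream before any GPU work touches it.
    CUDA_OK(cudaStreamAttachMemAsync(stream, buf, 0, cudaMemAttachSingle));
    CUDA_OK(cudaStreamSynchronize(stream));

    cufftHandle plan;
    CUFFT_OK(cufftPlan1d(&plan, n, CUFFT_C2C, 1));
    CUFFT_OK(cufftSetStream(plan, stream));  // run cuFFT work in the same stream
    CUFFT_OK(cufftExecC2C(plan, buf, buf, CUFFT_FORWARD));
    CUFFT_OK(cufftExecC2C(plan, buf, buf, CUFFT_INVERSE));

    // Wait for the stream to go idle before the CPU reads the buffer again.
    CUDA_OK(cudaStreamSynchronize(stream));

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        float dx = fabsf(buf[i].x / n - original[i].x);
        float dy = fabsf(buf[i].y / n - original[i].y);
        max_err = fmaxf(max_err, fmaxf(dx, dy));
    }

    CUFFT_OK(cufftDestroy(plan));
    CUDA_OK(cudaFree(buf));
    CUDA_OK(cudaStreamDestroy(stream));
    return max_err;
}

int main()
{
    // Report whether device 0 supports concurrent CPU/GPU managed access.
    int concurrent = 0;
    CUDA_OK(cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, 0));
    printf("concurrentManagedAccess = %d\n", concurrent);
    printf("max round-trip error = %g\n", managed_fft_roundtrip_error(1024));
    return 0;
}
```

This should build with something like `nvcc roundtrip.cu -lcufft` (the file name is arbitrary). In a multithreaded program, giving each CPU thread its own stream and attaching that thread's buffers to it is one way to keep the threads from touching managed memory that another thread's GPU work is still using.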

stevendmv123 · March 26, 2018, 7:00pm

One more question:
If I use a stream and associate it with both an FFT plan and an IFFT plan, should I create and destroy the stream on every loop iteration? I ask because, once the thread that runs the FFT and IFFT has started, only the results of the first loop iteration are correct. In all subsequent iterations the FFT and IFFT results are incorrect.
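
For reference, the usual pattern is to create the stream and the plans once, bind the plans to the stream with `cufftSetStream`, and reuse them in every iteration. Below is a rough sketch under assumptions: the helper name `count_bad_iterations`, the test signal, and the tolerance are hypothetical, and a single thread is assumed. Note that it refills the input on every iteration, because in-place transforms overwrite the buffer, and that it divides by `n`, because cuFFT's inverse transform is unnormalized; skipping either step makes later iterations look wrong even when the stream handling is fine.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: runs `loops` forward+inverse round trips with one
// stream and two plans created once. Returns the number of iterations whose
// round-trip error exceeds a tolerance, or -1 if setup fails.
int count_bad_iterations(int n, int loops)
{
    cudaStream_t stream;
    cufftHandle fft_plan, ifft_plan;
    cufftComplex *buf = nullptr;

    // One-time setup: stream, plans, and buffer live across all iterations.
    if (cudaStreamCreate(&stream) != cudaSuccess) return -1;
    if (cufftPlan1d(&fft_plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) return -1;
    if (cufftPlan1d(&ifft_plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) return -1;
    if (cufftSetStream(fft_plan, stream) != CUFFT_SUCCESS) return -1;
    if (cufftSetStream(ifft_plan, stream) != CUFFT_SUCCESS) return -1;
    if (cudaMallocManaged((void **)&buf, n * sizeof(cufftComplex),
                          cudaMemAttachHost) != cudaSuccess) return -1;
    // Attach the managed buffer to the stream so the CPU may use it
    // whenever this stream is idle.
    if (cudaStreamAttachMemAsync(stream, buf, 0, cudaMemAttachSingle) != cudaSuccess) return -1;
    if (cudaStreamSynchronize(stream) != cudaSuccess) return -1;

    int bad = 0;
    for (int it = 0; it < loops; ++it) {
        // Refill the input: the previous in-place transforms overwrote it.
        for (int i = 0; i < n; ++i) {
            buf[i].x = cosf(0.05f * i + it);
            buf[i].y = 0.0f;
        }

        bool ok = cufftExecC2C(fft_plan, buf, buf, CUFFT_FORWARD) == CUFFT_SUCCESS
               && cufftExecC2C(ifft_plan, buf, buf, CUFFT_INVERSE) == CUFFT_SUCCESS
               // The stream must be idle before the CPU reads the results.
               && cudaStreamSynchronize(stream) == cudaSuccess;

        float max_err = 0.0f;
        for (int i = 0; ok && i < n; ++i) {
            float expected = cosf(0.05f * i + it);
            // Divide by n: forward followed by inverse scales the data by n.
            float dx = fabsf(buf[i].x / n - expected);
            float dy = fabsf(buf[i].y / n);
            max_err = fmaxf(max_err, fmaxf(dx, dy));
        }
        if (!ok || max_err > 1e-3f) ++bad;
    }

    // One-time teardown after the loop.
    cufftDestroy(fft_plan);
    cufftDestroy(ifft_plan);
    cudaFree(buf);
    cudaStreamDestroy(stream);
    return bad;
}

int main()
{
    printf("iterations with bad results: %d\n", count_bad_iterations(512, 5));
    return 0;
}
```

If a loop written this way still gives wrong results after the first iteration, comparing it against the failing program may help narrow down which step differs.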

Robert_Crovella · March 27, 2018, 3:34am

I wouldn’t.

Sounds like a bug in your code. It’s difficult to diagnose based on a 1-sentence description. Usually for people who are asking for help with code that is not working, I recommend posting a complete example. If I can’t copy, paste, compile, and run, and see the issue, I usually don’t even bother with it. If you want other people to help you, I suggest you make it as easy as possible for them to do so.
