Async Memcpy calls blocking main thread

boogagiga · November 17, 2011, 10:06pm

i7-975
12 GB RAM
1.5 GB GTX 580
Windows 7 64-bit
using VS 2010, CUDA 4.0

Ok, so I’m having an issue I couldn’t find in this forum: it seems that my cudaMemcpyAsync calls are not actually asynchronous.

For example:

cpu_array[0] = -1;

// copies into cpu_array; the copied values are always positive
cudaMemcpyAsync(cpu_array, gpu_array, n * sizeof(float), cudaMemcpyDeviceToHost, stream);

if (cpu_array[0] != -1) {
    // it always ends up here, with cpu_array[0] equal to the value that was supposed to be copied
}

If the call were asynchronous, the conditional should evaluate to false. Since I set cpu_array[0] to -1 right before the memcpy call, it should be cached, so it seems highly unlikely that the conditional would be evaluated only after the GPU had copied anything to RAM. At least that’s what I think; correct me if I’m wrong.
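One thing worth checking, which the post does not state, is how cpu_array was allocated. The CUDA runtime documentation notes that a cudaMemcpyAsync from device memory to pageable host memory returns only once the copy has completed; the call can overlap with host code only when the host buffer is page-locked (pinned). Below is a minimal sketch, assuming an element count n that is not from the post and omitting error checks for brevity. It allocates the host buffer with cudaMallocHost and asks the stream whether the copy is still pending instead of peeking at the buffer:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n = 1 << 24;  // assumed element count, not taken from the post
    float *cpu_array = NULL;
    float *gpu_array = NULL;
    cudaStream_t stream;

    // Page-locked (pinned) host memory, so the copy can run asynchronously.
    cudaMallocHost((void **)&cpu_array, n * sizeof(float));
    cudaMalloc((void **)&gpu_array, n * sizeof(float));
    cudaMemset(gpu_array, 0, n * sizeof(float));
    cudaStreamCreate(&stream);

    cudaMemcpyAsync(cpu_array, gpu_array, n * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);

    // cudaStreamQuery returns cudaErrorNotReady while work in the stream is pending.
    cudaError_t status = cudaStreamQuery(stream);
    printf("right after the call: %s\n",
           status == cudaErrorNotReady ? "copy still in flight"
                                       : cudaGetErrorString(status));

    // cpu_array is only safe to read once the stream has been synchronized.
    cudaStreamSynchronize(stream);
    printf("after synchronize: cpu_array[0] = %f\n", cpu_array[0]);

    cudaStreamDestroy(stream);
    cudaFreeHost(cpu_array);
    cudaFree(gpu_array);
    return 0;
}
```

Reading cpu_array[0] before synchronizing is a race in either case, so the stream query is a more reliable probe than the value of the buffer.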

Also, I have another issue (which I resolved by moving to floats, which is an acceptable change for what I’m working on) that I started a post on here:

It’s starting to seem like it’s compiling with a low compute capability, but I’ve made sure in the settings that I set compute_20 and sm_20.

Is there any way to print out the CC at runtime? What could possibly be the problem here?
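For the runtime question, a minimal sketch using cudaGetDeviceProperties for the hardware's compute capability and the __CUDA_ARCH__ macro for the architecture the running device code was built for. The kernel name report_arch is hypothetical, and error checks are mostly omitted:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stores the architecture this device code was compiled for.
// __CUDA_ARCH__ is only defined during device compilation (200 for compute_20).
__global__ void report_arch(int *out) {
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;
#endif
}

int main() {
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count == 0) {
        printf("no CUDA device found\n");
        return 1;
    }

    // Compute capability of each installed device (what the hardware supports).
    for (int dev = 0; dev < device_count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }

    // Architecture the kernel that actually ran was compiled for.
    int *d_arch = NULL;
    int h_arch = 0;
    cudaMalloc((void **)&d_arch, sizeof(int));
    cudaMemset(d_arch, 0, sizeof(int));
    report_arch<<<1, 1>>>(d_arch);
    cudaMemcpy(&h_arch, d_arch, sizeof(int), cudaMemcpyDeviceToHost);
    printf("kernel was built for __CUDA_ARCH__ = %d\n", h_arch);

    cudaFree(d_arch);
    return 0;
}
```

If the second number is lower than the first suggests it should be, the build settings are probably not reaching nvcc for that source file.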

Thanks in advance

tmurray · November 19, 2011, 12:05am

The PCIe transfer will be snooped and will invalidate your cache line as soon as that copy appears.

boogagiga · November 19, 2011, 3:14am

Thanks for the reply.

I’m not sure I understand what you’re saying. It seems like you’re saying that the CPU will go to main memory instead of the cache because main memory has been changed? But I’m not sure if that’s what you mean. And what exactly does “snooped” mean? Sorry for being a newb lol.

Anyway, assuming that that is what you meant by your statement, how can the main memory be accessed and changed so quickly? I figured that by the time the GPU started even thinking about transferring to main memory, the CPU would have already completed the conditional statement.

Either way, I just want to make sure that my memcpy calls are asynchronous. Are you saying that they are? I’ve even compared the time it takes to issue kernel calls (which are always asynchronous) with my cudaMemcpyAsync calls, and a single cudaMemcpyAsync call took 4-5x as long as calling four kernels.

boogagiga · November 19, 2011, 3:26am

Ok, well, I just tried increasing the amount of memory being transferred in my calls and compared again the time it took to call the 4 kernels vs. the one cudaMemcpyAsync call, and the difference was the same, even when I increased the transfer to 20 times the original size… So I guess it is asynchronous? I’m confused…
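The experiment described above can be made more explicit by timing two things separately: how long the cudaMemcpyAsync call takes to return on the host, and how long it takes until the stream has actually finished. This is a rough sketch under a few assumptions: the element counts and the helper ms_since are hypothetical, the host buffer is pinned, error checks are omitted, and a compiler with C++11 <chrono> is available (newer than the VS 2010 toolchain mentioned in the first post):

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

typedef std::chrono::steady_clock host_clock;

// Hypothetical helper: host-side milliseconds elapsed since t0.
static double ms_since(host_clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(host_clock::now() - t0).count();
}

int main() {
    // Assumed element counts, 20x apart as in the experiment described above.
    const size_t sizes[] = {(size_t)1 << 20, (size_t)20 << 20};

    cudaStream_t stream;
    cudaStreamCreate(&stream);  // also forces context creation before timing

    for (int i = 0; i < 2; ++i) {
        const size_t bytes = sizes[i] * sizeof(float);
        float *cpu_array = NULL;
        float *gpu_array = NULL;
        cudaMallocHost((void **)&cpu_array, bytes);  // pinned host buffer
        cudaMalloc((void **)&gpu_array, bytes);
        cudaMemset(gpu_array, 0, bytes);
        cudaDeviceSynchronize();  // make sure setup work is finished

        host_clock::time_point t0 = host_clock::now();
        cudaMemcpyAsync(cpu_array, gpu_array, bytes,
                        cudaMemcpyDeviceToHost, stream);
        const double call_ms = ms_since(t0);   // time until the call returned
        cudaStreamSynchronize(stream);
        const double total_ms = ms_since(t0);  // time until the copy finished

        printf("%zu bytes: call returned after %.3f ms, copy done after %.3f ms\n",
               bytes, call_ms, total_ms);

        cudaFreeHost(cpu_array);
        cudaFree(gpu_array);
    }

    cudaStreamDestroy(stream);
    return 0;
}
```

If the first number stays roughly flat while the second grows with the transfer size, the call is returning before the copy completes, which is consistent with the observation that a 20x larger transfer did not change the call time.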
