Execution mode question: asynchronous or synchronous

  1. Most lectures say: cudaMemcpy() is synchronous. This is what I understand.

  2. But when I read the “CUDA C Programming Guide Version 3.2”, I got really confused by §3.2.7.1:

Asynchronous Concurrent Execution:
“Host <-> device memory copies of a memory block of 64 KB or less”.

Do these two conflict? Thanks.

2) is an exception to 1). Nvidia documented it only recently, so most lectures predate it.

Actually, I think only host->device copies of 64 KB or less are asynchronous. An asynchronous device->host copy would violate the API and break a lot of programs, which would then need extra synchronization.
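A rough sketch of how one could check this from the host side. The function name small_h2d_copy_is_safe and the fill pattern are made up for illustration; only the cudaMalloc/cudaMemcpy/cudaFree calls are real runtime API. It copies a small pageable buffer to the device, overwrites the host buffer right away with no explicit synchronization, then reads the device data back. As far as I know, cudaMemcpy() only returns once the pageable source has been staged, so the overwrite should not affect what the device receives.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns 1 if the device received the original host
// data, 0 on a mismatch, and -1 on an allocation or CUDA error.
int small_h2d_copy_is_safe(size_t n_bytes)
{
    unsigned char *h_src  = (unsigned char *)malloc(n_bytes);  // pageable host memory
    unsigned char *h_back = (unsigned char *)malloc(n_bytes);
    unsigned char *d_buf  = NULL;
    int result = -1;

    if (h_src != NULL && h_back != NULL &&
        cudaMalloc((void **)&d_buf, n_bytes) == cudaSuccess) {
        for (size_t i = 0; i < n_bytes; ++i)
            h_src[i] = (unsigned char)(i & 0xFF);

        // Small host->device copy: the case the guide lists as asynchronous.
        cudaError_t err = cudaMemcpy(d_buf, h_src, n_bytes, cudaMemcpyHostToDevice);

        // Overwrite the source immediately, with no explicit synchronization.
        memset(h_src, 0xEE, n_bytes);

        // Device->host copy: returns only after the data has landed in h_back.
        if (err == cudaSuccess)
            err = cudaMemcpy(h_back, d_buf, n_bytes, cudaMemcpyDeviceToHost);

        if (err == cudaSuccess) {
            result = 1;
            for (size_t i = 0; i < n_bytes; ++i) {
                if (h_back[i] != (unsigned char)(i & 0xFF)) { result = 0; break; }
            }
        }
    }

    cudaFree(d_buf);  // cudaFree(NULL) is a no-op
    free(h_src);
    free(h_back);
    return result;
}
```

Calling small_h2d_copy_is_safe(64 * 1024) from main() and checking for a return value of 1 exercises the 64 KB boundary case quoted from the guide.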

I was worried about cudaMemcpy(). I use it a lot, and if device-to-host copies were not synchronous, my program would crash :-)
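For reference, this is the usual pattern that relies on the device-to-host copy being synchronous. It is only a sketch: the kernel add_one and the function kernel_then_copy_back are hypothetical names, and it assumes everything runs on the default stream, where cudaMemcpy() should wait for the preceding kernel before copying. There is no explicit synchronization call between the launch and the copy.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Hypothetical kernel: adds 1 to each of the n elements.
__global__ void add_one(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: returns 1 if the host sees the kernel's results right
// after cudaMemcpy() returns, 0 on a mismatch, -1 on an allocation or CUDA error.
// Assumes n > 0.
int kernel_then_copy_back(int n)
{
    size_t n_bytes = (size_t)n * sizeof(int);
    int *h = (int *)malloc(n_bytes);
    int *d = NULL;
    int result = -1;

    if (h != NULL && cudaMalloc((void **)&d, n_bytes) == cudaSuccess) {
        for (int i = 0; i < n; ++i) h[i] = i;

        cudaError_t err = cudaMemcpy(d, h, n_bytes, cudaMemcpyHostToDevice);
        if (err == cudaSuccess) {
            int threads = 256;
            int blocks = (n + threads - 1) / threads;
            add_one<<<blocks, threads>>>(d, n);  // launch returns immediately
            err = cudaGetLastError();            // catches launch configuration errors
        }

        // No explicit sync here: the copy itself waits for the kernel to finish.
        if (err == cudaSuccess)
            err = cudaMemcpy(h, d, n_bytes, cudaMemcpyDeviceToHost);

        if (err == cudaSuccess) {
            result = 1;
            for (int i = 0; i < n; ++i) {
                if (h[i] != i + 1) { result = 0; break; }
            }
        }
    }

    cudaFree(d);  // cudaFree(NULL) is a no-op
    free(h);
    return result;
}
```

If the copy back were asynchronous with respect to the host, the check on h[i] could see stale values; with the documented behaviour it should not.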

Thanks.

To ease your worries, have a look at this thread.

Thank you so much, tera :-)