Difference between cudaMemcpy and cudaMemcpyAsync in a thread context

cilas · August 20, 2025, 7:45pm

Hi everyone! I wrote code that reads several images in parallel, one thread per image, and copies each image's data to the GPU with cudaMemcpyAsync on its own stream, roughly as in the following pseudocode:

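        band_threads.emplace_back([&, i]() {
           device_bands[i] = read_device_band();
           // enqueue the host-to-device copy of band i on its own stream
           HANDLE_ERROR(cudaMemcpyAsync(device_bands[i], host_bands[i], band_bytes, cudaMemcpyHostToDevice, streams[i]));
        });

        // a std::thread starts running as soon as it is constructed, so only a join is needed
        for (auto& thread : band_threads) {
           thread.join();
        }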
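
To make the pattern concrete, here is a minimal self-contained sketch of what I mean. Several things in it are assumptions for illustration and not my real code: the band count and size, the `memset` standing in for the actual image read, the `HANDLE_ERROR` definition, and the use of pinned host buffers from `cudaMallocHost`. As far as I understand, page-locked host memory is needed for the copy to be truly asynchronous with respect to the host.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the HANDLE_ERROR macro used in the pseudocode.
#define HANDLE_ERROR(call)                                              \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

int main() {
    const int num_bands = 4;                      // assumed number of bands
    const size_t band_bytes = size_t(1) << 20;    // assumed band size (1 MiB)

    std::vector<unsigned char*> host_bands(num_bands, nullptr);
    std::vector<unsigned char*> device_bands(num_bands, nullptr);
    std::vector<cudaStream_t> streams(num_bands);

    for (int i = 0; i < num_bands; ++i) {
        // Pinned (page-locked) host buffer, device buffer, and one stream per band.
        HANDLE_ERROR(cudaMallocHost(reinterpret_cast<void**>(&host_bands[i]), band_bytes));
        HANDLE_ERROR(cudaMalloc(reinterpret_cast<void**>(&device_bands[i]), band_bytes));
        HANDLE_ERROR(cudaStreamCreate(&streams[i]));
    }

    std::vector<std::thread> band_threads;
    for (int i = 0; i < num_bands; ++i) {
        band_threads.emplace_back([&, i]() {
            // Stand-in for reading band i from disk into host memory.
            std::memset(host_bands[i], i, band_bytes);

            // Enqueue the copy on this band's stream; the call may return
            // before the transfer has finished.
            HANDLE_ERROR(cudaMemcpyAsync(device_bands[i], host_bands[i], band_bytes,
                                         cudaMemcpyHostToDevice, streams[i]));

            // Wait for the copy before the buffers are reused or freed.
            HANDLE_ERROR(cudaStreamSynchronize(streams[i]));
        });
    }

    // Threads start on construction; join them all before cleanup.
    for (auto& thread : band_threads) {
        thread.join();
    }

    for (int i = 0; i < num_bands; ++i) {
        HANDLE_ERROR(cudaStreamDestroy(streams[i]));
        HANDLE_ERROR(cudaFree(device_bands[i]));
        HANDLE_ERROR(cudaFreeHost(host_bands[i]));
    }
    std::printf("all bands copied\n");
    return 0;
}
```

In this sketch each thread synchronizes its own stream immediately after enqueuing the copy, so from the thread's point of view the sequence behaves much like a blocking copy. Any benefit from the asynchronous call would presumably only appear if the thread did other work between the `cudaMemcpyAsync` and the `cudaStreamSynchronize`.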

Is there any benefit to using cudaMemcpyAsync? Or is it the same as using cudaMemcpy?
