About Async(Copy) Engines

zetpop10 · October 2, 2025, 9:55pm

I'd like to know how many asynchronous (copy) engines each of the graphics cards listed below has.
If a card has only 1 asynchronous engine instead of 2 or more, how can I perform HtoD and DtoH transfers simultaneously?
Additionally, if these cards originally have 2 or more asynchronous engines but cudaGetDeviceProperties(&deviceProp, deviceI) reports deviceProp.asyncEngineCount as 1, how can I restore the original number of engines?

Here is the list of graphics cards I own:

RTX 3090 Ti
RTX 3070
RTX 3080
RTX 5090
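The value in question is the asyncEngineCount field of cudaDeviceProp. The minimal sketch below prints it for every visible device, along with the installed driver version, which is useful when comparing drivers. The helper supportsBidirectionalCopy is a hypothetical name introduced for this sketch. Its interpretation, that a value of 2 or more means copies in both directions can overlap, follows the CUDA Runtime documentation for asyncEngineCount.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper for this sketch. Per the CUDA Runtime docs:
// asyncEngineCount 0 = no copy/kernel overlap, 1 = copies in one direction
// can overlap kernels, 2 (or more) = H2D and D2H can overlap each other.
static bool supportsBidirectionalCopy(int asyncEngineCount) {
    return asyncEngineCount >= 2;
}

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int driverVersion = 0;
    cudaDriverGetVersion(&driverVersion);  // e.g. 12080 means CUDA 12.8 driver API
    std::printf("CUDA driver version: %d\n", driverVersion);

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        std::printf("Device %d: %s | asyncEngineCount = %d | H2D+D2H overlap: %s\n",
                    dev, prop.name, prop.asyncEngineCount,
                    supportsBidirectionalCopy(prop.asyncEngineCount) ? "yes" : "no");
    }
    return 0;
}
```

The same value is also available without filling the whole struct through cudaDeviceGetAttribute with cudaDevAttrAsyncEngineCount.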

Greg · October 3, 2025, 4:50am

Consumer graphics cards have historically enabled only one copy engine.
If there was a driver bug, you may be able to see more copy engines by reverting to a much older driver. However, there is no guarantee that the additional copy engines were actually usable.
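Because a reported engine count does not by itself prove that transfers overlap, one way to check empirically is to time an H2D copy alone, a D2H copy alone, and then both issued together in separate streams. This is a rough sketch under stated assumptions: pinned buffers of an arbitrary 256 MiB, wall-clock timing with std::chrono, and hypothetical helper names (overlapRatio, msSince). Timings vary from run to run, so repeat it a few times.

```cuda
#include <chrono>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(e_));                     \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical helper: concurrent time divided by the sum of the
// one-direction times. Near 1.0 = serialized; clearly below 1.0 = overlap.
static double overlapRatio(double tH2D, double tD2H, double tBoth) {
    return tBoth / (tH2D + tD2H);
}

// Hypothetical helper: milliseconds elapsed since t0.
static double msSince(std::chrono::steady_clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - t0).count();
}

int main() {
    const size_t bytes = size_t(256) << 20;  // 256 MiB per direction (arbitrary)
    void *hSrc = nullptr, *hDst = nullptr, *dSrc = nullptr, *dDst = nullptr;
    CHECK(cudaMallocHost(&hSrc, bytes));     // pinned memory: required for async copies
    CHECK(cudaMallocHost(&hDst, bytes));
    CHECK(cudaMalloc(&dSrc, bytes));
    CHECK(cudaMalloc(&dDst, bytes));
    std::memset(hSrc, 1, bytes);
    CHECK(cudaMemset(dSrc, 2, bytes));

    cudaStream_t up, down;
    CHECK(cudaStreamCreateWithFlags(&up, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&down, cudaStreamNonBlocking));

    // Warm-up so first-use overheads are not timed.
    CHECK(cudaMemcpyAsync(dDst, hSrc, bytes, cudaMemcpyHostToDevice, up));
    CHECK(cudaMemcpyAsync(hDst, dSrc, bytes, cudaMemcpyDeviceToHost, down));
    CHECK(cudaDeviceSynchronize());

    auto t0 = std::chrono::steady_clock::now();
    CHECK(cudaMemcpyAsync(dDst, hSrc, bytes, cudaMemcpyHostToDevice, up));
    CHECK(cudaStreamSynchronize(up));
    const double tH2D = msSince(t0);

    t0 = std::chrono::steady_clock::now();
    CHECK(cudaMemcpyAsync(hDst, dSrc, bytes, cudaMemcpyDeviceToHost, down));
    CHECK(cudaStreamSynchronize(down));
    const double tD2H = msSince(t0);

    // Both directions at once in independent streams.
    t0 = std::chrono::steady_clock::now();
    CHECK(cudaMemcpyAsync(dDst, hSrc, bytes, cudaMemcpyHostToDevice, up));
    CHECK(cudaMemcpyAsync(hDst, dSrc, bytes, cudaMemcpyDeviceToHost, down));
    CHECK(cudaDeviceSynchronize());
    const double tBoth = msSince(t0);

    std::printf("H2D %.2f ms | D2H %.2f ms | both %.2f ms | ratio %.2f\n",
                tH2D, tD2H, tBoth, overlapRatio(tH2D, tD2H, tBoth));

    CHECK(cudaStreamDestroy(up));
    CHECK(cudaStreamDestroy(down));
    CHECK(cudaFree(dSrc));
    CHECK(cudaFree(dDst));
    CHECK(cudaFreeHost(hSrc));
    CHECK(cudaFreeHost(hDst));
    return 0;
}
```

With a single usable copy engine, the two directions would be expected to serialize, giving a ratio near 1.0. A ratio clearly below 1.0 suggests the copies really ran concurrently.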

A single copy engine cannot simultaneously support H2D and D2H.
Small H2D copies generally do not use the copy engine.
If the host allocation uses pinned system memory, a copy kernel (a kernel that reads or writes the pinned host memory directly) is an efficient way to achieve parallel copies.
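Below is a minimal sketch of that copy-kernel idea. It assumes a 64-bit system where pinned memory allocated with cudaHostAllocMapped can be mapped into the device address space. The H2D direction is done by a grid-stride kernel running on the SMs, while the D2H direction uses cudaMemcpyAsync on the single copy engine in another stream. The names copyKernel and gridSizeFor, the 256 MiB buffer, the block size and the 1024-block cap are all illustrative choices. Achieved throughput of the kernel path on a given card is not guaranteed and should be measured.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(e_));                     \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical copy kernel: grid-stride loop, 16 bytes per element.
// src may be a device pointer to mapped, pinned host memory.
__global__ void copyKernel(uint4* __restrict__ dst, const uint4* __restrict__ src, size_t n) {
    const size_t stride = size_t(gridDim.x) * blockDim.x;
    for (size_t i = size_t(blockIdx.x) * blockDim.x + threadIdx.x; i < n; i += stride)
        dst[i] = src[i];
}

// Hypothetical helper: enough blocks to cover n elements, at least 1,
// capped at maxBlocks (the grid-stride loop covers any remainder).
static unsigned int gridSizeFor(size_t n, unsigned int block, unsigned int maxBlocks) {
    size_t blocks = (n + block - 1) / block;
    if (blocks == 0) blocks = 1;
    return blocks < maxBlocks ? static_cast<unsigned int>(blocks) : maxBlocks;
}

int main() {
    const size_t bytes = size_t(256) << 20;          // multiple of 16 (arbitrary)
    const size_t n = bytes / sizeof(uint4);
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));     // allow mapped pinned memory

    void *hUp = nullptr, *hDown = nullptr, *dUp = nullptr, *dDown = nullptr, *hUpDev = nullptr;
    CHECK(cudaHostAlloc(&hUp, bytes, cudaHostAllocMapped));  // read by the kernel
    CHECK(cudaMallocHost(&hDown, bytes));                    // D2H destination
    CHECK(cudaMalloc(&dUp, bytes));
    CHECK(cudaMalloc(&dDown, bytes));
    CHECK(cudaHostGetDevicePointer(&hUpDev, hUp, 0));
    std::memset(hUp, 0x5A, bytes);
    CHECK(cudaMemset(dDown, 0x3C, bytes));

    cudaStream_t sKernel, sCopy;
    CHECK(cudaStreamCreateWithFlags(&sKernel, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&sCopy, cudaStreamNonBlocking));

    // H2D through the SMs: the kernel reads host memory over PCIe ...
    const unsigned int block = 256;
    copyKernel<<<gridSizeFor(n, block, 1024), block, 0, sKernel>>>(
        static_cast<uint4*>(dUp), static_cast<const uint4*>(hUpDev), n);
    CHECK(cudaGetLastError());
    // ... while D2H uses the copy engine in a different stream.
    CHECK(cudaMemcpyAsync(hDown, dDown, bytes, cudaMemcpyDeviceToHost, sCopy));
    CHECK(cudaDeviceSynchronize());

    // Spot-check the last byte of each direction.
    unsigned char probe = 0;
    CHECK(cudaMemcpy(&probe, static_cast<char*>(dUp) + bytes - 1, 1, cudaMemcpyDeviceToHost));
    const unsigned char last = static_cast<unsigned char*>(hDown)[bytes - 1];
    std::printf("H2D via kernel: %s | D2H via copy engine: %s\n",
                probe == 0x5A ? "ok" : "FAILED", last == 0x3C ? "ok" : "FAILED");

    CHECK(cudaStreamDestroy(sKernel));
    CHECK(cudaStreamDestroy(sCopy));
    CHECK(cudaFree(dUp));
    CHECK(cudaFree(dDown));
    CHECK(cudaFreeHost(hUp));
    CHECK(cudaFreeHost(hDown));
    return 0;
}
```

The trade-off is that the kernel occupies SM resources that compute kernels could otherwise use. That is why this path is most attractive when the copy engine is already busy with the opposite direction.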

zetpop10 · October 7, 2025, 10:55am

Thank you for your answer. I have a couple of follow-up questions.

Is there any official documentation that specifies the number of copy engines for specific GPU models? I couldn’t find this information in the Blackwell architecture diagrams.
If consumer cards only enable one copy engine, does that mean I need a professional-grade GPU, such as the NVIDIA RTX 6000, to get two or more asynchronous copy engines for simultaneous HtoD and DtoH transfers? I'm looking for the most cost-effective option with at least two copy engines.
