Any GPU with 2 or more copy engines (discoverable via deviceQuery) should support simultaneous H2D and D2H transfers.
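For anyone who wants to check this programmatically instead of reading deviceQuery output: the copy-engine count that deviceQuery prints comes from the `asyncEngineCount` field of `cudaDeviceProp`. Below is a minimal sketch. Note that `supports_bidirectional_copy` is a hypothetical helper name made up for this post (it is not part of CUDA), and the "2 or more engines" test is just the rule of thumb stated above, not a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): applies the rule of thumb that
// 2 or more copy engines should allow simultaneous H2D and D2H transfers.
static bool supports_bidirectional_copy(int asyncEngineCount) {
    return asyncEngineCount >= 2;
}

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                    dev, cudaGetErrorString(err));
            return 1;
        }
        // asyncEngineCount is the value deviceQuery reports as "copy engine(s)".
        printf("Device %d: \"%s\" (compute capability %d.%d): %d copy engine(s), "
               "bidirectional copies expected: %s\n",
               dev, prop.name, prop.major, prop.minor, prop.asyncEngineCount,
               supports_bidirectional_copy(prop.asyncEngineCount) ? "yes" : "no");
    }
    return 0;
}
```

Build with something like `nvcc -o copy_engines copy_engines.cu` (the file name is arbitrary). The result depends on what the driver reports, so it is worth running on each OS/driver combination you care about.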
I think there are recent GPUs in every product category that support this. I'm not aware of a table or list anywhere. I also think that if you search hard enough, you can find (for example) Quadro GPUs with only 1 copy engine.
I am not aware of any such list; maybe we need to crowdsource one? Here is one more data point:
Device 0: "Quadro K420"
CUDA Driver Version / Runtime Version 10.2 / 9.2
CUDA Capability Major/Minor version number: 3.0
[...]
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Device 0: "Quadro RTX 4000"
CUDA Driver Version / Runtime Version 11.0 / 9.2
CUDA Capability Major/Minor version number: 7.5
[...]
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Device 1: "Quadro P2000"
CUDA Driver Version / Runtime Version 11.0 / 9.2
CUDA Capability Major/Minor version number: 6.1
[...]
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Hi,
Can anybody tell me whether something changed in the Ampere architecture? My 3080 reports only 1 copy engine, while the 2080 SUPER (which is supposed to be inferior to the 3080) reports 3 copy engines.
I have zero experience with Ampere-based GPUs. I would suggest filing a bug report with NVIDIA.
Maybe (speculation!) NVIDIA has changed the definition of “copy engine” with Ampere and the new GPUs sport a new all-singing-all-dancing copy engine with multiple channels. If that were the case, reporting that as one engine would not be helpful to programmers who try to assess whether simultaneous bi-directional DMA transfers are possible.
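If the reported engine count cannot be trusted, one way to assess bidirectional transfers is to measure them. The sketch below is an assumption-laden experiment, not an official test. It times a host-to-device copy, a device-to-host copy, and then both at once in two streams using pinned host memory. The 256 MiB buffer size, the `Buffers` struct, and the helpers `time_copies` and `overlap_ratio` are all made up for illustration. A ratio near 0.5 would suggest the two directions overlap; a ratio near 1.0 would suggest they are serialized.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e_));                          \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: combined time relative to the sum of one-way times.
static double overlap_ratio(double tBoth, double tH2D, double tD2H) {
    return tBoth / (tH2D + tD2H);
}

struct Buffers {
    char *hIn, *hOut;        // pinned host buffers
    char *dIn, *dOut;        // device buffers
    size_t bytes;
    cudaStream_t up, down;   // one stream per copy direction
};

// Wall-clock milliseconds for the selected copies, issued asynchronously.
static double time_copies(const Buffers& b, bool h2d, bool d2h) {
    CHECK(cudaDeviceSynchronize());
    auto t0 = std::chrono::steady_clock::now();
    if (h2d) {
        CHECK(cudaMemcpyAsync(b.dIn, b.hIn, b.bytes, cudaMemcpyHostToDevice, b.up));
    }
    if (d2h) {
        CHECK(cudaMemcpyAsync(b.hOut, b.dOut, b.bytes, cudaMemcpyDeviceToHost, b.down));
    }
    CHECK(cudaDeviceSynchronize());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    Buffers b{};
    b.bytes = size_t(256) << 20;  // 256 MiB per direction (arbitrary choice)
    CHECK(cudaMallocHost((void**)&b.hIn, b.bytes));   // pinned memory is needed
    CHECK(cudaMallocHost((void**)&b.hOut, b.bytes));  // for truly async copies
    CHECK(cudaMalloc((void**)&b.dIn, b.bytes));
    CHECK(cudaMalloc((void**)&b.dOut, b.bytes));
    memset(b.hIn, 0, b.bytes);
    CHECK(cudaMemset(b.dOut, 0, b.bytes));
    CHECK(cudaStreamCreate(&b.up));
    CHECK(cudaStreamCreate(&b.down));

    time_copies(b, true, true);  // warm-up run, result discarded
    double tH2D  = time_copies(b, true, false);
    double tD2H  = time_copies(b, false, true);
    double tBoth = time_copies(b, true, true);

    printf("H2D only: %.2f ms\nD2H only: %.2f ms\nBoth:     %.2f ms\n",
           tH2D, tD2H, tBoth);
    printf("Both / (H2D + D2H) = %.2f\n", overlap_ratio(tBoth, tH2D, tD2H));

    CHECK(cudaStreamDestroy(b.up));
    CHECK(cudaStreamDestroy(b.down));
    CHECK(cudaFree(b.dIn));
    CHECK(cudaFree(b.dOut));
    CHECK(cudaFreeHost(b.hIn));
    CHECK(cudaFreeHost(b.hOut));
    return 0;
}
```

A single run can be noisy, so repeating the measurement a few times and comparing it against the reported engine count is probably the more convincing data point for a bug report.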
On Ubuntu 18.04 with CUDA 11.1 the result is fine: 2 async engines are reported, which is understandable since my variant (MSI Ventus 3X OC) does not have any NVLink or SLI connectors.
Great, I hope you included all that info (that it appears to be correct on Linux) in the bug report. Thanks for filing the bug, especially since it seems like it might be a driver issue.