GTX 680 and GTX 780 with only one copy engine?

I’m starting to experiment with CUDA streams and asynchronous data transfers, but I’m not getting the expected results on the two systems I’m using: one with a GTX 680 (driver v319.32, Ubuntu 13.10 64-bit) and one with a GTX 780 (driver v319.37, Ubuntu 10.04.4 64-bit). In particular, I can’t overlap host-to-device transfers with device-to-host transfers, and the asyncEngineCount property is indeed 1. Even the example from https://devblogs.nvidia.com/parallelforall/how-overlap-data-transfers-cuda-cc/ behaves as if I were using a C1060, which has only one copy engine. Is this possible? Do these two cards really have only one copy engine, or are the drivers too old?
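A minimal sketch for checking the property mentioned above on each device, assuming a CUDA toolkit with the runtime API (`cudaGetDeviceCount`, `cudaGetDeviceProperties`). The `overlapCapability` helper is hypothetical; it only paraphrases how the CUDA documentation describes `asyncEngineCount` (1: a copy can overlap kernel execution; 2: copies in both directions can also overlap each other).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: summarizes what a given asyncEngineCount allows,
// following the CUDA documentation's description of the property.
static const char *overlapCapability(int asyncEngineCount) {
    switch (asyncEngineCount) {
        case 0:  return "no overlap of copies with kernel execution";
        case 1:  return "one copy engine: a copy can overlap a kernel, "
                        "but H2D and D2H copies serialize";
        default: return "two copy engines: H2D and D2H copies can overlap "
                        "each other and a kernel";
    }
}

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
            return 1;
        }
        // Print the raw value and what it means for copy/kernel overlap.
        std::printf("Device %d (%s): asyncEngineCount = %d -> %s\n",
                    dev, prop.name, prop.asyncEngineCount,
                    overlapCapability(prop.asyncEngineCount));
    }
    return 0;
}
```

On the GTX 680 and GTX 780 described here, this would be expected to report asyncEngineCount = 1, matching what the poster saw.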

Thanks in advance

The GTX 680 and 780 both have only one copy engine, as you’ve found. Two DMA (copy) engines are what allow a host-to-device copy and a device-to-host copy to run asynchronously at the same time. That’s a feature of the professional Tesla GPUs, not the consumer ones.
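One way to see what a second copy engine buys is to time an H2D copy and a D2H copy on their own, then issued together in two different streams. This sketch assumes pinned host memory (needed for `cudaMemcpyAsync` to overlap with anything), the legacy default-stream behaviour nvcc uses unless `--default-stream per-thread` is given (so events recorded on stream 0 wait for the other blocking streams), and an arbitrary 64 MiB transfer size. `timeMs` and `copiesOverlapped` are hypothetical helpers, and the 0.75 threshold in the latter is an arbitrary heuristic, not a documented value.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                          \
    do {                                                                     \
        cudaError_t e_ = (call);                                             \
        if (e_ != cudaSuccess) {                                             \
            std::fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(e_)); \
            std::exit(EXIT_FAILURE);                                         \
        }                                                                    \
    } while (0)

// Hypothetical heuristic: the copies "overlapped" if doing both took clearly
// less than doing them back to back (0.75 is an arbitrary threshold).
static bool copiesOverlapped(float msUp, float msDown, float msBoth) {
    return msBoth < 0.75f * (msUp + msDown);
}

// Time whatever `issue` enqueues, using events on the legacy default stream,
// which synchronizes with the blocking streams created in main().
template <typename F>
static float timeMs(F issue) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start, 0));
    issue();
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main() {
    const size_t bytes = size_t(64) << 20;  // 64 MiB per direction (arbitrary)
    char *hIn, *hOut, *dIn, *dOut;
    CHECK(cudaMallocHost((void **)&hIn, bytes));   // pinned host buffers
    CHECK(cudaMallocHost((void **)&hOut, bytes));
    CHECK(cudaMalloc((void **)&dIn, bytes));
    CHECK(cudaMalloc((void **)&dOut, bytes));

    cudaStream_t up, down;
    CHECK(cudaStreamCreate(&up));
    CHECK(cudaStreamCreate(&down));

    // Each direction gets its own stream so the two copies are allowed to overlap.
    auto h2d = [&] { CHECK(cudaMemcpyAsync(dIn, hIn, bytes, cudaMemcpyHostToDevice, up)); };
    auto d2h = [&] { CHECK(cudaMemcpyAsync(hOut, dOut, bytes, cudaMemcpyDeviceToHost, down)); };

    timeMs([&] { h2d(); d2h(); });  // warm-up run, result discarded
    float msUp = timeMs(h2d);
    float msDown = timeMs(d2h);
    float msBoth = timeMs([&] { h2d(); d2h(); });

    std::printf("H2D %.2f ms, D2H %.2f ms, both %.2f ms -> %s\n",
                msUp, msDown, msBoth,
                copiesOverlapped(msUp, msDown, msBoth) ? "overlapped" : "serialized");

    CHECK(cudaStreamDestroy(up));
    CHECK(cudaStreamDestroy(down));
    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));
    CHECK(cudaFreeHost(hIn));
    CHECK(cudaFreeHost(hOut));
    return 0;
}
```

With a single copy engine the combined time would be expected to land near the sum of the two individual times; with two engines it can approach the larger of the two.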

OK, so I guess there’s nothing I can do about this. Thanks for the reply.

Just checking: you know that your cards can still overlap kernel execution with transfers? That is what your common or garden variety CUDA-elf means when they talk about “using streams for asynchronous transfers”.
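A minimal sketch of the kernel/transfer overlap being referred to, along the lines of the first approach in the linked devblogs post: the array is split into chunks, and each chunk’s host-to-device copy, kernel, and device-to-host copy are issued in its own stream. It assumes pinned host memory and an element count divisible by the number of streams; `transform`, `kernel`, and all sizes are made up for illustration. With one copy engine, one chunk’s copy can run while another chunk’s kernel executes, but an H2D copy and a D2H copy still cannot run at the same time.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                          \
    do {                                                                     \
        cudaError_t e_ = (call);                                             \
        if (e_ != cudaSuccess) {                                             \
            std::fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(e_)); \
            std::exit(EXIT_FAILURE);                                         \
        }                                                                    \
    } while (0)

// Illustrative elementwise operation; __host__ __device__ so the host can
// compute the expected value when checking results.
__host__ __device__ inline float transform(float x) { return 2.0f * x + 1.0f; }

// Applies transform() to the n elements of `a` starting at `offset`.
__global__ void kernel(float *a, int offset, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[offset + i] = transform(a[offset + i]);
}

int main() {
    const int nStreams = 4;
    const int blockSize = 256;
    const int n = 4 * 1024 * 1024;          // total elements, divisible by nStreams
    const int streamSize = n / nStreams;    // elements per chunk
    const size_t streamBytes = streamSize * sizeof(float);

    float *h, *d;
    CHECK(cudaMallocHost((void **)&h, n * sizeof(float)));  // pinned host memory
    CHECK(cudaMalloc((void **)&d, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h[i] = float(i % 1000);

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    // Each chunk goes H2D -> kernel -> D2H in its own stream, so work from
    // different chunks can overlap (e.g. chunk 1's copy during chunk 0's kernel).
    for (int s = 0; s < nStreams; ++s) {
        const int offset = s * streamSize;
        CHECK(cudaMemcpyAsync(d + offset, h + offset, streamBytes,
                              cudaMemcpyHostToDevice, streams[s]));
        kernel<<<(streamSize + blockSize - 1) / blockSize, blockSize, 0, streams[s]>>>(
            d, offset, streamSize);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h + offset, d + offset, streamBytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }
    CHECK(cudaDeviceSynchronize());

    // Verify every element against the host-side computation.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != transform(float(i % 1000))) ++mismatches;
    std::printf("mismatches: %d\n", mismatches);

    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return mismatches == 0 ? 0 : 1;
}
```

The order in which these calls are issued can change how much overlap a given GPU actually achieves; the linked post also compares a variant that issues all the H2D copies first, then all the kernels, then all the D2H copies.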

Yes, this is what I’m achieving right now:

I guess that’s the best I can do with just one copy engine.