I’m starting to experiment with CUDA streams and asynchronous data transfers, but I’m not getting the expected results on the two systems I’m using: one with a GTX 680 (driver v319.32, Ubuntu 13.10 64-bit) and one with a GTX 780 (driver v319.37, Ubuntu 10.04.4 64-bit). In particular, I’m not able to overlap host-to-device transfers with device-to-host transfers, and indeed the asyncEngineCount property reports 1 on both cards. Even the example from https://devblogs.nvidia.com/parallelforall/how-overlap-data-transfers-cuda-cc/ behaves as if I were using a C1060, which has only one copy engine. Is this expected? Do these two cards really have only one copy engine, or are the drivers too old?
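For reference, here is a minimal sketch of how the asyncEngineCount value can be read through the CUDA runtime API. The helper name overlapSummary is hypothetical and only there for illustration; its wording reflects my understanding of the documented meaning of the values 0, 1 and 2 for this field of cudaDeviceProp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: maps asyncEngineCount to a short description.
// Assumption: 0 = no copy/kernel overlap, 1 = a copy in one direction can
// overlap kernel execution, 2 = copies in both directions can overlap
// each other and kernel execution.
const char* overlapSummary(int asyncEngineCount) {
    switch (asyncEngineCount) {
        case 0:  return "no overlap";
        case 1:  return "one copy direction + kernel";
        case 2:  return "both copy directions + kernel";
        default: return "unknown";
    }
}

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Print the copy-engine count reported for every visible device.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            return 1;
        }
        std::printf("Device %d: %s, asyncEngineCount = %d (%s)\n",
                    dev, prop.name, prop.asyncEngineCount,
                    overlapSummary(prop.asyncEngineCount));
    }
    return 0;
}
```

This should build with something like `nvcc query.cu -o query` (the file name is arbitrary). On both of my systems the reported value is 1.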
Thanks in advance.