Hello,
I tested the dual copy engines using cudaMemcpyAsync() on a GTX 1070 with CUDA 8.0,
but only one copy engine was used. When I ran deviceQuery.exe from the CUDA Samples, it confirmed that the GTX 1070 has two copy engines.
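The copy engine count that deviceQuery reports can also be read directly from the runtime API through the asyncEngineCount field of cudaDeviceProp. A minimal sketch, assuming the GTX 1070 is device 0 (the device index is an assumption, not something stated above):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int device = 0;  // assumption: the GTX 1070 is device 0 on this machine
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // asyncEngineCount is the number of asynchronous copy engines:
    // 2 means host-to-device and device-to-host copies can run concurrently.
    std::printf("%s: asyncEngineCount = %d\n", prop.name, prop.asyncEngineCount);
    return 0;
}
```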
Here is my test code:
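for (int i = 0; i < iNumStream; i++)
{
    int iOffset = i * iStreamSize;  // element offset of this stream's chunk
    // cudaMemcpyAsync takes its size in bytes, so scale the element count by the element size.
    cudaMemcpyAsync(&device_R[iOffset], &host_R[iOffset], iStreamSize * sizeof(*device_R), cudaMemcpyHostToDevice, streams[i]);
    Kernel<<<dimGrid, dimBlock, 0, streams[i]>>>(device_R, device_Out, iOffset);
    cudaMemcpyAsync(&host_Out[iOffset], &device_Out[iOffset], iStreamSize * sizeof(*device_Out), cudaMemcpyDeviceToHost, streams[i]);
}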
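A self-contained program built around this loop might look like the sketch below. Everything outside the loop is an assumption made for illustration: the float element type, the stream count and chunk size, and the placeholder kernel body, which simply copies input to output. The host buffers are allocated with cudaMallocHost because cudaMemcpyAsync can only overlap with other work when the host memory is page-locked.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (assumption): copies one chunk from in to out.
// The launch below covers exactly iStreamSize threads, so no bounds check is needed.
__global__ void Kernel(const float* in, float* out, int offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[offset + i] = in[offset + i];
}

int main()
{
    const int iNumStream  = 4;        // assumed number of streams
    const int iStreamSize = 1 << 20;  // assumed elements per stream (multiple of 256)
    const int iTotal      = iNumStream * iStreamSize;
    const size_t totalBytes = iTotal * sizeof(float);
    const size_t chunkBytes = iStreamSize * sizeof(float);

    float *host_R, *host_Out, *device_R, *device_Out;
    cudaMallocHost(&host_R, totalBytes);    // page-locked host memory
    cudaMallocHost(&host_Out, totalBytes);  // page-locked host memory
    cudaMalloc(&device_R, totalBytes);
    cudaMalloc(&device_Out, totalBytes);

    for (int i = 0; i < iTotal; i++) host_R[i] = static_cast<float>(i);

    cudaStream_t streams[iNumStream];
    for (int i = 0; i < iNumStream; i++) cudaStreamCreate(&streams[i]);

    dim3 dimBlock(256);
    dim3 dimGrid(iStreamSize / 256);

    // Each stream copies its chunk in, processes it, and copies the result out.
    for (int i = 0; i < iNumStream; i++)
    {
        int iOffset = i * iStreamSize;
        cudaMemcpyAsync(&device_R[iOffset], &host_R[iOffset], chunkBytes,
                        cudaMemcpyHostToDevice, streams[i]);
        Kernel<<<dimGrid, dimBlock, 0, streams[i]>>>(device_R, device_Out, iOffset);
        cudaMemcpyAsync(&host_Out[iOffset], &device_Out[iOffset], chunkBytes,
                        cudaMemcpyDeviceToHost, streams[i]);
    }

    cudaError_t err = cudaDeviceSynchronize();  // wait for all streams to finish
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Verify that every element made the round trip unchanged.
    bool ok = true;
    for (int i = 0; i < iTotal; i++) {
        if (host_Out[i] != host_R[i]) { ok = false; break; }
    }
    std::printf("round trip %s\n", ok ? "ok" : "FAILED");

    for (int i = 0; i < iNumStream; i++) cudaStreamDestroy(streams[i]);
    cudaFree(device_R);
    cudaFree(device_Out);
    cudaFreeHost(host_R);
    cudaFreeHost(host_Out);
    return ok ? 0 : 1;
}
```

Running a program like this under a timeline profiler such as nvvp or Nsight shows whether the host-to-device and device-to-host copies from different streams actually overlap.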
Please help me!