Asynchronous memory copy from Host to Device


My application requires constantly streaming data from a live camera feed. I’m therefore really interested in the asynchronous memory copy from host to device.

Are there any NVIDIA graphics cards currently available that support this capability? I know that cudaMemcpyAsync is supported in the new 2.0 SDK. However, I’ve heard from some colleagues that this will not actually run on current hardware.

It works on all compute 1.1 hardware, which is all CUDA-capable hardware that is not G80 (8800 GTX, Tesla).

I have async memcpys working on an 8800 GTX (compute capability 1.0).

Streams do not work on the 8800 GTX, but async kernel calls and async memcpys do.
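For anyone following along, here is a minimal sketch of what an async host-to-device copy looks like. The frame size and buffer names are made up for illustration, and the error-check macro is just a local helper. The important detail is that the host buffer is page-locked (allocated with cudaMallocHost); with ordinary pageable memory the call generally cannot return before the transfer is staged.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Local helper (not part of CUDA): abort with a message if a runtime call fails.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                    \
                    cudaGetErrorString(err_));                           \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

int main() {
    // Hypothetical frame: 640x480, one byte per pixel.
    const size_t frameBytes = 640 * 480;

    unsigned char *hostFrame = NULL;
    unsigned char *devFrame = NULL;

    // Page-locked (pinned) host memory, needed for a truly asynchronous copy.
    CHECK(cudaMallocHost((void **)&hostFrame, frameBytes));
    CHECK(cudaMalloc((void **)&devFrame, frameBytes));

    // Stand-in for a captured camera frame.
    for (size_t i = 0; i < frameBytes; ++i) {
        hostFrame[i] = (unsigned char)(i & 0xFF);
    }

    // Queue the copy on the default stream (0); the call returns without
    // waiting for the transfer to complete.
    CHECK(cudaMemcpyAsync(devFrame, hostFrame, frameBytes,
                          cudaMemcpyHostToDevice, 0));

    // ... the CPU could grab the next frame here ...

    // Block until everything queued on the default stream has finished.
    CHECK(cudaStreamSynchronize(0));

    // Read one byte back to confirm the data arrived.
    unsigned char probe = 0;
    CHECK(cudaMemcpy(&probe, devFrame + 100, 1, cudaMemcpyDeviceToHost));
    printf("byte 100 on device: %d (host has %d)\n", probe, hostFrame[100]);

    CHECK(cudaFree(devFrame));
    CHECK(cudaFreeHost(hostFrame));
    return 0;
}
```

This only shows that the copy is asynchronous with respect to the CPU. Whether it can also run concurrently with a kernel is a separate hardware question.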

Yes, you can do async memcpys. But you can’t overlap them with kernel executions on 1.0 hardware, which I assumed was what the OP was asking about.
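Rather than going by card names, you can ask the runtime directly whether a device can overlap copies with kernel execution. The sketch below assumes a reasonably recent toolkit that provides cudaDeviceGetAttribute and the async engine count attribute; older toolkits exposed a similar flag as the deviceOverlap field of cudaDeviceProp, so adjust to whatever your SDK version offers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        int major = 0, minor = 0, copyEngines = 0;

        // Compute capability of this device.
        cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev);
        cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev);

        // Number of copy engines that can run alongside kernels;
        // 0 means copies and kernels cannot overlap on this device.
        cudaDeviceGetAttribute(&copyEngines, cudaDevAttrAsyncEngineCount, dev);

        printf("device %d: compute %d.%d, copy/kernel overlap: %s\n",
               dev, major, minor, copyEngines > 0 ? "yes" : "no");
    }
    return 0;
}
```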

I use async streams with 2x GF8500GT… but it’s curious… cudaStreamCreate() always returns a stream handle of “1”… even with multiple threads and GPUs plugged in.

Thanks for all the help so far.

So let me try to see if I understand:

Compute 1.0 capability: Kernel will not execute while async memcopy is working

Compute 1.1 capability: Kernel WILL execute while async memcopy is working
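To make the overlap case concrete, here is a rough sketch of a two-stream pipeline: while the kernel for one frame runs in one stream, the copy for the next frame is queued in the other. Everything named here (invertPixels, the frame size, the frame count, the memset standing in for a camera grab) is invented for illustration. On hardware that cannot overlap copies and kernels, the same code still runs correctly; the work is simply serialized.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Local helper (not part of CUDA): abort with a message if a runtime call fails.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                    \
                    cudaGetErrorString(err_));                           \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

// Placeholder processing step: invert each 8-bit pixel in place.
__global__ void invertPixels(unsigned char *frame, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        frame[i] = 255 - frame[i];
    }
}

int main() {
    const int kNumStreams = 2;      // double buffering
    const int kNumFrames = 8;       // hypothetical number of frames to process
    const int kPixels = 640 * 480;  // hypothetical 8-bit frame

    cudaStream_t streams[kNumStreams];
    unsigned char *hostIn[kNumStreams];
    unsigned char *devBuf[kNumStreams];

    for (int s = 0; s < kNumStreams; ++s) {
        CHECK(cudaStreamCreate(&streams[s]));
        CHECK(cudaMallocHost((void **)&hostIn[s], kPixels));  // pinned
        CHECK(cudaMalloc((void **)&devBuf[s], kPixels));
    }

    for (int f = 0; f < kNumFrames; ++f) {
        int s = f % kNumStreams;

        // Make sure the earlier work using this slot has finished before
        // the host buffer is overwritten.
        CHECK(cudaStreamSynchronize(streams[s]));

        // Stand-in for grabbing a camera frame: fill with the frame index.
        memset(hostIn[s], f, kPixels);

        // Copy and kernel go into the same stream, so they run in order
        // relative to each other but may overlap with the other stream.
        CHECK(cudaMemcpyAsync(devBuf[s], hostIn[s], kPixels,
                              cudaMemcpyHostToDevice, streams[s]));
        invertPixels<<<(kPixels + 255) / 256, 256, 0, streams[s]>>>(
            devBuf[s], kPixels);
        CHECK(cudaGetLastError());
    }

    // Wait for all queued work in every stream.
    CHECK(cudaDeviceSynchronize());

    // Sanity check on the last frame processed.
    int lastSlot = (kNumFrames - 1) % kNumStreams;
    unsigned char probe = 0;
    CHECK(cudaMemcpy(&probe, devBuf[lastSlot], 1, cudaMemcpyDeviceToHost));
    unsigned char expected = (unsigned char)(255 - (kNumFrames - 1));
    printf("last frame check: %s\n", probe == expected ? "ok" : "mismatch");

    for (int s = 0; s < kNumStreams; ++s) {
        CHECK(cudaFree(devBuf[s]));
        CHECK(cudaFreeHost(hostIn[s]));
        CHECK(cudaStreamDestroy(streams[s]));
    }
    return 0;
}
```

A profiler timeline is the easiest way to see whether the copy for one frame really overlaps the kernel for the previous one on a given card.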

Do I have this right?