Why doesn't the CUDA API support memory copy in parallel with GPU computing?

First question: from my testing, I can see that while the GPU is executing a kernel, it doesn't support data transfer from the CPU at the same time.
Why doesn't the CUDA API (1.1) support memory copy in parallel with GPU computing?

Second question: the GPU doesn't support data transfer from CPU to GPU in parallel with data transfer from GPU to CPU, but I think DMA should support bidirectional transmission.
Is this a hardware limitation or an API limitation?

Can anyone give me an explanation?
Thank you in advance!

That is not true. There is an example in the SDK. deviceQuery tells you whether your device supports it (not all devices do).
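A minimal sketch of that check, using the standard CUDA runtime API: the `deviceOverlap` field of `cudaDeviceProp` is what deviceQuery reports as concurrent copy-and-execution support.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int dev = 0;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);

    // deviceOverlap is 1 if the GPU can copy memory between host and
    // device while a kernel is running, 0 otherwise.
    printf("Device %d: %s\n", dev, prop.name);
    printf("Concurrent copy and kernel execution: %s\n",
           prop.deviceOverlap ? "Yes" : "No");
    return 0;
}
```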

Yes, I know that.

But someone said it is a bug in 1.1 (for G80-series hardware). You can see http://forums.nvidia.com/lofiversion/index.php?t55372.html

So I am very confused about this problem.

G80 is 1.0 hardware, so it does not support async memory copy operations. The fact that it's reported as supported is a bug. However, on 1.1 (and later) hardware (including G8x chips other than G80, and G92) it is supported.
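On hardware that does support it, the usual overlap pattern uses page-locked host memory, a stream, and `cudaMemcpyAsync`. A hedged sketch (the kernel `myKernel` is a placeholder, not from this thread):

```cuda
#include <cuda_runtime.h>

__global__ void myKernel(float *d, int n)  // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main(void)
{
    const int N = 1 << 20;
    float *h_a, *d_a;

    // Async copies can only overlap when the host buffer is pinned
    // (page-locked), so allocate with cudaMallocHost, not malloc.
    cudaMallocHost(&h_a, N * sizeof(float));
    cudaMalloc(&d_a, N * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Work issued into the same stream runs in order with respect to
    // itself, but can overlap with work in other streams on hardware
    // that supports concurrent copy and execution.
    cudaMemcpyAsync(d_a, h_a, N * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    myKernel<<<(N + 255) / 256, 256, 0, stream>>>(d_a, N);
    cudaMemcpyAsync(h_a, d_a, N * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);

    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_a);
    cudaFreeHost(h_a);
    return 0;
}
```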

Thank you, I got it.

My GPU is a 9600 GT (G94 core). Below are the simpleStreams results:

memcopy: 33.51
kernel: 40.80
non-streamed: 74.86 (74.31 expected)
8 streams: 75.12 (44.99 expected with compute capability 1.1 or later)


Does it not support overlapping?

According to my testing, it indeed doesn't support overlapping. The same code run on a GTX 280 (compute capability 1.3) shows very good overlap.

Can you tell us the versions of your CUDA toolkit and GPU driver? (and of your GTX 280 machine)

For the GeForce 9600 GT, the toolkit version is 1.1 and the driver version is 178.15.

For the GTX 280, the driver version is also 178.15, and the toolkit is the same as above.

Is there any problem?

Thank you.