Could you guys please bring back cudaMapAddress? A few words to the CUDA developers

The function “cudaMapAddress” was available in a prior version of CUDA but is missing in the 0.8 BETA version. This function is very important for my task; could you please consider including it in CUDA again?

Here is my situation. Since the GPU acts as a co-processor to the CPU, I want it to add as little overhead to the CPU as possible.

While using cudaMemcpy slows the CPU down a lot, cudaMapAddress is a good fit for this. To my understanding, it establishes a link between a specific region of CPU memory and a region of GPU memory: if I write to that region of GPU memory, the write is redirected to CPU memory. This is really great, since the CPU only has to read that region of memory to decide whether the GPU has found something important. Since the GPU won’t always find useful information, there won’t be many memory writes from the GPU to the CPU.

Sorry to bring up this topic, since a similar one is here:

Are you trying to have the CPU read some data written by the GPU while the CUDA kernel is still running? If so, I’m not sure you can do that reliably, since there’s no way for the CPU to know where the GPU is in its computation.

If you are trying to find out whether the GPU has come up with “useful” data after the GPU kernel has completed, why not try this:

  1. Have the CUDA kernel write a “flag” to global memory indicating whether the computation led to “useful” data.
  2. Use cudaMemcpy to read the flag; if it indicates that the computation was “useful,” then do a large cudaMemcpy of the data computed by the GPU.
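
A minimal sketch of those two steps, just to illustrate the idea. The kernel `search`, the helper `run_search`, and the "value above a threshold" test are all made up for this example (they stand in for whatever your real computation is), and error checking is left out to keep it short:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: copies each input to the output and raises the flag
// when a value exceeds the threshold (a stand-in for "found something useful").
__global__ void search(const int *in, int *out, int n, int threshold, int *flag)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i];
        if (in[i] > threshold) {
            *flag = 1;  // step 1: every writer stores the same value, so the race is harmless
        }
    }
}

// Hypothetical host helper: returns true and fills `result` only when the
// kernel reported useful data; otherwise the large copy is skipped.
bool run_search(const std::vector<int> &input, int threshold, std::vector<int> &result)
{
    const int n = static_cast<int>(input.size());
    if (n == 0) return false;

    int *d_in = nullptr, *d_out = nullptr, *d_flag = nullptr;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_out, n * sizeof(int));
    cudaMalloc((void **)&d_flag, sizeof(int));
    cudaMemcpy(d_in, input.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_flag, 0, sizeof(int));  // flag starts cleared

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    search<<<blocks, threads>>>(d_in, d_out, n, threshold, d_flag);

    // Step 2a: small copy of the flag. A blocking cudaMemcpy on the default
    // stream does not start until the kernel above has finished.
    int h_flag = 0;
    cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost);

    // Step 2b: large copy only when the flag says it is worth it.
    if (h_flag != 0) {
        result.resize(n);
        cudaMemcpy(result.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_flag);
    return h_flag != 0;
}
```

In the common case where nothing useful is found, the only device-to-host traffic is the 4-byte flag.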

You have to realize that whatever method you use to get the data to the CPU (whether it’s mapping or copying), the data still has to travel across the bus from the device to host memory. If that host memory area is cached, then the CPU has to get involved at one point or another to avoid stale cache lines. So I’m not sure you would really get improved performance with mapping. Is there a slowdown in your app when you change it to use memcopies?


I tested the speed of cudaMemcpy and cudaMapAddress in the previous version. cudaMemcpy was much slower than cudaMapAddress.