Any way to write to CPU memory from a kernel? Why is the function cudaMapAddress missing?

It seems there was a function cudaMapAddress() in a prior version of CUDA, but I don't see it in the new version (0.8 Beta).

There are two questions I want to ask:

1) To my understanding, that function enabled writing from a kernel to main memory. Is that correct?

2) Why did it disappear in the 0.8 Beta version? Will there be a similar function in a future version?

Any hints?

cudaMapAddress() has been removed from the API because it is slow and uses the bus very inefficiently. Memory copying (as opposed to memory mapping) is the best way to read from or write to device memory.
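The copy-based pattern looks roughly like this. A minimal sketch, assuming the standard cudaMalloc/cudaMemcpy host API; the increment kernel and array size are made up for illustration, and error checking is omitted:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Trivial illustrative kernel: each thread increments one element
// of an array that lives in device memory.
__global__ void increment(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;
}

int main(void)
{
    const int n = 256;
    int host[n], *dev;

    for (int i = 0; i < n; ++i)
        host[i] = i;

    cudaMalloc((void **)&dev, n * sizeof(int));

    /* Host -> device copy before the kernel runs... */
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    increment<<<(n + 127) / 128, 128>>>(dev, n);

    /* ...and device -> host copy afterwards to read the results back. */
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);

    printf("host[0] = %d\n", host[0]);
    cudaFree(dev);
    return 0;
}
```

The kernel never touches host memory directly; all traffic across the bus happens in the two bulk cudaMemcpy calls, which is what makes this pattern efficient.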

To answer your first point below: cudaMapAddress() didn’t enable writing to main memory from a kernel; you cannot call cudaMapAddress() from a kernel or any function from the host runtime component for that matter (see section 4.5 of the programming guide).

Although I cannot call cudaMapAddress() from a kernel, it seems that if a kernel writes to an address that has been mapped to some address in CPU memory, the write may be redirected to CPU memory.

Here is another question: does CUDA not support writing to CPU memory from a kernel at all? This ability should be very important in combining the CPU and GPU, especially when using the GPU as a parallel co-processor.

As Cyril pointed out above, you have the cudaMemcpy functions instead. Indeed, I prefer the copy approach to the mapping approach: with a copy you have fewer concurrent memory accesses to worry about, i.e., you are sure not to have race conditions with CPU threads accessing memory that is currently mapped. Otherwise the synchronization between CUDA threads would need an extension to also lock out CPU access, which would be a very slow implementation.


Thanks for the answer.

But cudaMemcpy is inappropriate in my situation. I use the GPU as a co-processor to the CPU.

Initially, the CPU assigns a computational task to the GPU; later, the CPU wants to check whether the GPU has obtained any interesting results. I hope there is a very cheap way to detect the status of the data on the GPU. In OpenGL there is the occlusion query, which is a pipelined operation, but in CUDA I can only use cudaMemcpy, which, I presume, will flush the GPU pipeline and slow down the CPU.

Is there any other way to query the data on the GPU?


I think you misunderstand how CUDA works. There is no such thing as a display context, so there is no asynchronous command submission (currently). In other words, your CUDA kernel call will block until it completes.

So if you want work to be done in parallel on the CPU and GPU, you need a separate (CPU) thread for CUDA anyway. That thread can download a small status array at the right moment and place it in CPU shared memory for the other CPU threads to read. This should be perfectly parallel.
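A minimal sketch of that pattern, assuming pthreads on the host; the do_work kernel and the d_status/shared_status names are hypothetical, and error checking is omitted:

```cuda
#include <pthread.h>
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical kernel: does its computation and records a status flag
   in device memory before finishing. */
__global__ void do_work(int *status)
{
    /* ... real computation here ... */
    if (threadIdx.x == 0 && blockIdx.x == 0)
        *status = 1;
}

volatile int shared_status = 0;   /* other CPU threads poll this */

/* Dedicated CUDA thread: the blocking kernel call only stalls this
   thread, not the rest of the application. */
void *cuda_worker(void *arg)
{
    int *d_status;
    cudaMalloc((void **)&d_status, sizeof(int));
    cudaMemset(d_status, 0, sizeof(int));

    do_work<<<1, 32>>>(d_status);   /* blocks until the kernel completes */

    /* Download just the small status word and publish it. */
    int s;
    cudaMemcpy(&s, d_status, sizeof(int), cudaMemcpyDeviceToHost);
    shared_status = s;

    cudaFree(d_status);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, cuda_worker, NULL);
    /* ... other CPU threads do their own work and occasionally
       read shared_status ... */
    pthread_join(t, NULL);
    printf("status = %d\n", shared_status);
    return 0;
}
```

Since only a few bytes cross the bus when the status is checked, the cost of polling this way is small compared to copying the full result set.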

Be warned, however, if you run CUDA and rendering on the same card; see other discussions in this forum for why.