How do I access memory allocated with cudaMallocHost from the CPU?

I was wondering whether I can access memory allocated with cudaMallocHost directly from the CPU, so that I can read the kernel's result straight away instead of copying it back with a memcpy, which is too slow. Also, if I can access that memory, how do I get a pointer to it so that all of my CPU-side values are updated automatically when the kernel writes to it?

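Yes, you can access it from host code in the same fashion as memory allocated with new or malloc. On systems with unified virtual addressing (64-bit platforms), the pointer set by cudaMallocHost can be used from either host or device code, so a kernel can write its results directly into that buffer. Because kernel launches are asynchronous, synchronize (for example with cudaDeviceSynchronize) before reading those results on the CPU.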

The CUDA sample code simpleZeroCopy gives a demonstration.
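
For reference, here is a minimal sketch of the idea. It is not taken from simpleZeroCopy. The kernel name fillSquares and the buffer size are made up for illustration, and it assumes a 64-bit system with unified virtual addressing, where pinned memory from cudaMallocHost is mapped into the device address space.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: writes i*i into the buffer it is given.
__global__ void fillSquares(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * i;
}

int main() {
    const int n = 8;  // arbitrary small size for the example
    int *h_data = nullptr;

    // Pinned (page-locked) host allocation.
    if (cudaMallocHost((void **)&h_data, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMallocHost failed\n");
        return 1;
    }

    // Ask the runtime for the device-side alias of the host buffer.
    // Assumption: with unified virtual addressing this succeeds and
    // returns the same address as h_data.
    int *d_alias = nullptr;
    if (cudaHostGetDevicePointer((void **)&d_alias, h_data, 0) != cudaSuccess) {
        fprintf(stderr, "buffer is not mapped into the device address space\n");
        cudaFreeHost(h_data);
        return 1;
    }

    // The kernel writes straight into host memory; no cudaMemcpy is needed.
    fillSquares<<<1, n>>>(d_alias, n);

    // Kernel launches are asynchronous: wait before reading on the CPU.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFreeHost(h_data);
        return 1;
    }

    // Ordinary host access through the original pointer.
    for (int i = 0; i < n; ++i) printf("%d ", h_data[i]);
    printf("\n");

    cudaFreeHost(h_data);
    return 0;
}
```

Keep in mind that every device access to this buffer travels over the PCIe (or NVLink) bus, so this tends to pay off for results that are written once, and less so for data the kernel reads or writes repeatedly.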