So I was wondering if I could access memory allocated with cudaMallocHost directly from the CPU, so I could get the kernel's result straight away instead of copying it back with memcpy, which is too slow. Also, if I can access cudaMallocHost memory, how do I keep a pointer to it so that my CPU-side values are updated automatically while the kernel runs in parallel?
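Yes, you can access it in the same fashion as memory allocated with `new` or `malloc`. The pointer set by `cudaMallocHost` can be used from either host or device code, provided the system supports unified virtual addressing (UVA), which covers 64-bit applications on devices of compute capability 2.0 or higher. Kernel launches are asynchronous, however, so the host must synchronize (for example with `cudaDeviceSynchronize()`) before it can rely on the kernel's results being in the buffer.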
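A minimal sketch of the direct approach, assuming a UVA-capable system and a CUDA-capable GPU. The names `squareKernel` and `sumOfSquaresZeroCopy` are hypothetical, chosen for illustration. The kernel writes straight into the buffer returned by `cudaMallocHost`, and the CPU reads that same buffer after synchronizing, with no `cudaMemcpy` in between.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes the square of its index into the output buffer.
__global__ void squareKernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * i;
}

// Hypothetical helper: allocates pinned host memory with cudaMallocHost,
// passes that same pointer to the kernel (this relies on UVA), then sums
// the results on the CPU without any cudaMemcpy. Returns -1 on a CUDA error.
long long sumOfSquaresZeroCopy(int n)
{
    if (n <= 0) return 0;

    int *buf = nullptr;
    if (cudaMallocHost((void **)&buf, n * sizeof(int)) != cudaSuccess) return -1;

    const int threads = 256;
    squareKernel<<<(n + threads - 1) / threads, threads>>>(buf, n);

    long long sum = -1;
    // The launch is asynchronous: check for launch errors, then wait for the
    // kernel to finish before the CPU reads buf.
    if (cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess) {
        sum = 0;
        for (int i = 0; i < n; ++i) sum += buf[i];
    }

    cudaFreeHost(buf);
    return sum;
}

int main()
{
    printf("%lld\n", sumOfSquaresZeroCopy(1000));
    return 0;
}
```

One design point worth keeping in mind: every kernel access to this buffer travels over the PCIe bus, so zero-copy tends to pay off when each element is read or written once. For data the kernel touches repeatedly, an explicit copy into device memory is usually faster.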
The CUDA sample code simpleZeroCopy gives a demonstration.
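On systems without UVA, the pointer returned by `cudaMallocHost` generally cannot be handed to a kernel as-is. The explicit zero-copy setup, which is the kind of pattern simpleZeroCopy walks through, allocates with the `cudaHostAllocMapped` flag and asks the runtime for the device-side alias with `cudaHostGetDevicePointer`. The sketch below is an illustration under that assumption, not a copy of the sample; `doubleKernel` and `sumOfDoublesMapped` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reads each element from one mapped host buffer and writes its double
// into a second mapped host buffer.
__global__ void doubleKernel(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * in[i];
}

// Hypothetical helper: explicit zero-copy with mapped pinned memory.
// Returns the sum of the kernel's outputs, or -1 on a CUDA error.
long long sumOfDoublesMapped(int n)
{
    if (n <= 0) return 0;

    // Request host-memory mapping; this has to happen before the device
    // context is created. Older runtimes report cudaErrorSetOnActiveProcess
    // if the context already exists, which is tolerated here and cleared.
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceMapHost);
    if (err != cudaSuccess && err != cudaErrorSetOnActiveProcess) return -1;
    cudaGetLastError();

    int canMap = 0;
    if (cudaDeviceGetAttribute(&canMap, cudaDevAttrCanMapHostMemory, 0) != cudaSuccess || !canMap)
        return -1;

    int *hIn = nullptr, *hOut = nullptr;
    if (cudaHostAlloc((void **)&hIn, n * sizeof(int), cudaHostAllocMapped) != cudaSuccess) return -1;
    if (cudaHostAlloc((void **)&hOut, n * sizeof(int), cudaHostAllocMapped) != cudaSuccess) {
        cudaFreeHost(hIn);
        return -1;
    }

    for (int i = 0; i < n; ++i) hIn[i] = i;  // CPU fills the input in place

    // Device-side aliases of the two host buffers (flags must be 0).
    int *dIn = nullptr, *dOut = nullptr;
    long long sum = -1;
    if (cudaHostGetDevicePointer((void **)&dIn, hIn, 0) == cudaSuccess &&
        cudaHostGetDevicePointer((void **)&dOut, hOut, 0) == cudaSuccess) {
        const int threads = 256;
        doubleKernel<<<(n + threads - 1) / threads, threads>>>(dIn, dOut, n);
        // Wait for the kernel before reading hOut on the CPU.
        if (cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess) {
            sum = 0;
            for (int i = 0; i < n; ++i) sum += hOut[i];
        }
    }

    cudaFreeHost(hIn);
    cudaFreeHost(hOut);
    return sum;
}

int main()
{
    printf("%lld\n", sumOfDoublesMapped(1000));
    return 0;
}
```

On a UVA system the device pointers returned here are the same addresses as the host pointers, so this version also works there; the extra calls simply make the mapping explicit.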