So I was wondering whether I can access memory allocated with cudaMallocHost directly from the CPU, so that I can read the kernel's result straight away instead of copying it back with a memcpy, which is too slow. Also, if I can access the cudaMallocHost memory, how do I keep a pointer to it so that the values I read on the CPU are updated automatically?
Yes, you can access it in the same fashion as memory allocated with malloc. The pointer set by cudaMallocHost can be used from either host or device code (device-side use of that same pointer relies on unified virtual addressing being in effect).
The CUDA sample code simpleZeroCopy gives a demonstration.
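
To make this concrete, here is a minimal sketch rather than a definitive implementation. It assumes a system where unified virtual addressing is in effect, so the pointer returned by cudaMallocHost can be passed straight to a kernel. The kernel name scale, the helper run_demo, and the buffer size are made up for illustration and do not come from the sample. On setups without unified virtual addressing you would probably need cudaHostAlloc with the cudaHostAllocMapped flag and cudaHostGetDevicePointer instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns the number of mismatching elements, or -1 on a CUDA error.
int run_demo()
{
    const int n = 1024;
    float *h = nullptr;

    // Pinned host allocation; assumed to be device-accessible through UVA.
    if (cudaMallocHost((void **)&h, n * sizeof(float)) != cudaSuccess) return -1;

    for (int i = 0; i < n; ++i) h[i] = (float)i;   // ordinary CPU writes

    scale<<<(n + 255) / 256, 256>>>(h, n);         // kernel uses the same pointer
    if (cudaGetLastError() != cudaSuccess) { cudaFreeHost(h); return -1; }

    // Kernel launches are asynchronous: wait before reading on the CPU.
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFreeHost(h); return -1; }

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f * i) ++bad;               // read results directly, no cudaMemcpy

    cudaFreeHost(h);
    return bad;
}

int main()
{
    int bad = run_demo();
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

No second pointer is needed: the CPU reads the results through the same pointer once the kernel has finished, which is why the synchronization call comes before the check. Keep in mind that every device access to this memory travels over the bus, so for data the kernel touches many times an explicit copy into device memory may still turn out faster.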