I need to use cudaMemcpyAsync to copy a buffer from the device to the host. However, the host buffer I am given was allocated with malloc, not cudaHostAlloc, so it is pageable rather than pinned memory. Will my code still produce the correct result? In my tests so far it does, but I want to confirm that this is guaranteed across platforms and not just something that happens to work on my machine. I realize that because the buffer comes from malloc, the copy will be serialized rather than truly asynchronous, but at this point I am only interested in correctness.