cudaMallocHost array access latency

Hi Nvidia

First, I don’t understand why accessing an array allocated with cudaMallocHost is so slow.
For example, if I define the array as “float array[100] = {0,};” access is pretty fast, but
if I allocate it with cudaMallocHost((float**)&array, 100*sizeof(float)); it takes a lot of time to access the array.
Is this expected, or am I missing something?

Second, is there any way to do an asynchronous memory copy without a cudaMallocHost array?
If I define the array without cudaMallocHost, then “cudaMemcpyAsync” does not work. Is this expected, or am I missing something?

Thanks.

Yes, cudaMallocHost affects CPU caching behavior and may affect the performance of host accesses to that data.
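
If you want to check how large the difference is on your own system, a rough timing sketch along these lines may help. The helper name `time_reads_ms` and the repeat count are made up for illustration, and the numbers will depend on the platform, so treat this as a starting point rather than a proper benchmark.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reads every element `repeats` times and returns the
// elapsed wall-clock time in milliseconds. The volatile sink keeps the
// compiler from optimizing the reads away.
static double time_reads_ms(const float* data, int n, int repeats) {
    volatile float sink = 0.0f;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        for (int i = 0; i < n; ++i) {
            sink = sink + data[i];
        }
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    const int n = 100;
    const int repeats = 100000;  // arbitrary value chosen for this sketch

    // Ordinary host array, as in the question.
    float plain[n] = {0};

    // Pinned host array allocated through the CUDA runtime.
    float* pinned = nullptr;
    if (cudaMallocHost((void**)&pinned, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMallocHost failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        pinned[i] = 0.0f;
    }

    std::printf("plain array : %.3f ms\n", time_reads_ms(plain, n, repeats));
    std::printf("pinned array: %.3f ms\n", time_reads_ms(pinned, n, repeats));

    cudaFreeHost(pinned);
    return 0;
}
```

Build it with nvcc and run it on the target machine; both arrays are read by exactly the same loop, so any gap in the two timings comes from how the memory itself behaves.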

cudaMemcpyAsync depends on pinned memory for asynchronous copying of data.

Yes, it's expected that if you define the array without cudaMallocHost (or something else that pins it), you will not witness asynchronous behavior.
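
As one example of "something else that pins it", an existing array can be pinned in place with cudaHostRegister and then passed to cudaMemcpyAsync. The following is a minimal sketch, not a definitive implementation: the function name `run_registered_async_copy`, the `CHECK` macro, and the buffer layout are made up for illustration, and cudaHostRegister may not be supported on every platform, so check its return value.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative error-check macro (not part of CUDA): prints the failing call
// and returns 1 from the enclosing function. Cleanup on the error path is
// omitted to keep the sketch short.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            std::fprintf(stderr, "%s failed: %s\n", #call,       \
                         cudaGetErrorString(err_));              \
            return 1;                                            \
        }                                                        \
    } while (0)

// Hypothetical demo: pins an ordinary array, copies it to the device and
// back in a stream, and returns 0 if the round trip preserved the data.
int run_registered_async_copy() {
    const int n = 100;
    const size_t bytes = n * sizeof(float);

    // One ordinary host buffer: first half is the source, second half the
    // destination. It is registered once as a single range.
    static float buf[2 * n];
    float* src = buf;
    float* dst = buf + n;
    for (int i = 0; i < n; ++i) {
        src[i] = static_cast<float>(i);
        dst[i] = -1.0f;
    }

    // Pin the existing allocation instead of calling cudaMallocHost.
    CHECK(cudaHostRegister(buf, 2 * bytes, cudaHostRegisterDefault));

    float* dev = nullptr;
    cudaStream_t stream;
    CHECK(cudaMalloc((void**)&dev, bytes));
    CHECK(cudaStreamCreate(&stream));

    // Both copies are queued in the stream; the host thread can continue
    // with other work here until it synchronizes.
    CHECK(cudaMemcpyAsync(dev, src, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaMemcpyAsync(dst, dev, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (dst[i] != src[i]) {
            ++mismatches;
        }
    }

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dev));
    CHECK(cudaHostUnregister(buf));
    return mismatches == 0 ? 0 : 1;
}

int main() {
    return run_registered_async_copy();
}
```

Registering has its own cost, so this approach tends to make sense when the same buffer is reused for many transfers.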