Hi NVIDIA,
First, I don't understand why accessing an array allocated with cudaMallocHost is so slow.
For example, if I define the array as “float array[100] = {0,};”, access is pretty fast, but
if I allocate it with “cudaMallocHost((float**)&array, 100*sizeof(float));”, accessing the array takes a lot of time.
Is this expected, or am I missing something?
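To make the first question concrete, here is a minimal sketch of the comparison I have in mind. It is not my exact code: the helper time_access, the repetition count, and the use of std::chrono for timing are made up for illustration.

```cuda
#include <chrono>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: read every element `reps` times and return elapsed milliseconds.
// The volatile pointer keeps the compiler from optimizing the reads away.
static double time_access(volatile float *a, int n, int reps) {
    auto t0 = std::chrono::steady_clock::now();
    float sink = 0.0f;
    for (int r = 0; r < reps; ++r)
        for (int i = 0; i < n; ++i)
            sink += a[i];
    auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const int n = 100;
    const int reps = 100000;  // arbitrary value, chosen only for the sketch

    // Case 1: ordinary array on the stack.
    float stack_array[n] = {0};

    // Case 2: page-locked host array from cudaMallocHost.
    float *pinned = nullptr;
    cudaError_t err = cudaMallocHost((void **)&pinned, n * sizeof(float));
    if (err != cudaSuccess) {
        std::printf("cudaMallocHost failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::memset(pinned, 0, n * sizeof(float));

    std::printf("stack array : %.3f ms\n", time_access(stack_array, n, reps));
    std::printf("pinned array: %.3f ms\n", time_access(pinned, n, reps));

    cudaFreeHost(pinned);
    return 0;
}
```

The timing here covers only the element reads, not the cudaMallocHost call itself, so that allocation cost and access cost are not mixed together.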
Second, is there any way to do an asynchronous memory copy without a cudaMallocHost array?
If I define the array without cudaMallocHost, “cudaMemcpyAsync” does not work. Is this expected, or am I missing something?
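For the second question, this is roughly the pair of cases I am comparing, written as a minimal sketch rather than my real code. The function name async_copy and the buffer size are assumptions for illustration. The sketch only checks the return codes; it does not show whether the copy from the ordinary array actually overlaps with host work, which is the part I am unsure about.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: copy 100 floats host -> device with cudaMemcpyAsync
// and return the first error encountered.
// use_pinned == true  -> source is a cudaMallocHost (page-locked) buffer
// use_pinned == false -> source is an ordinary stack array
static cudaError_t async_copy(bool use_pinned) {
    const int n = 100;
    const size_t bytes = n * sizeof(float);
    float pageable[n] = {0};
    float *pinned = nullptr;
    float *dev = nullptr;
    cudaStream_t stream = nullptr;

    cudaError_t err = cudaStreamCreate(&stream);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev, bytes);
    if (err == cudaSuccess && use_pinned) {
        err = cudaMallocHost((void **)&pinned, bytes);
        if (err == cudaSuccess) std::memset(pinned, 0, bytes);
    }

    const float *src = use_pinned ? pinned : pageable;
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(dev, src, bytes, cudaMemcpyHostToDevice, stream);
    // Wait for the copy before the host buffers go out of scope.
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

    if (pinned) cudaFreeHost(pinned);
    if (dev) cudaFree(dev);
    if (stream) cudaStreamDestroy(stream);
    return err;
}

int main() {
    std::printf("pinned source  : %s\n", cudaGetErrorString(async_copy(true)));
    std::printf("ordinary source: %s\n", cudaGetErrorString(async_copy(false)));
    return 0;
}
```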
Thanks.