Hi,
I have a speed problem with cudaHostAlloc…
Basically, my CUDA routine (let’s call it John) is:
1 cudaSetDeviceFlags(cudaDeviceMapHost);
2 cudaHostAlloc((void**)&A, size, cudaHostAllocMapped);
3 cudaHostAlloc((void**)&B, size, cudaHostAllocMapped);
4 …calculations…kernels…
5 cudaFreeHost(A);
6 cudaFreeHost(B);
Execution time of line 2: 2 seconds
Execution time of line 3: 0.0001 seconds
Execution time of lines 1 to 6 (the whole routine): 10 seconds
Why is the first allocation so slow?
I tried calling John twice from main: the second call is fast, with an execution time of 0.0001 seconds for both lines 2 and 3.
What’s happening during the first call to cudaHostAlloc?
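In case it helps to reproduce this, here is a minimal sketch of how the timing could be measured. It is not my actual code: the 64 MiB size, the timedHostAlloc helper and the "warmup" command-line switch are made up for illustration, and the kernels are left out. The optional cudaFree(0) is a commonly used way to force the runtime to initialize before the first allocation, on the assumption (not confirmed here) that the extra 2 seconds are one-time initialization rather than the allocation itself.

```cuda
#include <chrono>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Wall-clock time, in seconds, of one pinned + mapped host allocation.
// (Hypothetical helper, not part of the original routine.)
static double timedHostAlloc(float** ptr, size_t bytes) {
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaHostAlloc((void**)ptr, bytes, cudaHostAllocMapped);
    auto t1 = std::chrono::steady_clock::now();
    if (err != cudaSuccess)
        std::printf("cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
    return std::chrono::duration<double>(t1 - t0).count();
}

// Stand-in for "John": allocate A and B, skip the kernels, free both.
static void john(size_t bytes) {
    float* A = nullptr;
    float* B = nullptr;
    double tA = timedHostAlloc(&A, bytes);
    double tB = timedHostAlloc(&B, bytes);
    std::printf("alloc A: %.6f s, alloc B: %.6f s\n", tA, tB);
    cudaFreeHost(A);
    cudaFreeHost(B);
}

int main(int argc, char** argv) {
    const size_t bytes = 64u << 20;  // assumed size: 64 MiB

    // Same flag as in the original routine; set before any other CUDA work.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Optional: run as "./a.out warmup" to force runtime initialization
    // up front and time it separately from the allocations.
    if (argc > 1 && std::strcmp(argv[1], "warmup") == 0) {
        auto t0 = std::chrono::steady_clock::now();
        cudaFree(0);
        auto t1 = std::chrono::steady_clock::now();
        std::printf("warm-up: %.6f s\n",
                    std::chrono::duration<double>(t1 - t0).count());
    }

    john(bytes);  // first call
    john(bytes);  // second call, for comparison
    return 0;
}
```

Comparing the output with and without the "warmup" argument should show whether the 2 seconds follow the first CUDA call or stay with cudaHostAlloc.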
Thanks,
Nicolas