cudaHostAlloc memory initial time

dan.liu · August 19, 2018, 10:37am

Hello everyone! I know that streams and asynchronous commands can give better performance. I am running my code on a TX2, and a sample of the code is below. The problem is that most of the time is spent in step3. When I use cudaMemcpy directly, the data copy takes 20~40 ms, but now that I use cudaMemcpyAsync, step3 takes almost 40~60 ms. What can I do to solve this problem?

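// Pinned (page-locked) host buffer and device buffer, N floats each
cudaHostAlloc((void **)&h_a, sizeof(float) * N, cudaHostAllocDefault);
cudaMalloc((void **)&d_a, sizeof(float) * N);

/** step3: fill the pinned buffer from src_a **/
for (int i = 0; i < N; i++)
{
    h_a[i] = src_a[i];
}
/** end of step3 **/

// Issued once per stream; i here is the stream index (enclosing loop not shown)
cudaMemcpyAsync(d_a, h_a, sizeof(float) * N / nstreams, cudaMemcpyHostToDevice, stream[i]);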
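
A minimal sketch, not part of the original post: the cudaMemcpyAsync call above passes the same d_a and h_a base pointers and copies N/nstreams floats, so if it is issued once per stream, each stream would presumably need its own offset into the buffers. The helper below is hypothetical (the names Chunk and chunk_for are assumptions made for illustration). It only computes which slice of the N elements a given stream would copy, including the case where N is not divisible by nstreams. It does not call the CUDA runtime and says nothing about why step3 itself is slow.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): describes the slice of
// an n-element buffer that one stream would copy.
struct Chunk {
    std::size_t offset;  // index of the first element in the slice
    std::size_t count;   // number of elements in the slice
};

// Split n elements into nstreams contiguous slices. The first (n % nstreams)
// slices get one extra element, so every element is covered exactly once.
inline Chunk chunk_for(std::size_t n, std::size_t nstreams, std::size_t s) {
    assert(nstreams > 0 && s < nstreams);
    const std::size_t base = n / nstreams;   // elements every slice gets
    const std::size_t extra = n % nstreams;  // leftover elements to spread
    const std::size_t offset = s * base + (s < extra ? s : extra);
    const std::size_t count = base + (s < extra ? 1 : 0);
    return Chunk{offset, count};
}
```

With this, stream s could issue cudaMemcpyAsync(d_a + c.offset, h_a + c.offset, sizeof(float) * c.count, cudaMemcpyHostToDevice, stream[s]) where c = chunk_for(N, nstreams, s), assuming the intent is for the streams to cover the whole buffer between them.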
