The influence of cudaFree() on the parallelism of CUDA streams

hulin · April 14, 2018, 12:01pm

The following pseudocode processes a large amount of work using several streams (I have omitted some function parameters):

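cudaStream_t stream[MAXSTREAM];
for(int i = 0; i < n; i ++)
{
    // pinned host buffer, allocated on every iteration
    int *h_toDevice;
    cudaMallocHost((void **)&h_toDevice);
    memset(h_toDevice);

    // device buffer, allocated on every iteration
    int *d_toDevice;
    cudaMalloc(&d_toDevice);

    // asynchronous copy and kernel launch on stream i % MAXSTREAM
    cudaMemcpyAsync(d_toDevice,h_toDevice,stream[i%MAXSTREAM]);

    kernelFunc<<<stream[i%MAXSTREAM]>>>();

    // both buffers are freed on every iteration
    cudaFreeHost(h_toDevice);
    cudaFree(d_toDevice);
}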

The above code doesn't perform as well as expected. But I find that if I delete the calls to cudaFreeHost() and cudaFree(), the total time decreases from 100 seconds to less than 10 seconds (MAXSTREAM is 10).
I suspected that the two functions have some influence on stream parallelism, so I set MAXSTREAM to 1, so that there is no parallelism in the process. The times for both cases then become very close: both take 100 seconds, with or without cudaFree() and cudaFreeHost().
So do the two functions indeed have some influence on the parallelism of CUDA streams? Could anyone help me? Thanks!

Robert_Crovella · April 14, 2018, 1:00pm

Yes, they do. They are synchronizing calls. If you want to see the effect, run your code with the visual profiler.
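If you'd rather see it without a profiler, here is a minimal sketch that times cudaFree() on the host while an unrelated kernel is still running in a stream. The names spinKernel and timeFreeDuringKernel are made up for illustration and are not from the code above. If cudaFree() synchronizes, the measured time should be roughly the kernel's run time rather than a few microseconds; the exact numbers depend on your GPU and driver.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel: spins for roughly `cycles` GPU clock ticks.
__global__ void spinKernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Returns the host-side milliseconds spent inside cudaFree() while
// spinKernel is in flight, or -1.0 if a CUDA call failed.
double timeFreeDuringKernel(long long cycles)
{
    cudaStream_t stream;
    int *d_unrelated = nullptr;
    if (cudaStreamCreate(&stream) != cudaSuccess) return -1.0;
    if (cudaMalloc((void **)&d_unrelated, sizeof(int)) != cudaSuccess) {
        cudaStreamDestroy(stream);
        return -1.0;
    }

    // The launch is asynchronous: control returns to the host immediately.
    spinKernel<<<1, 1, 0, stream>>>(cycles);

    // Free a buffer the kernel never touches and time the call.
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(d_unrelated);
    auto t1 = std::chrono::steady_clock::now();

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling timeFreeDuringKernel() with a larger cycle count and printing the result should show the cudaFree() time growing with the kernel's duration, which is the behavior the profiler timeline would also show.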

It’s recommended that you avoid functions like cudaMalloc, cudaMallocHost, cudaFree, and cudaFreeHost in loops that process data in a time-critical way.
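One way to follow that advice is to allocate one pinned host buffer and one device buffer per stream before the loop, reuse them, and free everything after the loop. The following is only a sketch under assumptions: the kernel body, the count and bytes parameters, the CHECK macro, and the processBatches name are placeholders, since the original post omitted those details. Because each buffer is reused, the loop waits on that one stream before overwriting its host buffer; the other streams keep running.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

#define MAXSTREAM 10

// Hypothetical helper: print the error and abort if a CUDA call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Placeholder for the poster's kernelFunc: increments every element.
__global__ void kernelFunc(int *data, int count)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count) data[idx] += 1;
}

// Processes n batches of `count` ints each; returns true when all work is done.
bool processBatches(int n, int count)
{
    size_t bytes = (size_t)count * sizeof(int);
    cudaStream_t stream[MAXSTREAM];
    int *h_toDevice[MAXSTREAM];
    int *d_toDevice[MAXSTREAM];

    // Allocate once, outside the time-critical loop.
    for (int s = 0; s < MAXSTREAM; s++) {
        CHECK(cudaStreamCreate(&stream[s]));
        CHECK(cudaMallocHost((void **)&h_toDevice[s], bytes));
        CHECK(cudaMalloc((void **)&d_toDevice[s], bytes));
    }

    for (int i = 0; i < n; i++) {
        int s = i % MAXSTREAM;
        // Wait only for this stream so its pinned buffer is safe to overwrite.
        CHECK(cudaStreamSynchronize(stream[s]));
        memset(h_toDevice[s], 0, bytes);
        CHECK(cudaMemcpyAsync(d_toDevice[s], h_toDevice[s], bytes,
                              cudaMemcpyHostToDevice, stream[s]));
        kernelFunc<<<(count + 255) / 256, 256, 0, stream[s]>>>(d_toDevice[s], count);
        CHECK(cudaGetLastError());
    }

    // Drain all streams, then free once, after the loop.
    CHECK(cudaDeviceSynchronize());
    for (int s = 0; s < MAXSTREAM; s++) {
        CHECK(cudaFreeHost(h_toDevice[s]));
        CHECK(cudaFree(d_toDevice[s]));
        CHECK(cudaStreamDestroy(stream[s]));
    }
    return true;
}
```

With this layout the only synchronizing allocation and free calls happen before the first batch and after the last one, so the copies and kernels issued inside the loop are free to overlap across streams.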

hulin · April 15, 2018, 4:14am

Thank you!
