The impact of cudaMalloc() and cudaFree() on the overlapping of kernel executions and data transfer

xia.425 · July 22, 2020, 5:40am

Hi, I have a question about the impact of cudaMalloc and cudaFree on asynchronous execution.

According to the CUDA C Programming Guide, both cudaMalloc and cudaFree are synchronous. However, from my experiments, if I simply use cudaMalloc, it does affect the overlapping of kernel executions and data transfers. If I insert a corresponding cudaFree, it affects the asynchronous execution. Is that correct?

Also, my situation is that I have a huge amount of data to transfer from device to host, and I want to overlap that transfer with a complex function (which contains cudaMalloc, cudaFree, kernel executions and data transfers). To achieve this, do I need to remove all cudaMalloc calls inside the function? This means I need to pre-allocate the device memory. Also, because of the resource limits, I also need to remove data transfers from the device to the host inside the function. Is there any better solution or work-around?
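To make the pre-allocation idea concrete, here is a minimal sketch of what I have in mind. It is not my real code: the kernel `scale`, the function `run_overlap_demo`, the `CHECK` macro, and the buffer names and sizes are all made up for illustration. It assumes the host buffer is pinned (cudaMallocHost) and that the device has a copy engine, since as far as I understand cudaMemcpyAsync can only overlap with a kernel under those conditions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failed CUDA runtime call and bail out of the enclosing function.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the "complex function": scales a buffer in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int run_overlap_demo() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *d_work = nullptr, *d_result = nullptr, *h_result = nullptr;

    // Allocate everything up front, before the region that should overlap.
    CHECK(cudaMalloc(&d_work, bytes));
    CHECK(cudaMalloc(&d_result, bytes));
    CHECK(cudaMallocHost(&h_result, bytes));  // pinned host memory
    CHECK(cudaMemset(d_work, 0, bytes));
    CHECK(cudaMemset(d_result, 0, bytes));

    cudaStream_t compute, copy;
    CHECK(cudaStreamCreate(&compute));
    CHECK(cudaStreamCreate(&copy));

    // Overlap region: no cudaMalloc or cudaFree in here.
    CHECK(cudaMemcpyAsync(h_result, d_result, bytes,
                          cudaMemcpyDeviceToHost, copy));
    scale<<<(n + 255) / 256, 256, 0, compute>>>(d_work, n, 2.0f);
    CHECK(cudaGetLastError());

    CHECK(cudaStreamSynchronize(copy));
    CHECK(cudaStreamSynchronize(compute));

    // The copied buffer was zero-filled, so the host copy should be too.
    int ok = (h_result[0] == 0.0f && h_result[n - 1] == 0.0f);

    // Release resources only after all asynchronous work has finished.
    CHECK(cudaStreamDestroy(compute));
    CHECK(cudaStreamDestroy(copy));
    CHECK(cudaFree(d_work));
    CHECK(cudaFree(d_result));
    CHECK(cudaFreeHost(h_result));
    return ok ? 0 : 1;
}

int main() { return run_overlap_demo(); }
```

Whether the copy and the kernel actually run concurrently would still have to be checked in a profiler timeline; the sketch only shows the ordering of calls, with all allocation and deallocation kept outside the part that is supposed to overlap.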

Thanks a lot!
