cudaMalloc execution time

When allocating GPU memory with cudaMalloc, the first call takes approximately 200 ms.
This happens only on the first call; from the second call onwards, allocating the same amount of memory in the same way takes only about 10 ms.

cudaMalloc((void**)&a, (width * height * sizeof(int)));

Why is there such a large difference? Is there any way to shorten the time required for the initial memory allocation?

If the first cudaMalloc() is the first CUDA API call overall in your application, it will also trigger initialization of a CUDA context, which can be a fairly costly operation (much of this time is spent mapping all GPU and system memory into a single unified virtual address map).

The classical trick to trigger CUDA context initialization at a point that is more convenient is to issue a cudaFree(0). No idea whether this still works, but worth a try.
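A minimal sketch of the trick, for illustration only (requires a CUDA-capable GPU; the buffer size and timing output are arbitrary examples):

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

int main() {
    using clk = std::chrono::steady_clock;
    using ms  = std::chrono::milliseconds;

    // Classical warm-up: a no-op cudaFree(0) forces CUDA context
    // initialization at a point of our choosing.
    auto t0 = clk::now();
    cudaFree(0);
    auto t1 = clk::now();

    // Subsequent allocations no longer pay the context-creation cost.
    int *a = nullptr;
    size_t bytes = 1024u * 1024u * sizeof(int);  // example size
    auto t2 = clk::now();
    cudaMalloc((void**)&a, bytes);
    auto t3 = clk::now();

    printf("context init: %lld ms, cudaMalloc: %lld ms\n",
           (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
           (long long)std::chrono::duration_cast<ms>(t3 - t2).count());

    cudaFree(a);
    return 0;
}
```

With this structure, the expensive first call can be issued during application startup (e.g. while other initialization work is happening) rather than on the critical path.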

CUDA initialization time can also be influenced by module loading time. To minimize the upfront time expenditure for this at initialization time and defer it to the point of use, you would want to set the environment variable CUDA_MODULE_LOADING=LAZY (this may already be the default depending on platform).
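Setting the variable is just an environment change before launching the application (the application name here is a placeholder):

```shell
# Defer CUDA module loading from initialization time to first use.
# On recent CUDA versions/platforms this may already be the default.
export CUDA_MODULE_LOADING=LAZY
./my_cuda_app
```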

Generally speaking, the time for CUDA context initialization and calls to cudaMalloc() correlates strongly with the single-thread performance of the host system's CPU (with system memory performance a weak secondary factor). High single-thread performance in CPUs in turn correlates strongly with CPU clock frequency. For this reason I recommend using CPUs with a base frequency >= 3.5 GHz. Nowadays, CPUs with up to 48 physical cores that satisfy this criterion are available.

It should still work. However, since CUDA 12 the documented way to trigger context initialization is calling cudaSetDevice.
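A sketch of the CUDA 12 approach, with the error check that a real application would want (device 0 is an assumption; pick whichever device you will actually use):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Since CUDA 12, cudaSetDevice() is documented to eagerly initialize
    // the context for the chosen device, so calling it early pays the
    // initialization cost at a convenient point.
    cudaError_t err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Later cudaMalloc() calls no longer include context creation time.
    int *a = nullptr;
    cudaMalloc((void**)&a, 1024 * sizeof(int));  // example allocation
    cudaFree(a);
    return 0;
}
```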
