CudaMalloc is too expensive and GPU Memories

KapilMehta · January 20, 2016, 1:23pm

Dear All,

I am trying to convert some parts of my C++ code into CUDA kernels. So far I have the following questions; any input on these is highly appreciated.

To execute any portion of the C++ code on the GPU using CUDA, I need to call cudaMalloc, which takes a lot of time, actually more than the complete execution time on the CPU. What is the appropriate way to allocate memory on the GPU?
What different types of memory are available on the GPU, and which are appropriate to use with CUDA?

njuffa · January 20, 2016, 3:14pm

What is the “complete execution time on CPU”? How did you determine that “CudaMalloc … takes a lot of time”?

Note that CUDA context creation / initialization occurs lazily, usually triggered by the first CUDA API call, such as the first cudaMalloc() call. CUDA context initialization varies based on system configuration, and generally takes longer if there is a lot of system memory and a lot of GPU memory.
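One way to check this on a given machine is to time two consecutive cudaMalloc() calls. The following is only a minimal sketch, not a rigorous benchmark: the helper name timedMallocMs and the 1 MiB buffer size are arbitrary choices made for illustration, and it assumes a compiler with C++11 support (for example nvcc -std=c++11). If lazy initialization is the cause, the first call should be much slower than the second.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock time of a single cudaMalloc() call in
// milliseconds. The buffer is freed again outside the timed region.
static double timedMallocMs(size_t bytes) {
    void *p = nullptr;
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaMalloc(&p, bytes);
    auto t1 = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    cudaFree(p);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const size_t bytes = 1 << 20;  // 1 MiB, arbitrary size for illustration

    // If no CUDA call has been made yet, this one also pays for context creation.
    double first = timedMallocMs(bytes);
    // The context already exists here, so this is closer to the allocation cost alone.
    double second = timedMallocMs(bytes);

    std::printf("first cudaMalloc:  %.3f ms\n", first);
    std::printf("second cudaMalloc: %.3f ms\n", second);
    return 0;
}
```

The absolute numbers depend on the system configuration, so only the difference between the two calls is meaningful.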

KapilMehta · January 21, 2016, 1:08pm

Absolutely correct…

This is what I have come to learn: the CUDA subsystem gets initialized by the first CUDA runtime API call.

To avoid this, the user guide mentions that we should use CUT_DEVICE_INIT, which does the required initialization, and call cudaMalloc only after that. With this in place I have verified that cudaMalloc takes only 10 microseconds…

But currently I am struggling to use CUT_DEVICE_INIT with CUDA 6.5. Forums say CUT_DEVICE_INIT was removed after CUDA 5.0, so what is the alternative way to initialize the CUDA subsystem with CUDA 6.5?

Any input is highly appreciated…

njuffa · January 21, 2016, 3:05pm

Which “user guide” recommended the use of CUT_DEVICE_INIT? Can you point to the relevant document and the relevant section in that document?

As far as I recall, all the CUT stuff was part of a utility library that was introduced to shorten the example programs shipping with CUDA. NVIDIA pointed out numerous times that this code was not to be considered part of the CUDA deliverables, could change or go away at any time, and should therefore not be used by CUDA programmers for production code.

Try calling cudaFree(0) to trigger initialization of the CUDA context.

KapilMehta · January 21, 2016, 3:39pm

Correct… This was a library called CUTIL, and it’s not part of the CUDA toolkit…

Thanks for your response. cudaFree(0) is initializing the CUDA subsystem successfully.

KapilMehta · January 22, 2016, 5:04am

cudaFree frees the memory space pointed to by devPtr, which must have been returned by a previous call to cudaMalloc(), and if the argument is zero, as you suggested, it performs no operation…

It solves the issue of initializing the CUDA subsystem, but cudaFree doesn’t seem to be an API meant for this purpose. Is there any other API in CUDA 6.5 that can be used at the start of the application to initialize the CUDA subsystem?

njuffa · January 22, 2016, 6:49am

In the CUDA runtime API, there is no dedicated context creation API call. Instead, context creation and initialization happen lazily, as needed. This is by design, as the CUDA runtime seeks to hide low-level details that are exposed when using the CUDA driver API.

Most CUDA runtime API calls will trigger the context creation if a context doesn’t exist yet. Calling cudaFree(0) is one API call that is convenient to manually trigger creation of the context, as it initiates no other activity besides the side effect of kicking off context creation and initialization.
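As a rough sketch of how this could look at application start-up: the helper name warmUpCuda and the buffer size below are hypothetical and only serve as illustration. The idea is to pay the context creation cost once, at a predictable point, and to check the return status so that a missing driver or device is reported there.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: trigger the lazy context creation up front.
// cudaFree(0) frees nothing; creating the context is its only side effect here.
static void warmUpCuda() {
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA initialization failed: %s\n",
                     cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    warmUpCuda();  // context creation cost is paid here

    // Later allocations no longer include the initialization overhead.
    float *d_buf = nullptr;
    cudaError_t err = cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    // ... launch kernels that use d_buf ...

    cudaFree(d_buf);
    return EXIT_SUCCESS;
}
```

This does not make initialization any cheaper; it only moves the cost out of the first cudaMalloc() call, which keeps it out of any timing of the actual work.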

If, for reasons I do not understand, you do not want to use cudaFree(0) to trigger the CUDA context creation, you are free to invoke some other suitable CUDA runtime API call for this purpose, or you can simply rely on the default lazy on-demand initialization.
