"deallocating" shared memory

I am trying to minimize the required shared memory. Is there a way to let the compiler know when allocated shared memory can be reclaimed by another shared memory request?

For example, within the same block I have allocated 1 KB of shared memory that is only used for the first half of the kernel. Once the shared memory is no longer used, if I allocate 0.5 KB of shared memory again, will it automatically reclaim the shared memory that was initially allocated? If not, is there a manual way to tell the compiler that I am no longer going to use that region of shared memory and that it is safe to reclaim?

I would suggest thinking about this in terms of re-using an existing allocation, rather than in terms of allocating and re-allocating memory. It’s just like a malloc’ed piece of memory in any random piece of C++ code: different parts of the code can use such a buffer for any number of different purposes.
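As a sketch of what that re-use might look like (the phase names, element types, and the 1 KB size here are made up for illustration, not taken from any particular code):

```cuda
// Re-using one shared buffer for two phases of a kernel.
// A __syncthreads() separates the phases so all reads of the
// first interpretation finish before the second one writes.
__global__ void two_phase_kernel()
{
    __shared__ unsigned char buf[1024];   // 1 KB, sized for the larger phase

    // Phase 1: treat the buffer as 256 floats.
    float *phase1 = reinterpret_cast<float *>(buf);
    phase1[threadIdx.x] = threadIdx.x * 2.0f;
    __syncthreads();                      // make phase-1 writes visible

    // ... work that consumes phase1 ...
    __syncthreads();                      // phase 1 fully done; bytes now free

    // Phase 2: re-use the same bytes as 128 ints (only 0.5 KB needed).
    int *phase2 = reinterpret_cast<int *>(buf);
    if (threadIdx.x < 128)
        phase2[threadIdx.x] = threadIdx.x;
    __syncthreads();
}
```

The block's shared-memory footprint is the size of the larger use, not the sum of both.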

I could reuse it, but it is simply more convenient and easier to read if I can allocate / deallocate (assuming the cost is negligible). I assume you are suggesting re-use because there is no such mechanism?

I am suggesting re-use because I tend to think about code from a performance perspective:

(1) A lot of my career dealt with making things run fast (that’s something I am passionate about)
(2) The reason we all love GPUs/CUDA is because they allow us to make things run fast (right?)

From a performance perspective, memory allocation / deallocation / reallocation is typically BAD news. Something to avoid, if at all possible.

But you are right, dynamic shared memory is parceled out at kernel launch time, with each thread block receiving a pointer to its own portion, and there is no mechanism for re-allocating while the kernel is running.

Oh, I see. Thanks for the clarification.

You don’t really get to allocate/deallocate/reallocate shared memory inside a kernel with CUDA. There are two allocation methods:

static:

__shared__ <type> name;

dynamic:

extern __shared__ <type> name[];

Static allocation is done at compile time. Like any other immediate array in C/C++ (what you would have if you dropped the __shared__ decorator), there is no way to deallocate it. (Leaving C99 variable-length arrays out of the discussion, but that does not change the fact that an immediate array declaration cannot be deallocated, reallocated, or have its size dynamically changed after declaration. Right? Or do you know something I don’t, language-wise?)

Dynamic allocation is done at run time, but the allocation size is determined at kernel launch and thereafter cannot be changed in any fashion whatsoever.
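A minimal sketch of the dynamic case, showing where the size is actually fixed (the kernel name and sizes are illustrative):

```cuda
#include <cstdio>

// Dynamic shared memory: the size comes from the third launch
// configuration parameter and cannot change for the kernel's lifetime.
__global__ void dyn_smem_kernel(int n)
{
    extern __shared__ float smem[];   // size set at launch, not here
    if (threadIdx.x < n)
        smem[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
}

int main()
{
    int n = 256;
    // 256 * sizeof(float) = 1 KB of dynamic shared memory per block,
    // fixed at this launch; a different launch may pick a different size.
    dyn_smem_kernel<<<1, 256, n * sizeof(float)>>>(n);
    cudaDeviceSynchronize();
    return 0;
}
```

Each block receives its own copy of that 1 KB; different launches of the same kernel may request different sizes, but a running kernel cannot resize its allocation.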

Okay, thanks for the clarification.

I am somewhat surprised the union keyword has not yet been mentioned in this thread.
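A sketch of how a union can express that overlap (member names and sizes are made up for illustration): both members occupy the same shared bytes, so the block's footprint is the size of the larger member, not the sum.

```cuda
__global__ void union_smem_kernel()
{
    // One shared allocation, two non-overlapping-in-time interpretations.
    __shared__ union {
        float phase1[256];   // first half of the kernel (1 KB)
        int   phase2[128];   // second half (0.5 KB), same storage
    } smem;

    smem.phase1[threadIdx.x] = threadIdx.x * 1.5f;
    __syncthreads();
    // ... consume phase1, then synchronize before re-using the bytes ...
    __syncthreads();
    if (threadIdx.x < 128)
        smem.phase2[threadIdx.x] = threadIdx.x;
}
```

This reads more like the allocate/deallocate style the original question asked for, while compiling down to the same single fixed-size buffer.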

I don’t recall ever using a union for the purposes of re-using a previously allocated buffer. Where necessary, I have cast the pointer to the buffer to the appropriate type and proceeded from there. My use of unions has been restricted to instances of type punning (not typically necessary in CUDA device code thanks to re-interpreting intrinsics).