"deallocating" shared memory

I am trying to minimize the required shared memory. Is there a way to let the compiler know when allocated shared memory can be reclaimed by another shared memory request?

For example, within the same block I have allocated 1 KB of shared memory that is only used for the first half of the kernel. Once the shared memory is no longer used, if I allocate 0.5 KB of shared memory again, will it automatically reclaim the shared memory that was initially allocated? If not, is there a manual way to tell the compiler that I am no longer going to use that region of shared memory and that it is safe to reclaim?

I would suggest thinking about this in terms of re-using an existing allocation, rather than in terms of allocating and re-allocating memory. It’s just like a malloc’ed piece of memory in any random piece of C++ code: different parts of the code can use such a buffer for any number of different purposes.
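As a sketch of what that re-use might look like (the phase names, element types, and the 1 KB size here are made up for illustration, not taken from any particular code):

```cuda
// Re-using one shared buffer for two phases of a kernel.
// A __syncthreads() separates the phases so all reads of the
// first interpretation finish before the second one writes.
__global__ void two_phase_kernel()
{
    __shared__ unsigned char buf[1024];   // 1 KB, sized for the larger phase

    // Phase 1: treat the buffer as 256 floats.
    float *phase1 = reinterpret_cast<float *>(buf);
    phase1[threadIdx.x] = threadIdx.x * 2.0f;
    __syncthreads();                      // make phase-1 writes visible

    // ... work that consumes phase1 ...
    __syncthreads();                      // phase 1 fully done; bytes now free

    // Phase 2: re-use the same bytes as 128 ints (only 0.5 KB needed).
    int *phase2 = reinterpret_cast<int *>(buf);
    if (threadIdx.x < 128)
        phase2[threadIdx.x] = threadIdx.x;
    __syncthreads();
}
```

The block's shared-memory footprint is the size of the larger use, not the sum of both.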

I could reuse it, but it is simply more convenient and easier to read if I can allocate / deallocate (assuming the cost is negligible). I assume you are suggesting re-use because there is no such mechanism?

I am suggesting re-use because I tend to think about code from a performance perspective:

(1) A lot of my career dealt with making things run fast (that’s something I am passionate about)
(2) The reason we all love GPUs/CUDA is because they allow us to make things run fast (right?)

From a performance perspective, memory allocation / deallocation / reallocation is typically BAD news. Something to avoid, if at all possible.

But you are right, dynamic shared memory is parceled out at kernel launch time, with each thread block receiving a pointer to its own portion, and there is no mechanism for re-allocating while the kernel is running.

Oh, I see. Thanks for the clarification.

You don’t really get to allocate/deallocate/reallocate shared memory inside a kernel with CUDA. There are two allocation methods:

static:

__shared__ <type> name;

dynamic:

extern __shared__ <type> name[];

Static allocation is done at compile time. Like any other immediate array in C/C++ (what you would have if you dropped the __shared__ decorator), there is no way to deallocate it. (Leaving C99 variable-length arrays out of the discussion, but that does not change the fact that an immediate array declaration cannot be deallocated, reallocated, or have its size dynamically changed after declaration. Right? Or do you know something I don’t, language-wise?)

Dynamic allocation is done at run time, but the allocation size is determined at kernel launch and thereafter cannot be changed in any fashion whatsoever.
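A minimal sketch of the dynamic case, showing where the size is actually fixed (the kernel name and sizes are illustrative):

```cuda
#include <cstdio>

// Dynamic shared memory: the size comes from the third launch
// configuration parameter and cannot change for the kernel's lifetime.
__global__ void dyn_smem_kernel(int n)
{
    extern __shared__ float smem[];   // size set at launch, not here
    if (threadIdx.x < n)
        smem[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
}

int main()
{
    int n = 256;
    // 256 * sizeof(float) = 1 KB of dynamic shared memory per block,
    // fixed at this launch; a different launch may pick a different size.
    dyn_smem_kernel<<<1, 256, n * sizeof(float)>>>(n);
    cudaDeviceSynchronize();
    return 0;
}
```

Each block receives its own copy of that 1 KB; different launches of the same kernel may request different sizes, but a running kernel cannot resize its allocation.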

Okay, thanks for the clarification.

I am somewhat surprised the union keyword has not yet been mentioned in this thread.
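A sketch of how a union can express that overlap (member names and sizes are made up for illustration): both members occupy the same shared bytes, so the block's footprint is the size of the larger member, not the sum.

```cuda
__global__ void union_smem_kernel()
{
    // One shared allocation, two non-overlapping-in-time interpretations.
    __shared__ union {
        float phase1[256];   // first half of the kernel (1 KB)
        int   phase2[128];   // second half (0.5 KB), same storage
    } smem;

    smem.phase1[threadIdx.x] = threadIdx.x * 1.5f;
    __syncthreads();
    // ... consume phase1, then synchronize before re-using the bytes ...
    __syncthreads();
    if (threadIdx.x < 128)
        smem.phase2[threadIdx.x] = threadIdx.x;
}
```

This reads more like the allocate/deallocate style the original question asked for, while compiling down to the same single fixed-size buffer.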

I don’t recall ever using a union for the purposes of re-using a previously allocated buffer. Where necessary, I have cast the pointer to the buffer to the appropriate type and proceeded from there. My use of unions has been restricted to instances of type punning (not typically necessary in CUDA device code thanks to re-interpreting intrinsics).