Querying amount of shared memory allocated

cudesnick · January 8, 2009, 7:29pm

Hello,

Background:

One can dynamically allocate shared memory on the device from host code at kernel launch time, by means of, e.g., providing the third argument to the kernel launch statement, as in:

myKernel<<<grid, threads, sizeOfSharedMemMyKernelExpects>>>(param1, param2, param3, param4);

where sizeOfSharedMemMyKernelExpects is the number I’m writing about.

This shared memory is available as an array, similar to

extern char sharedMem;

in the device code.

Question:

Is there a way to QUERY for sizeOfSharedMemMyKernelExpects parameter in the device code? I would expect smth. similar to how one can query for the number of threads (blockDim built-in), or some kind of a cuda function.

Specifically, my kernel requires a certain amount of dynamic shared memory (workspace) to run, depending on parameters it has been invoked with (param1, param2, … in the above example). This amount depends non-trivially on the parameters. I would like to incorporate an assert kind of code into the kernel, that complains when the amount of shared memory allocated dynamically is insufficient.

Thank you!

alex_dubinsky · January 8, 2009, 10:38pm

Unfortunately there isn’t (see Section 4.2.4).

You can get the same behavior (that is, without simply using a parameter) by creating a global constant variable, and calling cudaMemcpyToSymbol() right before your kernel call. See the end of Section 4.5.2.3

Sarnath · January 9, 2009, 3:32am

This is a very very valid question… However, I dont think there is any API to do this.

NVIDIA guys could look into it.

tmurray · January 9, 2009, 4:53am

Why? The only way I can think for us to implement it would to just pass a parameter as well and then hide it from the programmer, so it probably makes more sense to do it yourself.

Sarnath · January 9, 2009, 5:16am

oh yeah…Thats the simplest way… Dint strike me at all…

alex_dubinsky · January 9, 2009, 6:14am

You could make that exact argument for blockDim, gridDim, and warpSize.

tmurray · January 9, 2009, 7:27am

I could, but blockDim/gridDim are fundamental to any algorithm. Dynamic shmem allocation isn’t and in a lot of cases wouldn’t be doing anything.

cudesnick · January 9, 2009, 9:09pm

Alex, thank you for a very detailed answer!

Frankly, I don’t quite understand tmurray’s argument. Probably, this is because I’m not very familiar with Cuda’s internals.

I would imagine, that the host code needs to communicate to the device the amount of the shared memory to be dynamically allocated at kernel launch. This amount is zero in most cases (according to tmurray), but still…

If the above statement is true, then I don’t see why Cuda can’t expose this value through a device-space function. I don’t think this function needs to be extremely efficient: if the function retrieves the relevant variable from constant memory (as Alex suggested earlier in this thread), it would probably be sufficiently fast.

Thanks again for your input!

tmurray · January 9, 2009, 9:20pm

Again, why? If it’s 0 in 95% of cases (in my experience, this is the case), there would be additional latency on kernel launches from copying that one parameter into constant memory. In other words, there would be a performance hit from this when it does nothing most of the time. Why is this a good thing?

Topic		Replies	Views
[SOLVED] Shared memory variable declaration CUDA Programming and Performance	3	15031	December 23, 2016
Confirm that dynamically allocated __shared__ memory is just as fast as the statically allocated variety? CUDA Programming and Performance	0	278	July 15, 2022
Question about using shared memory CUDA Programming and Performance	1	5017	September 10, 2009
Need for dynamic allocated shared memory? CUDA Programming and Performance	2	3527	March 4, 2011
allocatable size of shared memory CUDA Programming and Performance	4	8963	March 13, 2007
How can I configure this problem is it too big to fit in shared memory? CUDA Programming and Performance	7	3717	October 14, 2008
How can one determine the temp storage size for CUB block primitives from the host? CUDA Programming and Performance	1	894	March 2, 2020
shared memory and CUDA calculator CUDA Programming and Performance	6	4033	October 26, 2008
Newbie question: Shared memory CUDA Programming and Performance	7	2714	July 12, 2008
PTXAS info ...+16 bytes CUDA Programming and Performance	15	9266	November 13, 2010

Querying amount of shared memory allocated

Background:

Question:

Related topics