How to dynamically allocate shared memory in __global__ or __device__ functions?

Hello,

Could someone please tell me how to allocate shared memory in __global__ or __device__ functions?

I noticed that we can allocate a block of shared memory of a certain size when we launch the kernel, but I did not find a way to use this block of memory in a more controlled manner. For example, in a __global__ function, I want to allocate an array in shared memory whose size is only known at run time.

Furthermore, in this __global__ function I want to call a __device__ function that needs to allocate its own shared memory (again, the size can only be determined at run time). Is this possible with the current CUDA structure?

Thank you,

I’ve run into this problem. It is impossible, IMO, for a number of reasons.
The amount of shared memory used has to be known when the kernel is launched.

Thank you for the reply.

Suppose I do know the sizes of the arrays before I launch the kernel, and can specify the total size when I launch the kernel, e.g.:
testKernel<<<grid, threads, mem_size>>>(...)

Then, inside testKernel, how do I tell the program that one part of the memory (mem_size) belongs to the first array and the rest belongs to the second array?

Dynamic allocation of shared mem inside a kernel will never be possible, as shared mem is a fixed-size on-chip resource of the multiprocessor. AFAIK there is no processor architecture out there that can dynamically grow its on-chip memory while running. So the amount has to be fixed when the multiprocessor starts working on the program, i.e. when the block is launched, i.e. when a grid is executed.

Yes, you can allocate a chunk big enough and just carve it up as required. Note that the shared mem amount you specify is per block (and each block gets its own separate mem area). That is why you cannot share the mem across blocks (see the programming guide). Within a block the threads share the mem, and you can easily dispatch on threadIdx, as in the sketch below.
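For example, here is a minimal sketch of that carving-up (testKernel, n1 and n2 are hypothetical names, not from the guide):

// One extern __shared__ declaration gives access to the whole chunk
// whose size is passed as the third launch parameter.
extern __shared__ float smem[];

__global__ void testKernel(int n1, int n2)
{
    // Partition the chunk manually: the first n1 floats are array A,
    // the next n2 floats are array B.
    float* arrayA = smem;
    float* arrayB = smem + n1;

    if (threadIdx.x < n1) arrayA[threadIdx.x] = 0.0f;
    if (threadIdx.x < n2) arrayB[threadIdx.x] = 1.0f;
}

// Host side: request enough shared memory for both arrays.
// testKernel<<<grid, threads, (n1 + n2) * sizeof(float)>>>(n1, n2);

If the arrays have different element types, carve them out of a single buffer in order of decreasing alignment (or pad manually), since all extern __shared__ declarations alias the same starting address.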

If you need to share mem across blocks, you have to use the global mem space (on the card). Note that you then need to synchronize concurrent accesses, as in the sketch below.
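A minimal sketch of that global-mem route (countMatches and counter are hypothetical names), with an atomic guarding the concurrent updates:

// Blocks cooperate through a counter in global memory.
__global__ void countMatches(const int* data, int n, int* counter)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // *counter is visible to every block, but concurrent
    // read-modify-write must go through an atomic.
    if (i < n && data[i] == 42)
        atomicAdd(counter, 1);
}

(Integer atomicAdd on global memory needs compute capability 1.1 or later.)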

Peter

Thank you for the information.

I guess I found the answer to my question: it is in Section 4.2.2.3 of the programming guide. I will give it a try. Thanks, all.

But then how do you write code that automatically adapts to different graphics cards? For example, what if I want to use 16 KB of shared memory on Tesla but 48 KB on Fermi?

Thanks for the tip.

  • Jeff

Section B.2.3 of the CUDA C Programming Guide.
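That section covers the extern __shared__ declaration; to adapt the amount to the card, you can size the launch from the device properties on the host. A sketch (testKernel is a hypothetical name, and error checking is omitted):

#include <cstdio>
#include <cuda_runtime.h>

extern __shared__ float buffer[];

__global__ void testKernel(int n)
{
    // buffer holds n floats, however much the card provided.
    if (threadIdx.x < n)
        buffer[threadIdx.x] = 0.0f;
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // sharedMemPerBlock reports 16 KB on Tesla (compute 1.x)
    // and up to 48 KB on Fermi (compute 2.x).
    size_t memSize = prop.sharedMemPerBlock;
    int n = (int)(memSize / sizeof(float));

    testKernel<<<1, 256, memSize>>>(n);
    cudaDeviceSynchronize();
    printf("used %zu bytes of shared memory per block\n", memSize);
    return 0;
}

Note that on compute 1.x devices a small part of shared memory is reserved for kernel arguments, so in practice you may need to request slightly less than the full amount; on Fermi you can also trade L1 cache for shared memory with cudaFuncSetCacheConfig if the default split is not what you want.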
