What happens if memory is full?

Floow · March 28, 2012, 2:31pm

Hi,

It's all in the title. I mean, if the shared memory of a block is full, what happens? Does the program stop, or is global memory used instead?

And I guess that when global memory is full we get a memory error, such as ‘unspecified launch failure’.

tera · March 28, 2012, 2:58pm

How do you expect shared memory to become “full”? There are no dynamic allocations for shared memory. The only way in which shared memory might not be large enough is when the static allocation plus the allocation requested at kernel launch exceeds the shared memory per SM, in which case the kernel does not launch at all.

EDIT: Correct typo that distorted the meaning.

seibert · March 28, 2012, 3:05pm

Assuming you aren’t using malloc() in your kernel code, you will get a specific out of memory error code (cudaErrorMemoryAllocation, I believe) from cudaMalloc() if your request exceeds the available global memory. (Remember to check all those return codes, even if you just plan to abort!) An unspecified launch failure indicates something more like a memory access violation.
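A minimal sketch of the return-code check described above. It deliberately requests more global memory than the device reports in total, so that cudaMalloc() has to fail; the request size is an illustrative assumption, not something from the original program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Ask the runtime how much global memory the current device has.
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("global memory: %zu bytes free of %zu\n", freeBytes, totalBytes);

    // Illustrative assumption: request 1 GiB more than the device owns.
    size_t request = totalBytes + (static_cast<size_t>(1) << 30);
    void *d_buf = nullptr;
    err = cudaMalloc(&d_buf, request);
    if (err != cudaSuccess) {
        // An allocation failure is reported here, by cudaMalloc itself,
        // not later as an unspecified launch failure.
        printf("cudaMalloc(%zu bytes) failed: %s\n", request, cudaGetErrorString(err));
        return 0;
    }

    // Not expected to be reached, but release the buffer if it is.
    cudaFree(d_buf);
    return 0;
}
```

Checking the value returned by every runtime call in this way makes it possible to tell an allocation failure apart from an error raised by an earlier kernel.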

The only way a kernel can hit a shared memory limit is if it requests more than the total configured shared memory per multiprocessor (16 kB on compute capability < 2.0, 16 or 48 kB on compute capability 2.x, and 16, 32, or 48 kB on compute capability 3.0). In that case, the kernel will fail to launch, and the next CUDA function should return cudaErrorInvalidConfiguration. Because shared memory is not dynamically allocated, it is impossible for a kernel to hit an “out of shared memory” condition while running.
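The launch-time failure can be sketched as follows. The kernel name usesDynamicShared and the amount requested are hypothetical; the point is only that the error shows up when the launch status is checked, before any thread has run. The exact error code reported may differ between CUDA versions, so the sketch prints whatever the runtime returns.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel whose shared memory size is given at launch time.
__global__ void usesDynamicShared(float *out) {
    extern __shared__ float buf[];
    buf[threadIdx.x] = static_cast<float>(threadIdx.x);
    __syncthreads();
    out[threadIdx.x] = buf[threadIdx.x];
}

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // The per-block limit that deviceQuery reports.
    printf("shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);

    float *d_out = nullptr;
    err = cudaMalloc(&d_out, 32 * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Deliberately ask for more shared memory than one block may use.
    size_t tooMuch = prop.sharedMemPerBlock + 1024;
    usesDynamicShared<<<1, 32, tooMuch>>>(d_out);

    // The launch itself is rejected; no thread of the kernel executes.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_out);
    return 0;
}
```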

Floow · March 28, 2012, 3:28pm

@tera

Yes, I think that’s it. I am not sure what SM means. I still have some trouble with memory (as you can see); I hope I am clear enough.

I did some calculations, and I can launch a kernel even when the static allocation + memory allocation is greater than the total amount of shared memory per block given by deviceQuery (49152 bytes). Indeed, I allocate (with malloc, so dynamically? And with float, for example) at least 1740 bytes and launch 54 threads per block (XBLOCK=54), which gives 54*1740 = 93960 bytes per block (so in shared memory?) > 49152 bytes.

I may have misunderstood the way shared memory is used.

@seibert

I tried to do a memCpy after my kernel, and it returns ‘unspecified launch failure’ if I increase XBLOCK or XGRID (I have YBLOCK and YGRID equal to 1). But if I use small values for XBLOCK and XGRID (54 and 35, for example), my program works (slowly, but it works).

For each thread I make 2 malloc of 206*sizeof(float) = 1664 bytes, so maybe that is too big? If I increase XBLOCK, some malloc calls fail and I get ‘unspecified launch failure’ after the kernel (you will say: of course, your array is not defined because malloc failed).

I really think malloc failed because of a lack of memory.

What do you think? Is there a solution to this problem, or do I just have to reduce the memory use?

I really appreciate your answers, and I hope I am as clear as you are, but I am not sure. Sorry if I am unclear.

tera · March 28, 2012, 3:44pm

Here SM stands for Streaming Multiprocessor, i.e. the instance that shares a shared memory block.

Sorry, I had a meaning-distorting typo in that post. You can have more than 49152 bytes of static memory, but not more than 49152 bytes of shared memory per block.

In-kernel malloc() (or any dynamic allocation for that matter) doesn’t give you shared memory, but global memory.
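A minimal sketch of how per-thread malloc() failures can be detected instead of surfacing later as an ‘unspecified launch failure’. In-kernel malloc() draws from a device heap in global memory whose size is fixed before the first kernel that uses it runs (8 MB by default according to the CUDA programming guide) and it returns a null pointer once that heap is exhausted. The kernel name perThreadAlloc, the failure counter, and the 64 MB heap size are assumptions chosen for illustration; the grid, block, and array sizes echo the numbers mentioned in the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates two arrays from the device heap.
__global__ void perThreadAlloc(int n, int *failCount) {
    float *a = static_cast<float *>(malloc(n * sizeof(float)));
    float *b = static_cast<float *>(malloc(n * sizeof(float)));
    if (a == nullptr || b == nullptr) {
        // Heap exhausted: count the failure instead of dereferencing null.
        atomicAdd(failCount, 1);
    } else {
        a[0] = 1.0f;
        b[0] = a[0] + 1.0f;
    }
    // free() accepts a null pointer, so both calls are safe either way.
    free(a);
    free(b);
}

int main() {
    // Assumed heap size; must be set before any kernel using malloc() runs.
    size_t heapBytes = static_cast<size_t>(64) * 1024 * 1024;
    cudaError_t err = cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int *d_fail = nullptr;
    cudaMalloc(&d_fail, sizeof(int));
    cudaMemset(d_fail, 0, sizeof(int));

    // XGRID = 35 and XBLOCK = 54, with 206 floats per array, as in the thread.
    perThreadAlloc<<<35, 54>>>(206, d_fail);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int failures = 0;
    cudaMemcpy(&failures, d_fail, sizeof(int), cudaMemcpyDeviceToHost);
    printf("threads with a failed malloc: %d\n", failures);

    cudaFree(d_fail);
    return 0;
}
```

If the counter is non-zero, the options are to raise the heap limit further, to reduce the per-thread request, or to replace the in-kernel allocations with one cudaMalloc() buffer that each thread indexes into.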

Floow · March 28, 2012, 4:07pm

Thanks a lot. I will think about all of this to improve my program, and I will ask more questions if I need to.

And what about the “usual allocation”, for example

float a;

Is it allocated in global memory as a static allocation?

tera · March 28, 2012, 4:27pm

That will either give you a register, or “local” memory (basically global memory, with a different layout to improve coalescing).
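As a rough illustration of the difference, the hypothetical kernel below declares both a scalar and a per-thread array. Where each ends up is decided by the compiler, so the comments describe typical behaviour rather than a guarantee; compiling with nvcc -Xptxas -v prints the register and local memory usage actually chosen.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only to show the two kinds of declaration.
__global__ void scalarVsArray(const int *idx, float *out) {
    // A plain scalar such as this is normally kept in a register.
    float a = 2.0f * threadIdx.x;

    // A per-thread array read with an index unknown at compile time
    // is a typical candidate for "local" memory.
    float scratch[64];
    for (int i = 0; i < 64; ++i) {
        scratch[i] = a + i;
    }
    out[threadIdx.x] = scratch[idx[threadIdx.x] & 63];
}

int main() {
    const int threads = 32;
    int *d_idx = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_idx, threads * sizeof(int));
    cudaMalloc(&d_out, threads * sizeof(float));
    cudaMemset(d_idx, 0, threads * sizeof(int));  // every thread reads scratch[0]

    scalarVsArray<<<1, threads>>>(d_idx, d_out);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(d_idx);
    cudaFree(d_out);
    return 0;
}
```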

Floow · March 29, 2012, 8:39am

Once again, thanks a lot, tera. Your answers are really clear.
