Out of memory when allocating local memory

Robert_Crovella · January 4, 2023, 3:46pm

Yes, there is another limit on local memory (and related: stack, since stack manifests in the logical local space per thread). njuffa has described it here

I think if you run through that math for your V100 GPU, you will find the problem. The calculation will show that your 210816 byte per thread request requires 34540093440 bytes when considered device-wide (for V100 device), and that exceeds the 32GB available on your GPU. (Anticipating: No, the launch configuration <<<1,1>>> is not considered in this analysis.)

Topic		Replies	Views
Problems with local memory CUDA Programming and Performance	3	875	April 22, 2016
Array size upper bound in kernel CUDA Programming and Performance	3	866	December 2, 2021
Local memory limit? CUDA Programming and Performance	1	12173	April 28, 2008
Thread Local variable CUDA Programming and Performance	1	1713	September 23, 2009
Local memory array giving illegal access error CUDA Programming and Performance	4	1185	November 20, 2020
Per thread local memory Per thread local memory specified in C Programming Guide CUDA Programming and Performance	1	904	March 6, 2012
What is the maximum CUDA Stack frame size per Kerenl. CUDA Programming and Performance	1	13779	November 18, 2013
Why does a simple single-threaded CUDA kernel consume large amounts of global memory? CUDA Programming and Performance	7	6705	February 24, 2011
Cuda - Out of Memory Area CUDA Programming and Performance cuda	3	458	November 9, 2023
temporary memory issues CUDA Programming and Performance	11	5504	March 30, 2008

Out of memory when allocating local memory

Related topics