Array size upper bound in kernel

#define ARR_LEN (1024*1024*1024)

__global__ void simulated(long long int *arr) {
  int tid = blockDim.x * blockIdx.x + threadIdx.x;

  long long int local1[ARR_LEN];
  long long int local2[ARR_LEN];
  long long int local3[ARR_LEN];
  long long int local4[ARR_LEN];
  long long int local5[ARR_LEN];
  long long int local6[ARR_LEN];

  for (int i = 0; i < ARR_LEN; ++i) {
    local1[i] += i*1;
    local2[i] += i*2;
    local3[i] += i*3;
    local4[i] += i*4;
    local5[i] += i*5;
    local6[i] += i*6;
    arr[i] =
        local1[i] + local2[i] + local3[i] + local4[i] + local5[i] + local6[i];
  }
}
I am testing the largest possible array size that I can declare inside a CUDA kernel. However, I am a little confused, because the code snippet above compiles and runs fine even with a very large array size.

In the example shown, my understanding is that each thread declares 48 GB of data (8 GB per array, six arrays). However, none of that shows up in memory usage. So where does that data physically go? Can anyone give me some insight?

When I run your code under compute-sanitizer, I get errors.

The maximum amount of local memory per thread currently cannot exceed 512 KB. Other factors may prevent you from even reaching that upper bound.
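One way to see this at build time is to ask the compiler: ptxas can report each kernel's stack frame and local-memory usage per thread. A sketch, assuming the kernel above is saved as simulated.cu (hypothetical filename):

```shell
# Verbose ptxas output prints per-thread stack frame / local memory usage;
# for this kernel it will far exceed the per-thread local memory limit.
nvcc -Xptxas -v -c simulated.cu

# Running the binary under compute-sanitizer surfaces the runtime errors
# that a plain launch may silently swallow.
compute-sanitizer ./simulated
```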


Thanks for the pointer!

I forgot about the sanity-checking tool.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.