Strange ptxas error in shared memory

ombra32 · February 23, 2009, 1:07pm

In my kernel I’ve allocated four shared memory arrays of 32x32 elements, each element 4 bytes (the data type is float).

When I run the compiler, I get this error:

ptxas error  : Entry function '_Z13LU_updaterestPfii' uses too much shared data (0x1000 bytes + 0x301c bytes system, 0x4000 max)

But:

32*32*4*4 = 16384 bytes = 16 kB, and the shared memory limit is 16 kB!

Why do I receive this error?
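The arithmetic can be checked at compile time. The sketch below is plain host C++ that also compiles in a .cu file; it assumes block_size is 32, as the 32x32 tiles imply, and the names tile_bytes, user_smem and smem_limit are illustrative only.

```cuda
// Compile-time check of the shared memory footprint described in the post.
// Assumption: block_size is 32, matching the 32x32 tiles.
#include <cstddef>

#define block_size 32

constexpr std::size_t tile_bytes = block_size * block_size * sizeof(float); // one 32x32 float tile
constexpr std::size_t user_smem  = 4 * tile_bytes;                         // as, ls, us, temp
constexpr std::size_t smem_limit = 0x4000;                                 // "0x4000 max" from ptxas

// The four user arrays alone add up to exactly the limit ptxas reports,
// so there is no room left for any bytes the compiler reserves for itself.
static_assert(user_smem == smem_limit, "four 32x32 float tiles fill 16 kB exactly");
```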

My kernel is:

#define block_size 32   // assumed value, matching the 32x32 tiles described above

__global__ void LU_updaterest(float* a, int step, int W)
{
  int i, j;
  float acc;
  int tx = threadIdx.x;
  int bx = blockIdx.x;
  int by = blockIdx.y;
  int idx = blockIdx.x*blockDim.x + blockIdx.y*blockDim.x*W + threadIdx.x;

  __shared__ float as[block_size*block_size];
  __shared__ float ls[block_size*block_size];
  __shared__ float us[block_size*block_size];
  __shared__ float temp[block_size*block_size];

  for(i=0; i<block_size; i++){  // copy into shared mem
    as[i*block_size + tx] = a[step*block_size*W + step*block_size + idx + i*W];
    ls[i*block_size + tx] = a[step*block_size*W + (step-1)*block_size + by*block_size*W + tx + i*W];
    us[i*block_size + tx] = a[(step-1)*block_size*W + step*block_size + bx*block_size + tx + i*W];
  }
  __syncthreads();  // each thread reads ls entries loaded by other threads, so all loads must finish first

  // compute product
  for (i = 0; i < block_size; i++){  // i selects the row
    acc = 0;
    for (j = 0; j < block_size; j++)  // j walks through the products
      acc += ls[i*block_size + j] * us[j*block_size + tx];
    temp[i*block_size + tx] = acc;  // temp stores the product used to compute the new as'
  }

  // complete computation and write back results
  for(i=0; i<block_size; i++)
    a[step*block_size*W + step*block_size + idx + i*W] = as[i*block_size + tx] - temp[i*block_size + tx];
}

Thanks

E.D_Riedijk · February 23, 2009, 1:28pm

All kernel parameters are stored in shared memory, and also variables like blockIdx, blockDim, gridDim. So you can not use the full 16kB.
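Because the four arrays alone already take the whole 16 kB, one way out is to declare less shared data. A minimal sketch, assuming block_size is #defined as 32 and the kernel is launched with block_size threads per block, as the original indexing implies: as and temp are only ever read and written by the thread that owns a given column, so their values can stay in a register, leaving just ls and us (8 kB) in shared memory. The kernel name LU_updaterest_small is hypothetical.

```cuda
#define block_size 32   // assumed tile edge (32x32), as in the post

// Hypothetical rework of LU_updaterest: only ls and us are shared between
// threads, so only they live in shared memory (2 * 32 * 32 * 4 = 8 kB).
// Assumes blockDim.x == block_size.
__global__ void LU_updaterest_small(float* a, int step, int W)
{
    int tx = threadIdx.x;
    int bx = blockIdx.x;
    int by = blockIdx.y;
    int idx = bx * blockDim.x + by * blockDim.x * W + tx;

    __shared__ float ls[block_size * block_size];
    __shared__ float us[block_size * block_size];

    for (int i = 0; i < block_size; i++) {
        ls[i * block_size + tx] = a[step * block_size * W + (step - 1) * block_size + by * block_size * W + tx + i * W];
        us[i * block_size + tx] = a[(step - 1) * block_size * W + step * block_size + bx * block_size + tx + i * W];
    }
    __syncthreads();   // ls rows are read across all columns below

    // Each thread owns one column of the output tile, so the old "as" value
    // is read straight from global memory and the product stays in "acc".
    for (int i = 0; i < block_size; i++) {
        float acc = 0.0f;
        for (int j = 0; j < block_size; j++)
            acc += ls[i * block_size + j] * us[j * block_size + tx];
        a[step * block_size * W + step * block_size + idx + i * W] -= acc;
    }
}
```

Compiling with nvcc -Xptxas -v makes ptxas print the registers and shared memory it assigns to each kernel, which is a quick way to confirm the new total.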

ombra32 · February 23, 2009, 2:54pm

I suspected it was something like this… :(

Mu-Chi_Sung · February 24, 2009, 6:50am

I remember that kernel parameters are stored in constant memory… blockIdx, blockDim and gridDim should be special registers available on the SM…

Has this changed since 2.0? Or is it something you inferred from decuda output? (Or is it documented somewhere in the manual?)

Thanks!

Austin · February 24, 2009, 8:52am

I would agree with this. You can try an experiment with a kernel that uses no shared memory: in the Visual Profiler you will find that “static shared memory allocate one block” is not 0.

But what’s the difference between “static shared memory” and “dynamic shared memory”? Confused! :wacko:
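Austin's experiment can also be run from host code. A minimal sketch, assuming a recent CUDA runtime: cudaFuncGetAttributes fills a cudaFuncAttributes struct whose sharedSizeBytes field is the static shared memory of the compiled kernel. The kernel no_smem_kernel and the helper static_smem_bytes are hypothetical, and the reported value depends on the GPU architecture and toolkit version, so no particular number should be expected.

```cuda
#include <cuda_runtime.h>

// A kernel that declares no __shared__ variables at all (hypothetical example).
__global__ void no_smem_kernel(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 1.0f;
}

// Returns the static shared memory (in bytes) the runtime reports for
// no_smem_kernel, or -1 if the query fails (for example, no usable device).
long static_smem_bytes()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, no_smem_kernel) != cudaSuccess)
        return -1;
    return static_cast<long>(attr.sharedSizeBytes);
}
```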
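In short, static shared memory is declared with a fixed size inside the kernel, so the compiler knows the total (which is why ptxas could reject the kernel in this thread), while dynamic shared memory is an unsized extern __shared__ array whose size in bytes is passed as the third launch-configuration argument. A minimal sketch, assuming 32 threads per block and one 32x32 float tile; copy_static, copy_dynamic and launch_both are hypothetical names.

```cuda
#include <cuda_runtime.h>

#define block_size 32   // assumed tile edge, as in the kernel above

// Static shared memory: the array size is a compile-time constant,
// so ptxas knows the total and can reject the kernel at build time.
__global__ void copy_static(const float* in, float* out)
{
    __shared__ float tile[block_size * block_size];
    int t = threadIdx.x;
    for (int i = 0; i < block_size; i++)
        tile[i * block_size + t] = in[i * block_size + t];
    for (int i = 0; i < block_size; i++)
        out[i * block_size + t] = tile[i * block_size + t];
}

// Dynamic shared memory: an unsized extern array whose size in bytes is the
// third <<<...>>> launch parameter. The compiler cannot check this size, so
// an oversized request is reported as a launch error instead.
__global__ void copy_dynamic(const float* in, float* out)
{
    extern __shared__ float tile[];
    int t = threadIdx.x;
    for (int i = 0; i < block_size; i++)
        tile[i * block_size + t] = in[i * block_size + t];
    for (int i = 0; i < block_size; i++)
        out[i * block_size + t] = tile[i * block_size + t];
}

// Hypothetical helper: copies one 32x32 tile with each variant and returns
// any launch error. d_in and d_out must hold block_size * block_size floats.
cudaError_t launch_both(const float* d_in, float* d_out)
{
    copy_static<<<1, block_size>>>(d_in, d_out);
    size_t bytes = block_size * block_size * sizeof(float);  // 4 kB requested at launch
    copy_dynamic<<<1, block_size, bytes>>>(d_in, d_out);
    return cudaGetLastError();
}
```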

ombra32 · February 24, 2009, 9:05am

I’ve tried it in both CUDA 1 and CUDA 2, and in both debug and release mode, but the output is the same.

This output is generated by the compiler.

I’ve searched the manual but I haven’t found anything about this problem.

ombra32 · February 24, 2009, 9:07am

But why? :blink:

Austin · February 24, 2009, 11:56am

I think it is what Riedijk said: “All kernel parameters are stored in shared memory, and also variables like blockIdx, blockDim, gridDim. So you can not use the full 16kB.”
