Where does this __shared__ memory come from?

Hi,

I have a kernel that doesn't use shared memory, nor do I use any constants.

But when I compile, ptxas reports that my kernel uses 24+16 bytes of shared memory (and why can't it just print the sum, by the way??), without my explicitly asking for any. I have checked the PTX specs: shared memory should be declared with the ".shared" keyword, and a grep confirms that my PTX code doesn't use ".shared" at all. So why does --ptxas-options=-v report some shared memory? And how can shared memory be declared in PTX without the .shared keyword?

Thanks in advance.

master waver # nvcc --ptx -arch=sm_13 -I"/usr/local/cuda/common/inc" -L"/usr/local/cuda/common/lib" "-lrt -lcutil" neat-actnodes-test2.cu

./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space

./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space

./neat_k-activate_nodes.cu(20): Advisory: Cannot tell what pointer points to, assuming global memory space

master waver # grep -i shared neat-actnodes-test2.ptx 

master waver # nvcc --ptxas-options=-v -arch=sm_13 -I"/usr/local/cuda/common/inc" -L"/usr/local/cuda/common/lib" "-lrt -lcutil" neat-actnodes-test2.cu

./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space

./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space

./neat_k-activate_nodes.cu(20): Advisory: Cannot tell what pointer points to, assuming global memory space

ptxas info	: Compiling entry function '_Z14activate_nodesP8net_pack'

ptxas info	: Used 15 registers, 24+16 bytes smem, 36 bytes cmem[1]

Kernel code:

__global__ void activate_nodes(net_pack *npack) {
	uint me, node_idx;
	double sum = 0;
	long i;

	me = (blockIdx.x << 8) + threadIdx.x; // 256 threads per block

	if (me < npack->nodes_total) {
		for (i = 0; i < npack->num_links[me]; i++) {
			node_idx = npack->lni[npack->nli[me]] + i;
			if (npack->num_links[node_idx] < 0) { // bias source node (the original tested "npack->num_links < 0", which compares the pointer itself and is never true)
				sum = sum + npack->nvalues[node_idx];
			} else {
				sum = sum + npack->lweights[npack->nli[me] + i] * npack->nvalues[node_idx];
			}
		}
		sum = 1 / (1 + exp(-(4.924273f * sum)));
		npack->nmirror[me] = sum;
	} else {
		npack->nmirror[me] = npack->nvalues[me];
	}
}

Kernel launch parameters such as the grid/block dimensions and block identifiers are passed to the kernel in shared memory.

As for the 24+16: 24 is the actual number of shared memory bytes your kernel uses; the +16 is not defined in the specs. I guess it's something internal to NVIDIA.

eyal

In the FAQ http://forums.nvidia.com/index.php?showtopic=84440 you can read that shared memory is used for parameter passing and also holds some variables such as blockDim, gridDim and blockIdx.

Function parameters are also passed via shared memory, so in your case npack will reside there. I don't know PTX well, but there is probably a separate instruction for fetching function parameters.

24+16 stands for:

  • 24 bytes allocated on your kernel's behalf — parameter passing plus the grid/block variables mentioned above, since you declare no __shared__ variables yourself
  • plus 16 bytes reserved by the system

Part of the overhead is probably alignment padding, but there may be something more as well…