where does this shared memory comes from?

nuliknol · June 16, 2009, 2:05am

Hi,

i have a kernel code that doesn’t use shared memory. Neither do i have constants

But when I compile there are 24+16 (and why can’t it do the sum, btw??) bytes of shared memory that my kernel uses, without me explicitly to ask for it. I have checked the PTX specs, and shared memory should be declared with “.shared” keyword. When i do a “grep” it confirms that my PTX code doesn’t uses “.shared” memory. So why does the -ptxas-options=-v reports some shared memory ?? And how is shared memory declared in the PTX without .shared keyword?

Thanks in advance.

master waver # nvcc --ptx -arch=sm_13 -I"/usr/local/cuda/common/inc" -L"/usr/local/cuda/common/lib" "-lrt -lcutil" neat-actnodes-test2.cu

./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space

./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space

./neat_k-activate_nodes.cu(20): Advisory: Cannot tell what pointer points to, assuming global memory space

master waver # grep -i shared neat-actnodes-test2.ptx 

master waver # nvcc --ptxas-options=-v -arch=sm_13 -I"/usr/local/cuda/common/inc" -L"/usr/local/cuda/common/lib" "-lrt -lcutil" neat-actnodes-test2.cu

./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space

./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space

./neat_k-activate_nodes.cu(20): Advisory: Cannot tell what pointer points to, assuming global memory space

ptxas info	: Compiling entry function '_Z14activate_nodesP8net_pack'

ptxas info	: Used 15 registers, 24+16 bytes smem, 36 bytes cmem[1]

kernel code:

__global__ void activate_nodes(net_pack *npack) {

	uint me; double sum=0;

	uint node_idx;

	me=blockIdx.x<<8;

	me+=threadIdx.x;

	long i;

	if (me<npack->nodes_total) {

			for(i=0;i<npack->num_links[me];i++) {

				node_idx=npack->lni[npack->nli[me]]+i;

				if (npack->num_links<0) { // bias node

					sum=sum+npack->nvalues[node_idx];

				} else {

					sum=sum+npack->lweights[npack->nli[me]+i]*npack->nvalues[node_idx];

				}

			}

			sum=1/(1+exp(-(4.924273f*sum)));

			npack->nmirror[me]=sum;

		} else {

			npack->nmirror[me]=npack->nvalues[me];

		}

}

eyalhir74 · June 16, 2009, 6:07am

Global parameters such as grid/threads/block identifiers are passed in shared memory to the kernel method.

As for the 24+16, 24 is the actual shared memory bytes your kernel uses, the +16 is not defined in the specs, i guess

its something internal to nVidia.

eyal

sergeyn · June 16, 2009, 6:33am

in the FAQ [url=“http://forums.nvidia.com/index.php?showtopic=84440”]http://forums.nvidia.com/index.php?showtopic=84440[/url] you can read that shared memory is used for parameter passing and holds some variables such as blockDim, gridDim and blockIdx.

Cygnus_X1 · June 16, 2009, 9:28am

Also function parameters are passed via shared memory - in your case, npack will reside there. I don’t know PTX but there might be another instruction for fetching function parameters.

24+16 stands for:

You use 16 bytes of shared memory
There were 24 bytes allocated

The overhead is probably because of some alignment constraints, but there may be something more as well…

where does this __shared__ memory comes from?

where does this shared memory comes from?