Hi,
I have a kernel that doesn't use shared memory, and I don't declare any constant-memory variables either.
But when I compile it, ptxas reports that the kernel uses 24+16 bytes of shared memory (and why doesn't it just print the total, by the way?), even though I never explicitly ask for any. I have checked the PTX specification, and shared memory should be declared with the “.shared” keyword. A grep confirms that my PTX code contains no “.shared” declarations at all. So why does --ptxas-options=-v report shared memory usage? And how can shared memory be allocated in PTX without the .shared keyword?
Thanks in advance.
master waver # nvcc --ptx -arch=sm_13 -I"/usr/local/cuda/common/inc" -L"/usr/local/cuda/common/lib" "-lrt -lcutil" neat-actnodes-test2.cu
./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space
./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space
./neat_k-activate_nodes.cu(20): Advisory: Cannot tell what pointer points to, assuming global memory space
master waver # grep -i shared neat-actnodes-test2.ptx
master waver # nvcc --ptxas-options=-v -arch=sm_13 -I"/usr/local/cuda/common/inc" -L"/usr/local/cuda/common/lib" "-lrt -lcutil" neat-actnodes-test2.cu
./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space
./neat_k-activate_nodes.cu(1): Advisory: Cannot tell what pointer points to, assuming global memory space
./neat_k-activate_nodes.cu(20): Advisory: Cannot tell what pointer points to, assuming global memory space
ptxas info : Compiling entry function '_Z14activate_nodesP8net_pack'
ptxas info : Used 15 registers, 24+16 bytes smem, 36 bytes cmem[1]
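Kernel code:
__global__ void activate_nodes(net_pack *npack) {
    uint me;                  // global index of the node handled by this thread
    double sum = 0;
    uint node_idx;
    me = blockIdx.x << 8;     // blockIdx.x * 256: assumes 256 threads per block
    me += threadIdx.x;
    long i;
    if (me < npack->nodes_total) {
        // Accumulate the weighted inputs of node 'me'.
        for (i = 0; i < npack->num_links[me]; i++) {
            node_idx = npack->lni[npack->nli[me]] + i;
            // NOTE: this compares the pointer num_links itself, not num_links[me],
            // so it is presumably not the intended test.
            if (npack->num_links < 0) { // bias node
                sum = sum + npack->nvalues[node_idx];
            } else {
                sum = sum + npack->lweights[npack->nli[me] + i] * npack->nvalues[node_idx];
            }
        }
        // Steepened sigmoid activation.
        sum = 1 / (1 + exp(-(4.924273f * sum)));
        npack->nmirror[me] = sum;
    } else {
        // Runs only for me >= nodes_total, so nmirror and nvalues must be
        // allocated for every thread in the grid, not just for nodes_total.
        npack->nmirror[me] = npack->nvalues[me];
    }
}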
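To check what the kernel writes to nmirror, a CPU reference of the same per-node computation can be handy. What follows is only a sketch under assumptions: the real net_pack definition is not shown, so HostNetPack, its field types (long link counts, unsigned indices, double values), kThreadsPerBlock, blocks_needed and activate_nodes_cpu are all hypothetical names chosen to mirror the fields the kernel dereferences. It reproduces only the weighted branch, because the bias test as written compares a pointer. It also covers only nodes with me < nodes_total. GPU exp() results may differ in the last bits, so compare with a tolerance.

```cpp
#include <cmath>
#include <vector>

// Hypothetical host-side mirror of net_pack (the real struct is not shown);
// field names copy the ones the kernel uses, types are assumptions.
struct HostNetPack {
    unsigned nodes_total = 0;
    std::vector<long> num_links;      // number of incoming links per node
    std::vector<unsigned> nli;        // index of each node's first link
    std::vector<unsigned> lni;        // node index reached by a link
    std::vector<double> lweights;     // weight of each link
    std::vector<double> nvalues;      // current node values
    std::vector<double> nmirror;      // output, one entry per node
};

// The kernel computes me = (blockIdx.x << 8) + threadIdx.x, i.e.
// blockIdx.x * 256 + threadIdx.x, so blocks must have exactly 256 threads.
constexpr unsigned kThreadsPerBlock = 256;

// Number of 256-thread blocks needed to cover nodes_total nodes (rounded up).
unsigned blocks_needed(unsigned nodes_total) {
    return (nodes_total + kThreadsPerBlock - 1) / kThreadsPerBlock;
}

// CPU reference for the weighted branch of activate_nodes, using the same
// index arithmetic as the kernel.
void activate_nodes_cpu(HostNetPack& p) {
    p.nmirror.assign(p.nodes_total, 0.0);
    for (unsigned me = 0; me < p.nodes_total; ++me) {
        double sum = 0.0;
        for (long i = 0; i < p.num_links[me]; ++i) {
            unsigned node_idx = p.lni[p.nli[me]] + static_cast<unsigned>(i);
            sum += p.lweights[p.nli[me] + i] * p.nvalues[node_idx];
        }
        // Same steepened sigmoid as the kernel.
        p.nmirror[me] = 1.0 / (1.0 + std::exp(-(4.924273f * sum)));
    }
}
```

Launching blocks_needed(nodes_total) blocks of 256 threads matches the shift-by-8 indexing; with any other block size, some node indices would be skipped or computed twice.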