Hi everyone,
While optimizing my program recently, I found that the shared memory size reported in the cubin file is larger than what I use in the kernel (by only 32 bytes). Where is this extra shared memory being used?
[codebox]/************************************************************************/
/* calculate the maximum and minimum of a vector                        */
/************************************************************************/
template <int N>  // N: placeholder name for the int template parameter (see "ILi2E" in the mangled name)
__global__ void CalMaxMinD1(int np, float* idata, float* odatamax, float* odatamin)
{
    int tid = threadIdx.x;
    int i = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
    int gridsize = __mul24(gridDim.x, blockDim.x);
    __shared__ float maxtemp[256];
    __shared__ float mintemp[256];

    // load (assumes i < np for every thread on this first read)
    maxtemp[tid] = idata[i];
    mintemp[tid] = maxtemp[tid];
    i += gridsize;

    // each thread walks the vector with a grid-sized stride
    while (i < np)
    {
        float temp = idata[i];
        if (maxtemp[tid] < temp) { maxtemp[tid] = temp; }
        if (mintemp[tid] > temp) { mintemp[tid] = temp; }
        i += gridsize;
    }
    __syncthreads();

    // … do the reduction work …

    // output: thread 0 writes this block's result
    if (tid == 0)
    {
        odatamax[blockIdx.x] = maxtemp[0];
        odatamin[blockIdx.x] = mintemp[0];
    }
}[/codebox]
In this code, I only use two shared memory arrays, which total 256 * 4 bytes * 2 = 2048 bytes. However, the cubin file shows that the kernel uses 2080 bytes of shared memory. Could you please tell me why?
[codebox]code {
name = _Z11CalMaxMinD1ILi2EEviPfS0_S0_
lmem = 0
smem = 2080
reg = 8
bar = 1
const {
segname = const
segnum = 1
offset = 0
bytes = 4
mem {
0x0000001f
}
}
bincode {.......
}
}[/codebox]
peter