atomicAdd() problem

Hi! I have been working on a program that builds a color palette for an image.
The input parameter img3f is a 1D array of pitchSrc * height bytes.
The output parameter idx1i is a 1D array of pitchDst * height bytes, which stores each pixel's index into the color palette.
The thread block size is 32*16.

I expected s_Hist to hold, for each color, the number of pixels of that color in the part of the image covered by one block.

But every element of s_Hist turned out to be 0. The call atomicAdd(s_Hist + data, 1) seems to have no effect on s_Hist.
Can someone tell me what is wrong with the following code? Thanks!

__global__ void buildColorPallet(float *img3f, size_t pitchSrc, int *idx1i, size_t pitchDst, int width, int height,
                                 int *d_PartialHistograms, int binNum)
{
    extern __shared__ int s_Hist[];
    unsigned int tid = threadIdx.y * blockDim.x + threadIdx.x;
    #pragma unroll 4
    for (int i = tid; i < binNum; i += blockDim.x * blockDim.y)
        s_Hist[i] = 0;
    __syncthreads();

    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int idy = blockIdx.y * blockDim.y + threadIdx.y;
    if (idx >= width || idy >= height)
        return;
    unsigned int id = idx + idy * pitchDst;
    unsigned int imSize = width * height;

    int data = __float2int_rn(img3f[id]) + __float2int_rn(img3f[id + imSize]) + __float2int_rn(img3f[id + 2 * imSize]);
    atomicAdd(s_Hist + data, 1);
    __syncthreads();
    idx1i[id] = data;
    d_PartialHistograms[(blockIdx.y * gridDim.x + blockIdx.x) * binNum + data] = s_Hist[data];
}

I also changed atomicAdd(s_Hist + data, 1) to s_Hist[data] = 10; and the elements of s_Hist are still 0.

Check for errors in the kernel launch, memory allocation and other API calls.
One quick way to do this is to initialize the host memory to some known values…
If the values remain intact, it means that there were some errors…
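
Here is a rough sketch of what that kind of checking could look like. It is not the code from the question: CUDA_CHECK, countThreads and runWithSentinel are made-up names, and the kernel is only a small stand-in that uses a dynamically sized shared histogram the same way. The output buffer is pre-filled with a sentinel (-1), so any -1 left after the copy back means the kernel never wrote that element. The sketch also passes the shared-memory size as the third launch parameter, which an extern __shared__ array needs; whether a missing size was the actual cause in this thread is only a guess, since the thread does not say.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: stop with a readable message if a runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Stand-in kernel (not the poster's): every thread of a block adds 1 to
// bin 0 of a shared histogram, then the block copies all bins to global memory.
__global__ void countThreads(int *out, int binNum)
{
    extern __shared__ int s_Hist[];
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    int nThreads = blockDim.x * blockDim.y;

    for (int i = tid; i < binNum; i += nThreads)
        s_Hist[i] = 0;
    __syncthreads();

    atomicAdd(&s_Hist[0], 1);
    __syncthreads();  // reached by every thread: no early return above

    for (int i = tid; i < binNum; i += nThreads)
        out[blockIdx.x * binNum + i] = s_Hist[i];
}

// Runs the stand-in kernel and returns the per-block histograms.
// Elements the kernel did not write would still hold the sentinel -1.
std::vector<int> runWithSentinel(int numBlocks, int binNum)
{
    const int kSentinel = -1;
    std::vector<int> host(static_cast<size_t>(numBlocks) * binNum, kSentinel);
    size_t bytes = host.size() * sizeof(int);

    int *dev = nullptr;
    CUDA_CHECK(cudaMalloc(&dev, bytes));
    // Copy the sentinel values to the device so untouched output stays recognizable.
    CUDA_CHECK(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice));

    dim3 block(32, 16);
    size_t smemBytes = binNum * sizeof(int);  // size of the extern __shared__ array
    countThreads<<<numBlocks, block, smemBytes>>>(dev, binNum);
    CUDA_CHECK(cudaGetLastError());        // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());   // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));
    return host;
}
```

Calling runWithSentinel(2, 8) from main() should give 512 in bin 0 of each block and 0 in the other bins. If -1 values come back instead, the kernel did not write the output, and the CUDA_CHECK after the launch is the first place to look.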

Thanks for your help! This problem has been solved!