Problems with thread synchronization and atomic functions

Hi. I have a strange problem. Here is my kernel, which is written to map a big vector (d_vect) with 100 elements into a smaller vector (d_res). Every thread checks a single element of d_vect, and if it is equal to 2.0, that element has to be copied into the next free index of d_res. The position where each element goes in d_res is given by the counter ‘count’, a variable in global memory that is incremented by 1 every time a new element is mapped, so an atomic addition is mandatory. The problem shows up when I compile:

[indent]Advisory: Removed dead synchronization intrinsic from function _Z7mappingPfS_Pi[/indent]

The kernel:

[indent]__device__ int count;

__global__ void mapping(float *d_vect, float *d_res, int *d_c){

	int tid;
	float elem;
	count = 0;

	tid = blockDim.x * blockIdx.x + threadIdx.x;

	__syncthreads();

	if(tid < 100){
		elem = d_vect[tid];
		if(elem == 2.0){
			d_res[ count ] = elem;
			atomicAdd( &count, 1);
		}
	}

}[/indent]

When I instead use a counter in global memory passed in by pointer ( *d_c ), the synchronization is no longer removed, but the result is still not correct (as if the line “d_res[count] = elem” were not there).
Does anyone know what is going on here?
Thanks!
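
For comparison, here is a minimal sketch of one way this kind of mapping can be written. It is not a confirmed fix for the advisory above. It rests on two assumptions: the counter is zeroed from the host before the launch (instead of every thread writing count = 0), and each thread takes its output slot from the value returned by atomicAdd, which is the counter's value before the increment. The names compactEqual, runCompaction, d_count, n and target are hypothetical and do not come from the original kernel.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: copies every element of d_vect equal to `target`
// into consecutive slots of d_res. *d_count must be 0 before the launch.
__global__ void compactEqual(const float *d_vect, float *d_res,
                             int *d_count, int n, float target)
{
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {
        float elem = d_vect[tid];
        if (elem == target) {
            // atomicAdd returns the value held before the increment, so
            // each matching thread reserves a distinct slot in d_res.
            int slot = atomicAdd(d_count, 1);
            d_res[slot] = elem;
        }
    }
}

// Hypothetical host helper: runs the kernel on h_vect and returns the
// number of matching elements, or -1 if any CUDA call fails.
int runCompaction(const std::vector<float> &h_vect, float target)
{
    int n = static_cast<int>(h_vect.size());
    if (n == 0) return 0;

    size_t bytes = n * sizeof(float);
    float *d_vect = nullptr, *d_res = nullptr;
    int *d_count = nullptr;
    int h_count = -1;

    cudaError_t err = cudaMalloc(&d_vect, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_res, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_count, sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_vect, h_vect.data(), bytes, cudaMemcpyHostToDevice);
    // Zero the counter once, from the host, before any thread runs.
    if (err == cudaSuccess) err = cudaMemset(d_count, 0, sizeof(int));

    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        compactEqual<<<blocks, threads>>>(d_vect, d_res, d_count, n, target);
        err = cudaGetLastError();
    }
    // This blocking copy on the default stream waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_vect);
    cudaFree(d_res);
    cudaFree(d_count);
    return (err == cudaSuccess) ? h_count : -1;
}
```

Calling runCompaction(h_vect, 2.0f) on the 100-element host vector would return how many elements were copied. Note that with this pattern the order in which matching elements land in d_res is not guaranteed to follow their order in d_vect.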