What can be wrong with my code? Is read-modify-write of a 1D stream safe?

I am working on a simple Monte Carlo simulation. It simulates a number of particles moving around and adds each particle's value to the grid cell it passes through. The following is the skeleton of my program:

//-----------------------------------------------------------

kernel Move1Step(float3 position[], float val[], float media[]) {
    float3 pos = position[idx];   // idx: this particle's index
    float v = val[idx];
    {{modify pos and v based on media[]}}
    position[idx] = pos;          // each particle writes back only its own slot
    val[idx] = v;
}

//-----------------------------------------------------------

kernel Assign2Grid(float3 position[], float val[], float map[]) {
    float3 pos = position[idx];
    float v = val[idx];
    // scatter: several particles may land in the same map cell
    map[MAP_INDEX_3D_TO_1D(pos.x, pos.y, pos.z)] += v;
}

//-----------------------------------------------------------

void main() {
    ...
    for (iter = 0; iter < 10000; iter++) {
        Move1Step<<<?,?>>>(position, val, media);
        Assign2Grid<<<?,?>>>(position, val, map);   // uses the positions just updated
    }
    ...
}

I’ve checked both kernels and they look just fine (and simple), but I am getting weird results: the map is very sparse, with only a few non-zero values.

I am wondering whether I messed something up here. Does the above code look logically correct to you? If not, what should I change?

Thank you for any comments.

(Note: Assign2Grid<<<>>> is basically a scatter kernel. The length of position/val is about 1000, and map can be 128x128x128.)
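Because Assign2Grid is a scatter, two particles that land in the same cell during the same step both read, add to, and write back the same map element, and one contribution can be lost. Below is a minimal CUDA sketch of the same scatter using atomicAdd so those updates cannot overwrite each other. GRID, cellIndex, scatterToGrid and runScatter are hypothetical names; cellIndex stands in for MAP_INDEX_3D_TO_1D and assumes positions are already in grid units inside [0, 128). Float atomicAdd needs compute capability 2.0 or later.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define GRID 128  // hypothetical: 128 x 128 x 128 map

// Hypothetical stand-in for MAP_INDEX_3D_TO_1D (x varies fastest).
__host__ __device__ inline int cellIndex(int x, int y, int z) {
    return (z * GRID + y) * GRID + x;
}

// One thread per particle. atomicAdd makes the read-modify-write on the
// shared cell indivisible, so particles in the same cell all get counted.
__global__ void scatterToGrid(const float3* pos, const float* val, float* map, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float3 p = pos[i];  // assumed to be in grid units, 0 <= p < GRID
        atomicAdd(&map[cellIndex((int)p.x, (int)p.y, (int)p.z)], val[i]);
    }
}

// Host helper: scatter all particles into a zeroed map and copy it back.
// CUDA error checking is omitted for brevity.
std::vector<float> runScatter(const std::vector<float3>& pos, const std::vector<float>& vals) {
    assert(pos.size() == vals.size());
    int n = (int)pos.size();
    size_t cells = (size_t)GRID * GRID * GRID;
    float3* dPos; float* dVal; float* dMap;
    cudaMalloc(&dPos, n * sizeof(float3));
    cudaMalloc(&dVal, n * sizeof(float));
    cudaMalloc(&dMap, cells * sizeof(float));
    cudaMemcpy(dPos, pos.data(), n * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(dVal, vals.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dMap, 0, cells * sizeof(float));
    int threads = 256;
    scatterToGrid<<<(n + threads - 1) / threads, threads>>>(dPos, dVal, dMap, n);
    std::vector<float> map(cells);
    cudaMemcpy(map.data(), dMap, cells * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dPos); cudaFree(dVal); cudaFree(dMap);
    return map;
}
```

Putting all 1000 particles in one cell is the worst case for the race: with atomicAdd that cell ends up holding exactly 1000 times the particle value.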

It will probably be better to show bits of your CUDA implementation. The pseudo-BrookGPU code fragments are likely to confuse us. :) (Especially since much of the question is going to hinge on how you calculate idx.)
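In CUDA, the idx that the pseudo-code leaves implicit is usually built from the block and thread indices. A minimal sketch, assuming one thread per particle and a 1D launch (markVisited and countHits are hypothetical names): the grid size is rounded up so every particle gets a thread, and a bounds check discards the spare threads in the last block. Because each idx belongs to exactly one thread, a read-modify-write on element idx, like the one Move1Step does, is race-free.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Global index = block offset + position inside the block.
__global__ void markVisited(int* hits, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {        // the last block may be only partly used
        hits[idx] += 1;   // safe: no other thread touches hits[idx]
    }
}

// Launch enough blocks to cover n particles and report how many times
// each index was touched. CUDA error checking is omitted for brevity.
std::vector<int> countHits(int n, int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    int* dHits;
    cudaMalloc(&dHits, n * sizeof(int));
    cudaMemset(dHits, 0, n * sizeof(int));
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    markVisited<<<blocks, threadsPerBlock>>>(dHits, n);
    std::vector<int> hits(n);
    cudaMemcpy(hits.data(), dHits, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dHits);
    return hits;
}
```

With about 1000 particles and 256 threads per block this launches 4 blocks (1024 threads); the 24 extra threads fail the bounds check and do nothing.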

Thanks for the reply. I think I know what the problem was: I used a very poor random number generator with a very short period, so it repeated quickly and the particles only ever reached a sparse set of positions.
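The effect of a short period is easy to reproduce on the host. This sketch uses a deliberately tiny linear congruential generator, x' = (5x + 1) mod 256; the constants are chosen only for illustration and give the full period of 256 by the Hull-Dobell conditions. However many numbers are drawn, it can produce at most 256 distinct values, so anything indexed by its output can reach at most 256 cells. TinyLcg, periodOf and distinctValues are hypothetical names.

```cuda
#include <cassert>
#include <cstdint>
#include <set>

// Deliberately poor 8-bit LCG: x' = (5x + 1) mod 256.
struct TinyLcg {
    uint32_t state;
    uint32_t next() {
        state = (5u * state + 1u) & 0xFFu;  // & 0xFF is mod 256
        return state;
    }
};

// Steps until the generator returns to its starting state.
int periodOf(uint32_t seed) {
    assert(seed < 256u);  // states outside [0, 256) are never revisited
    TinyLcg g{seed};
    int steps = 0;
    do { g.next(); ++steps; } while (g.state != seed);
    return steps;
}

// Distinct outputs seen in `draws` calls, i.e. the most cells a simulation
// could visit if each output picked one cell.
size_t distinctValues(uint32_t seed, int draws) {
    TinyLcg g{seed};
    std::set<uint32_t> seen;
    for (int i = 0; i < draws; ++i) seen.insert(g.next());
    return seen.size();
}
```

Drawing 10000 numbers still yields only 256 distinct values, which is the same kind of coverage ceiling that leaves a large map mostly empty.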

Now I am trying to use the feedback from this thread to get the MT (Mersenne Twister) RNG working for me. The definitions of N and MT_THREADS, and how they relate to the running threads/blocks, are not clear to me.
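One common layout is to give every particle its own independent generator state, so the number of generators equals the number of particles and is independent of the block size. The sketch below swaps in cuRAND's device API with its default XORWOW generator (curandState); it is not the Mersenne Twister code discussed in the thread and does not model its N or MT_THREADS. initRng, drawUniform and sampleOnce are hypothetical names. Every thread uses the same seed but its own subsequence, and the state is saved back so the next launch continues the same stream.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// One generator state per particle: same seed, subsequence = particle index.
__global__ void initRng(curandState* states, unsigned long long seed, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) curand_init(seed, idx, 0, &states[idx]);
}

// Draw one uniform number per particle and store the advanced state back.
__global__ void drawUniform(curandState* states, float* out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    curandState local = states[idx];  // work on a register copy
    out[idx] = curand_uniform(&local);  // value in (0, 1]
    states[idx] = local;
}

// Host helper: initialise n states and draw once. Error checks omitted.
std::vector<float> sampleOnce(int n, unsigned long long seed) {
    assert(n > 0);
    curandState* dStates; float* dOut;
    cudaMalloc(&dStates, n * sizeof(curandState));
    cudaMalloc(&dOut, n * sizeof(float));
    int threads = 256, blocks = (n + threads - 1) / threads;
    initRng<<<blocks, threads>>>(dStates, seed, n);
    drawUniform<<<blocks, threads>>>(dStates, dOut, n);
    std::vector<float> out(n);
    cudaMemcpy(out.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dStates); cudaFree(dOut);
    return out;
}
```

In the simulation loop, Move1Step would take the states array as an extra argument and draw from states[idx], leaving the initialisation kernel to run once before the 10000 iterations.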