Help with a problem (random walk on the GPU): how to improve this kernel

Hello everybody,

I’m working on a project for my university in Brazil using CUDA C.

Basically, I’m implementing a random walk that simulates many particles moving in a liquid or gas.

My implementation is:

  • I have a large vector in which the particles move.

  • Each particle moves randomly: right, left, up, or down.

  • All particles begin their journey at index[0] of the vector and end at index, where SIZE is the maximum size of the vector.

  • Every time a particle touches a cell, I have to increment the vector at that position by 1, e.g. “vector[position]++;”.
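For reference, a minimal serial version of the walk described in the list above might look like the following. It is only a sketch under assumptions that are not in the post: the vector is treated as one-dimensional (left/right steps only, so the up/down moves are not modeled), index 0 acts as a wall (a step left from 0 stays at 0), a particle stops when it reaches the last cell (size - 1) or after max_steps steps, and `xorshift32` and `random_walk_cpu` are hypothetical names.

```cuda
#include <cassert>
#include <cstdint>

// Hypothetical tiny PRNG (xorshift32); the state must never be zero.
static inline uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

// Serial reference: walks the particles one after another.
// Assumed model: 1-D vector, steps of +1/-1, wall at index 0,
// stop at index size - 1 or after max_steps steps.
void random_walk_cpu(int *vector, int size, int num_particles, int max_steps) {
    assert(size >= 2);
    for (int i = 0; i < size; ++i) vector[i] = 0;

    for (int p = 0; p < num_particles; ++p) {
        uint32_t rng = (2463534242u + 7919u * (uint32_t)p) | 1u; // odd, so non-zero
        int position = 0;          // every particle starts at index 0
        vector[position]++;        // count the starting cell
        for (int step = 0; step < max_steps && position != size - 1; ++step) {
            position += ((xorshift32(&rng) >> 16) & 1u) ? 1 : -1;
            if (position < 0) position = 0;  // wall: stay at index 0
            vector[position]++;              // count every cell touched
        }
    }
}
```

With a fixed seed per particle the result is reproducible, which makes a version like this handy for checking a GPU implementation against.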

That’s basically it.

My problem is that I’m not getting much speedup compared with the CPU implementation: only 4x.

My idea is to make each thread a particle, so each thread computes the effect of one particle on the whole vector.
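The kernel itself is not included in the post. A hypothetical sketch of the one-thread-per-particle idea, under an assumed 1-D model (left/right steps, wall at index 0, stop at size - 1 or after max_steps), might look like this; `random_walk_kernel`, `random_walk_gpu`, and the `xorshift32` RNG are illustrative names, not the author’s code. Because many particles can touch the same cell at the same time, each increment has to be an `atomicAdd`; a plain `vector[position]++` would be a data race and lose counts.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical tiny PRNG (xorshift32); the state must never be zero.
__device__ inline uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

// One thread = one particle; every touched cell is counted atomically.
__global__ void random_walk_kernel(int *vector, int size,
                                   int num_particles, int max_steps) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= num_particles) return;

    uint32_t rng = (2463534242u + 7919u * (uint32_t)p) | 1u; // odd, so non-zero
    int position = 0;                       // every particle starts at index 0
    atomicAdd(&vector[position], 1);
    for (int step = 0; step < max_steps && position != size - 1; ++step) {
        position += ((xorshift32(&rng) >> 16) & 1u) ? 1 : -1;
        if (position < 0) position = 0;     // wall: stay at index 0
        atomicAdd(&vector[position], 1);    // global-memory atomic
    }
}

// Host wrapper: allocates the device vector, launches, copies the counts back.
// Returns false if any CUDA call fails.
bool random_walk_gpu(int *host_vector, int size, int num_particles, int max_steps) {
    assert(size >= 2 && num_particles > 0);
    size_t bytes = (size_t)size * sizeof(int);

    int *d_vector = nullptr;
    if (cudaMalloc(&d_vector, bytes) != cudaSuccess) return false;
    cudaError_t err = cudaMemset(d_vector, 0, bytes);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (num_particles + threads - 1) / threads;
        random_walk_kernel<<<blocks, threads>>>(d_vector, size, num_particles, max_steps);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)  // cudaMemcpy waits for the kernel to finish
        err = cudaMemcpy(host_vector, d_vector, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_vector);
    return err == cudaSuccess;
}
```

Two things tend to limit the speedup of this layout: every particle starts at index 0, so the atomics on the first cells are heavily contended, and a warp keeps running until its longest walk has finished, so threads whose walks end early sit idle.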

Here is my kernel. If someone can show me how to improve it, I would be very grateful.

Atomics are slow, especially when thousands of threads update the same few cells of a global vector. I’d suggest a different approach.
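One common alternative (not necessarily the one meant above) is privatization: each block counts into its own copy of the vector in shared memory, where atomics are much cheaper and only the threads of that block contend, and then adds its counts to the global vector once per cell. This still uses atomics, just far fewer global ones. The sketch below uses the same assumed 1-D model and assumes the whole vector fits in shared memory (capped here at 48 KB per block, i.e. 12288 ints); `random_walk_shared_kernel`, `random_walk_gpu_shared`, and `xorshift32` are hypothetical names.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical tiny PRNG (xorshift32); the state must never be zero.
__device__ inline uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

__global__ void random_walk_shared_kernel(int *vector, int size,
                                          int num_particles, int max_steps) {
    extern __shared__ int local[];  // this block's private copy of the vector

    for (int i = threadIdx.x; i < size; i += blockDim.x) local[i] = 0;
    __syncthreads();

    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p < num_particles) {  // no early return: all threads must reach __syncthreads()
        uint32_t rng = (2463534242u + 7919u * (uint32_t)p) | 1u; // odd, so non-zero
        int position = 0;
        atomicAdd(&local[position], 1);
        for (int step = 0; step < max_steps && position != size - 1; ++step) {
            position += ((xorshift32(&rng) >> 16) & 1u) ? 1 : -1;
            if (position < 0) position = 0;   // wall: stay at index 0
            atomicAdd(&local[position], 1);   // shared-memory atomic
        }
    }
    __syncthreads();

    // Merge: at most one global atomic per cell per block.
    for (int i = threadIdx.x; i < size; i += blockDim.x)
        if (local[i] != 0) atomicAdd(&vector[i], local[i]);
}

// Returns false if the vector is too large for shared memory or a CUDA call fails.
bool random_walk_gpu_shared(int *host_vector, int size, int num_particles, int max_steps) {
    assert(size >= 2 && num_particles > 0);
    size_t bytes = (size_t)size * sizeof(int);
    if (bytes > 48 * 1024) return false;  // assumed per-block shared-memory budget

    int *d_vector = nullptr;
    if (cudaMalloc(&d_vector, bytes) != cudaSuccess) return false;
    cudaError_t err = cudaMemset(d_vector, 0, bytes);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (num_particles + threads - 1) / threads;
        random_walk_shared_kernel<<<blocks, threads, bytes>>>(
            d_vector, size, num_particles, max_steps);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host_vector, d_vector, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_vector);
    return err == cudaSuccess;
}
```

For vectors larger than shared memory this does not apply directly; the vector would have to be split into tiles or handled another way. It also does nothing about warp divergence caused by walks of different lengths.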