Help with a problem (random walk on the GPU): how to improve this kernel

Thalles · May 29, 2012, 1:01pm

Hello everybody,

I’m working on a project for my university in Brazil using CUDA C.

Basically, I’m implementing a random walk that simulates many particles moving through a liquid or gas.

My implementation is:

I have a big vector (a SIZE x SIZE grid stored as one flat array) in which the particles move.
Each particle moves randomly one step at a time: right, left, up, or down.
All particles start their journey at index [0] of the vector and end it at the last cell (row SIZE - 1, column SIZE - 1), where SIZE is the side length of the grid.
Every cell a particle touches must be incremented by 1 at that position, e.g. “vector[position]++;”.
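As a reference for checking the GPU counts, the rules above can be sketched on the CPU in plain C (which also compiles as host code in a .cu file). This is a minimal sketch, not the poster's CPU version: it treats the vector as the SIZE x SIZE grid that the kernel indexes with lin * SIZE + col, uses a hypothetical side length DEMO_SIZE, and swaps cuRAND for xorshift32, a tiny generator chosen only for illustration (its state must start nonzero).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical grid side for experiments; the post does not give SIZE. */
#define DEMO_SIZE 8

/* xorshift32: a tiny stand-in for cuRAND, for illustration only.
   The state must be seeded with a nonzero value. */
static uint32_t xorshift32(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return x;
}

/* Walks one particle from cell (0, 0) of an n x n grid until it reaches
   (n - 1, n - 1) or has made max_steps successful moves. Every visited
   cell is incremented, including the start cell. Returns the move count. */
static int walk_one(int *grid, int n, int max_steps, uint32_t *rng)
{
    assert(n >= 1);
    int lin = 0, col = 0, steps = 0;
    grid[0]++;                              /* the start cell counts once */
    while (steps < max_steps) {
        int nl = lin, nc = col;
        switch (xorshift32(rng) % 4u) {
        case 0:  nl--; break;               /* up    */
        case 1:  nc++; break;               /* right */
        case 2:  nl++; break;               /* down  */
        default: nc--; break;               /* left  */
        }
        /* Only moves that stay inside the grid count as a step;
           a blocked move changes nothing and the loop draws again. */
        if (nl >= 0 && nl < n && nc >= 0 && nc < n) {
            lin = nl;
            col = nc;
            grid[lin * n + col]++;
            steps++;
        }
        if (lin == n - 1 && col == n - 1)
            break;                          /* reached the last cell */
    }
    return steps;
}
```

Calling walk_one once per particle, each with its own nonzero seed, gives per-cell totals that should agree with the GPU output statistically, not bit for bit, since the random streams differ.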

That’s basically it.

My problem is that I’m only getting about a 4x speedup compared with the CPU implementation.

My idea is to make each thread a particle, so each thread computes the effect of one particle on the whole vector.
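With one thread per particle, the launch is usually rounded up to whole blocks, so some threads can end up with an id past the last particle. The host-side emulation below is a sketch (blocks_for and emulate_launch are hypothetical helper names): it computes id with the same formula as the kernel and skips those padding threads. The kernel in this post has no such check, so it relies on the launch creating exactly as many threads as there are entries in state.

```c
#include <assert.h>

/* Blocks needed so that blocks * block_dim >= n_particles
   (the usual round-up for a one-thread-per-particle launch). */
static int blocks_for(int n_particles, int block_dim)
{
    assert(block_dim > 0);
    return (n_particles + block_dim - 1) / block_dim;
}

/* Emulates the launch on the host: visits every (block, thread) pair,
   builds the global id exactly like blockIdx.x * blockDim.x + threadIdx.x,
   and skips padding threads with id >= n_particles.
   'visits' must hold n_particles zero-initialised ints.
   Returns the number of threads that did work. */
static int emulate_launch(int *visits, int n_particles, int block_dim)
{
    int grid_dim = blocks_for(n_particles, block_dim);
    int active = 0;
    for (int block = 0; block < grid_dim; block++) {
        for (int thread = 0; thread < block_dim; thread++) {
            int id = block * block_dim + thread;
            if (id >= n_particles)
                continue;                   /* padding thread: no particle */
            visits[id]++;                   /* stands for "simulate particle id" */
            active++;
        }
    }
    return active;
}
```

For example, 1000 particles with 256 threads per block need 4 blocks (1024 threads), and the last 24 threads must do nothing.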

Here is my kernel. If someone can show me how to improve it, I’d be very grateful.

```cuda
#include <curand_kernel.h>

/* SIZE (the side length of the grid) is defined elsewhere in the program. */

__global__ void kernel( int* mat, curandState* state )
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    int passo = 0;   /* number of successful moves ("passo" = step) */

    /* Keep the particle's position in registers, so each thread has
       its own private copy with very fast access. */
    int lin = 0;
    int col = 0;
    unsigned int number;

    /* Copy state to local memory for efficiency */
    curandState localState = state[id];

    atomicAdd( &mat[0], 1 );   /* every particle starts at cell 0 */

    while ( passo < ( SIZE * SIZE ) )
    {
        /* generate a pseudo-random direction in 1..4 */
        number = 1 + curand( &localState ) % 4;

        switch ( number )
        {
        case 1:   /* up */
            if ( ( lin - 1 ) >= 0 ) {
                lin--;
                atomicAdd( &mat[lin * SIZE + col], 1 );
                passo++;
            }
            break;
        case 2:   /* right */
            if ( ( col + 1 ) < SIZE ) {
                col++;
                atomicAdd( &mat[lin * SIZE + col], 1 );
                passo++;
            }
            break;
        case 3:   /* down */
            if ( ( lin + 1 ) < SIZE ) {
                lin++;
                atomicAdd( &mat[lin * SIZE + col], 1 );
                passo++;
            }
            break;
        case 4:   /* left */
            if ( ( col - 1 ) >= 0 ) {
                col--;
                atomicAdd( &mat[lin * SIZE + col], 1 );
                passo++;
            }
            break;
        } /* end switch */

        /* the particle reached its "home", the last cell of the grid */
        if ( col == ( SIZE - 1 ) && lin == ( SIZE - 1 ) )
            break;
    }

    /* Write the generator state back so a later launch continues the
       random sequence instead of repeating it. */
    state[id] = localState;
}
```
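The four cases of the switch differ only in the row/column offset they apply, so they can be expressed as a lookup table plus a single bounds check. The C sketch below (try_step, DLIN, and DCOL are hypothetical names) keeps the kernel's behavior: a blocked move changes nothing and the loop draws again. It mainly removes the duplicated code; whether it measurably speeds up the GPU version is something to profile.

```c
#include <assert.h>

/* Row/column offsets, indexed by dir = random % 4, in the same order as
   the kernel's cases: 0 = case 1 (up), 1 = case 2 (right),
   2 = case 3 (down), 3 = case 4 (left). */
static const int DLIN[4] = { -1, 0, 1,  0 };
static const int DCOL[4] = {  0, 1, 0, -1 };

/* Tries to move (lin, col) one cell in direction dir on an n x n grid.
   Returns 1 and updates the position if the target is inside the grid;
   otherwise returns 0 and leaves the position unchanged. */
static int try_step(int *lin, int *col, unsigned dir, int n)
{
    assert(n > 0);
    int nl = *lin + DLIN[dir & 3u];
    int nc = *col + DCOL[dir & 3u];
    if (nl < 0 || nl >= n || nc < 0 || nc >= n)
        return 0;
    *lin = nl;
    *col = nc;
    return 1;
}
```

In device code the function would need the __device__ qualifier and the tables would have to be __constant__ (or local arrays). The loop body would then shrink to `if (try_step(&lin, &col, curand(&localState) & 3u, SIZE)) { atomicAdd(&mat[lin * SIZE + col], 1); passo++; }`, since `& 3u` equals `% 4` for the unsigned value curand returns.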

cbuchner1 · May 29, 2012, 2:27pm

Atomics are slow. I’d suggest a different approach.

Christian
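One common way to cut down on global atomics (not necessarily the approach the reply has in mind) is privatization: each thread block counts its particles' visits into a private copy of the grid in shared memory, then adds that copy to the global grid with one atomicAdd per nonzero cell. This only works when SIZE * SIZE counters fit in a block's shared memory (the usual static limit is 48 KB, about 12,288 ints). The plain-C sketch below models only the counting-and-merge logic on the host; privatized_accumulate and its chunked input layout are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Accumulates n_visits cell indices into global[0..cells) in two phases.
   The visits are split into n_groups contiguous chunks (one per "block");
   each chunk is counted into a private grid, which is then added to
   global with one update per nonzero cell. Returns the number of global
   updates (the atomics the merge would issue on a GPU), or -1 if the
   private grid cannot be allocated. */
static long privatized_accumulate(int *global, int cells,
                                  const int *visits, int n_visits,
                                  int n_groups)
{
    assert(cells > 0 && n_groups > 0);
    int *priv = malloc((size_t)cells * sizeof *priv);
    if (priv == NULL)
        return -1;
    long global_updates = 0;
    int chunk = (n_visits + n_groups - 1) / n_groups;
    for (int g = 0; g < n_groups; g++) {
        memset(priv, 0, (size_t)cells * sizeof *priv);
        int begin = g * chunk;
        int end = (begin + chunk < n_visits) ? begin + chunk : n_visits;
        /* phase 1: block-local counting (shared memory on a GPU) */
        for (int i = begin; i < end; i++)
            priv[visits[i]]++;
        /* phase 2: one global update per nonzero private cell */
        for (int c = 0; c < cells; c++) {
            if (priv[c] != 0) {
                global[c] += priv[c];
                global_updates++;
            }
        }
    }
    free(priv);
    return global_updates;
}
```

The saving grows with the number of repeated visits to the same cell within one block, which is common near mat[0], where every particle starts.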