I’m trying to port code written for the CPU to the GPU to make it faster, but the GPU version is too slow and I don’t know why.
Here’s the host code that sets up and launches the kernel:
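int execute_cuda (int *posy, int posx)
{
    int *d_retorno;      // device: index written by the kernel
    int *d_ok;           // device: flag, set to 1 if a position was found
    int resultado = 0;   // host copy of the flag
    cudaMalloc((void**)&d_retorno, sizeof(int));
    cudaMalloc((void**)&d_ok, sizeof(int));
    // d_ok points to device memory and cannot be dereferenced on the host,
    // so clear the flag with cudaMemset instead of "*d_ok = 0"
    cudaMemset(d_ok, 0, sizeof(int));
    // tab_aux and tabuleiro are assumed to already be device pointers
    position<<<1, DIM>>>(d_retorno, tab_aux, tabuleiro, posx, d_ok);
    // posy is only meaningful when the returned flag is 1
    cudaMemcpy(posy, d_retorno, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&resultado, d_ok, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_retorno);
    cudaFree(d_ok);
    return resultado;
}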
And here is the kernel:
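__global__ void position(int *d_retorno, char tabela[DIM], char tabuleiro2[2][(2*DIM)-1], int posx, int *d_ok)
{
    int i = threadIdx.x;   // one thread per candidate position
    // Position i is accepted only if none of the three tables has an 'O' there
    if ((tabela[i] != 'O') && (tabuleiro2[0][i+posx] != 'O') && (tabuleiro2[1][(DIM-1)+i-posx] != 'O')) {
        *d_ok = 1;
        // Several threads may match; which index ends up here is not defined
        *d_retorno = i;
    }
}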
“tabela”, “tabuleiro2”, “posx” and DIM are constant from the kernel’s point of view. I could put them in constant memory, but I don’t think that would solve the problem.
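For reference, a minimal sketch of what the constant-memory version might look like. The names c_tabela, c_tabuleiro2, position_const and upload_tables are hypothetical, and DIM is set to 8 only so that the snippet compiles on its own:

```cuda
#include <cuda_runtime.h>

#define DIM 8  // assumed size for illustration; the real DIM is defined elsewhere

// Hypothetical constant-memory copies of the two read-only tables.
__constant__ char c_tabela[DIM];
__constant__ char c_tabuleiro2[2][(2 * DIM) - 1];

// Same test as position(), but the tables are read from constant memory
// instead of being passed in as pointers.
__global__ void position_const(int *d_retorno, int posx, int *d_ok)
{
    int i = threadIdx.x;
    if ((c_tabela[i] != 'O') && (c_tabuleiro2[0][i + posx] != 'O') &&
        (c_tabuleiro2[1][(DIM - 1) + i - posx] != 'O')) {
        *d_ok = 1;
        *d_retorno = i;
    }
}

// Copies the host tables into constant memory. This has to be repeated
// whenever the host tables change.
cudaError_t upload_tables(const char h_tabela[DIM],
                          const char h_tabuleiro2[2][(2 * DIM) - 1])
{
    cudaError_t err = cudaMemcpyToSymbol(c_tabela, h_tabela, DIM);
    if (err != cudaSuccess) return err;
    return cudaMemcpyToSymbol(c_tabuleiro2, h_tabuleiro2, 2 * ((2 * DIM) - 1));
}
```

The kernel would then be launched as position_const<<<1, DIM>>>(d_retorno, posx, d_ok) after a call to upload_tables.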
What can I do to make it faster?
One more thing: is there an instruction that ends a kernel’s execution the way “break” ends a “while” loop? I could use it in this code.
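To illustrate the kind of early exit I mean, here is a sketch. A plain return inside a kernel ends only the thread that executes it, not the whole kernel, so the most each thread can do is check the flag and leave early. The name position_early_exit is hypothetical, the test is simplified to the first table only, and DIM is set to 8 just so the snippet compiles on its own:

```cuda
#include <cuda_runtime.h>

#define DIM 8  // assumed size for illustration only

// Hypothetical variant of position(): a thread stops as soon as it sees that
// the flag is already set. `return` ends only the calling thread; the other
// threads of the kernel keep running.
__global__ void position_early_exit(int *d_retorno, const char *tabela, int *d_ok)
{
    int i = threadIdx.x;
    // volatile read so the flag is actually fetched from memory
    if (*((volatile int *)d_ok) != 0)
        return;                    // a position was already found
    if (tabela[i] != 'O') {        // simplified test: first table only
        *d_ok = 1;
        *d_retorno = i;
    }
}
```

With a single block of DIM threads that all start together, the flag check probably saves very little, since most threads read the flag before any thread has set it.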
Thanks in advance for any answers.
System: Windows XP 32-bit
CUDA SDK and toolkit: 2.0
GPU: 9600GT
CPU: E8400, Core2Duo 3GHz
Visual Basic 2005