Sequential Access to CUDA

David_Lisin · July 2, 2009, 2:26pm

Hi everyone, I was wondering if someone could point me in the right direction.

I need to get the following code running in CUDA:

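for (i = 0; i < rowsmatrix; i++) {
    /* first product of row i */
    acumul[0] = matrix[i] * matrix[i + rowsmatrix];
    output[0] += acumul[0];

    /* acumul[j] needs acumul[j-1]: every step depends on the previous one */
    for (j = 1; j < colsmatrix - 1; j++) {
        acumul[j] = acumul[j - 1] * matrix[j * rowsmatrix + i + colsmatrix];
        output[j] += acumul[j];
    }
}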

The problem is that the threads are not guaranteed to run in any particular order, so thread 7 may execute before thread 6.

In that case, when a thread reads acumul[j-1] to calculate acumul[j], it may get the wrong value, because acumul[j-1] has not been calculated yet.
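Unrolling the recurrence for a single row i makes the dependency explicit. The shorthand x_j below is introduced only for illustration and does not appear in the original code; it just names the matrix element multiplied in at step j.

$$
x_0 = \mathrm{matrix}[i]\cdot\mathrm{matrix}[i+\mathrm{rowsmatrix}], \qquad
x_j = \mathrm{matrix}[j\cdot\mathrm{rowsmatrix}+i+\mathrm{colsmatrix}] \quad (1 \le j \le \mathrm{colsmatrix}-2)
$$

$$
\mathrm{acumul}[j] = \mathrm{acumul}[j-1]\cdot x_j = \prod_{k=0}^{j} x_k, \qquad
\mathrm{output}[j] \mathrel{+}= \prod_{k=0}^{j} x_k
$$

So for each row, acumul is an inclusive running product (a prefix product) of x_0, ..., x_j, which is why a naive one-thread-per-j mapping reads values that may not exist yet.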

Could anyone please guide me in the right direction?

Thanks in advance,

David

jack · July 2, 2009, 2:47pm

Are you trying to implement an algorithm for a specific problem? Perhaps there is another algorithm that parallelizes better than this one.

Otherwise, you are probably out of luck. Not all code can be ported to CUDA; some problems are inherently sequential.

David_Lisin · July 2, 2009, 2:58pm

Thanks for answering! Unfortunately I can't change the algorithm, so it looks like I'm going to have to bash my head against the wall until I solve this!

avidday · July 2, 2009, 3:23pm

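Your inner loop is a running product, i.e. a prefix scan that uses multiplication instead of addition as its operator. There are several parallel prefix sum (scan) implementations for CUDA floating around that will do what you want, as long as they let you supply the operator (I think the Thrust library has one, for example).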
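A minimal sketch of the idea, written in plain C so it can be checked on the CPU: a Hillis-Steele inclusive scan with multiplication as the operator, next to the serial recurrence it replaces. The function names prefix_product_scan and prefix_product_serial are hypothetical, the float element type is an assumption, and the GPU threads and barriers are simulated with ordinary loops. This is not Thrust's implementation, just the textbook technique.

```c
#include <string.h>

/* Serial reference: the original loop-carried recurrence,
   acumul[j] = acumul[j-1] * x[j], which forces j to run in order.
   Hypothetical helper; float element type is an assumption. */
void prefix_product_serial(const float *x, float *acumul, int n)
{
    if (n <= 0)
        return;
    acumul[0] = x[0];
    for (int j = 1; j < n; j++)
        acumul[j] = acumul[j - 1] * x[j];
}

/* Hillis-Steele inclusive scan with multiplication as the operator.
   data holds x on entry and the prefix products on exit; tmp is
   caller-supplied scratch space of n floats. On a GPU, each k in the
   inner loop would be one thread and each pass would end in a barrier
   (__syncthreads() if the row fits in one thread block); here both are
   simulated with ordinary loops. */
void prefix_product_scan(float *data, float *tmp, int n)
{
    for (int offset = 1; offset < n; offset *= 2) {
        /* Every k is independent: it reads only the previous pass (data)
           and writes only the new pass (tmp), so thread order is irrelevant. */
        for (int k = 0; k < n; k++)
            tmp[k] = (k >= offset) ? data[k - offset] * data[k] : data[k];
        /* Publish the new pass before the next one starts. */
        memcpy(data, tmp, (size_t)n * sizeof *data);
    }
}
```

For row i of the original loop, x[0] would be matrix[i]*matrix[i+rowsmatrix] and x[j] would be matrix[j*rowsmatrix+i+colsmatrix]; the scanned values are then the acumul[j] terms to add into output[j]. Hillis-Steele performs O(n log n) multiplications versus n-1 for the serial loop, so for long rows a work-efficient (Blelloch-style) scan or a library call such as thrust::inclusive_scan with thrust::multiplies is likely the better choice. Because the scan multiplies in a different order, results for general float inputs may differ from the serial loop in the last bits.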