How can I convert my factorial C program to CUDA? It is working well in C, but I would like to make thi

Hi all,

I have been trying to convert my factorial-of-4 C program into a CUDA factorial program, but I am unable to do so. Following is my C program (Factorialof4.c). I wrote a CUDA program for it (see attachment), but it gives the wrong output.

Can you tell me how I can convert this program to CUDA?

Thanks in advance.


int Fact(int *a,int n)


	int fact,i;

	fact =1;








int main()


	int *a,n,factorial;



	printf("Factorial of %d=%d",n,factorial);

	return 0;

}

You have two problems with this code:

__global__ void Fact(int *a,int n)


	int fact=1;

	int i=blockIdx.x*blockDim.x+threadIdx.x;









Firstly, fact is local to each thread you launch, so it effectively always takes the value of the thread index + 1, which can never give you the cumulative product you are trying to compute. Secondly, you have a memory race on a[0]: every running thread will be trying to overwrite that value simultaneously. There are no guarantees of correctness in such cases.

In your other thread (please don’t cross post), Sarnath suggested looking at using a reduction for this. It is good advice. The CUDA SDK includes a set of sample codes for the parallel reduction, which can be used to efficiently parallelise associative operations like sums, products, min, max, etc. It also includes a very good PDF whitepaper describing the algorithm and design choices for implementation in CUDA. You should definitely take the time to read and understand it.
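
To give a rough idea of what a product reduction could look like, here is a minimal single-block sketch. It is not the SDK sample and not a tuned implementation. The names factorialKernel, factorial_gpu and BLOCK_SIZE are made up for illustration, and it assumes n is at most 20 so that the result fits in a 64-bit integer and one block of 32 threads covers every factor.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 32  // assumed block size; must be a power of two and >= n

// Each thread loads one factor (tid + 1) into shared memory. Threads with
// tid >= n load 1, the identity for multiplication. The loop then halves the
// number of active threads each step until partial[0] holds the full product.
__global__ void factorialKernel(unsigned long long *result, int n)
{
    __shared__ unsigned long long partial[BLOCK_SIZE];

    int tid = threadIdx.x;
    partial[tid] = (tid < n) ? (unsigned long long)(tid + 1) : 1ULL;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            partial[tid] *= partial[tid + stride];
        }
        __syncthreads();  // every thread in the block reaches this barrier
    }

    if (tid == 0) {
        *result = partial[0];  // a single thread writes, so there is no race
    }
}

// Hypothetical host wrapper: returns n! for 0 <= n <= 20, or 0 on invalid
// input or on a CUDA error (0 is never a valid factorial).
unsigned long long factorial_gpu(int n)
{
    if (n < 0 || n > 20) return 0;

    unsigned long long *d_result = NULL;
    unsigned long long h_result = 0;

    if (cudaMalloc((void **)&d_result, sizeof(unsigned long long)) != cudaSuccess) {
        return 0;
    }

    factorialKernel<<<1, BLOCK_SIZE>>>(d_result, n);

    // cudaMemcpy waits for the kernel to finish before copying the result.
    cudaError_t err = cudaMemcpy(&h_result, d_result, sizeof(unsigned long long),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return (err == cudaSuccess) ? h_result : 0;
}
```

Calling factorial_gpu(4) from main should give 24 on a working device. The point of the pattern is that each thread contributes its own factor to shared memory and the partial products are combined pairwise between barriers, instead of every thread keeping a private fact and writing it to a[0]. For inputs larger than one block you would need the multi-block approach described in the SDK whitepaper.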

Thanks for the advice, avidday.