Help! How to vectorise this loop?

kinderchocolate · January 7, 2010, 1:57pm

I have this double for-loop

for (int j1 = NTAB + 7; j1 >= 0; j1--) {
	for (int n = 0; n < PATH_N; n++)
	{
		k[n] = idum[n] / IQ;
		idum[n] = IA * (idum[n] - k[n] * IQ) - IR * k[n];
		if (idum[n] < 0) idum[n] += IM;
	}
	if (j1 < NTAB)
	{
		for (int n = 0; n < PATH_N; n++)
		{
			iv[j1][n] = idum[n];
		}
	}
}

How can I vectorise this double for-loop for CUDA? Each iteration of the outer loop depends on the results of the previous iteration (e.g. idum).

Keldor314 · January 9, 2010, 6:32am

It looks like the dependencies are strictly in one dimension? That is, iv[j1][n] depends only on idum[n]? If so, you should be able to parallelize along n without any side effects by giving each iteration of the inner loop its own thread, while leaving the outer loop as is. Otherwise, you’ll have to synchronize at each step of the outer loop, which would only be practical for large values of PATH_N and small values of NTAB. Assuming the dependencies really are per-n, you would go from

for (int j1 = NTAB + 7; j1 >= 0; j1--) {
	for (int n = 0; n < PATH_N; n++)
	{
		k[n] = idum[n] / IQ;
		idum[n] = IA * (idum[n] - k[n] * IQ) - IR * k[n];
		if (idum[n] < 0) idum[n] += IM;
	}
	if (j1 < NTAB)
	{
		for (int n = 0; n < PATH_N; n++)
		{
			iv[j1][n] = idum[n];
		}
	}
}

to something like

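// threadIndex stands for this thread's global index,
// e.g. blockIdx.x * blockDim.x + threadIdx.x (skip threads with n >= PATH_N)
int n = threadIndex;

for (int j1 = NTAB + 7; j1 >= 0; j1--) {
	k[n] = idum[n] / IQ;
	idum[n] = IA * (idum[n] - k[n] * IQ) - IR * k[n];
	if (idum[n] < 0) idum[n] += IM;
	if (j1 < NTAB)
	{
		iv[j1][n] = idum[n];
	}
}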
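
To make the per-thread version concrete, here is a minimal host-side C++ sketch of the work for a single path n, written so it can be checked against the original loop order. The constant values are not given in this thread; the ones below are the common Park-Miller values and are purely an assumption, as are the function names and the flat iv[j1 * path_n + n] layout. In a kernel, the body of warm_up_path would be what each thread runs.

```cpp
#include <cstdint>
#include <vector>

// ASSUMED constants (not stated in the thread): common Park-Miller values.
constexpr int32_t IA = 16807;
constexpr int32_t IM = 2147483647;
constexpr int32_t IQ = 127773;
constexpr int32_t IR = 2836;
constexpr int NTAB = 32;

// Work for one path n (hypothetical helper). iv is a flat array,
// indexed as iv[j1 * path_n + n], which is an assumed layout.
void warm_up_path(int n, int path_n, int32_t* idum, int32_t* iv) {
    int32_t x = idum[n];                  // keep the state in a local
    for (int j1 = NTAB + 7; j1 >= 0; j1--) {
        int32_t k = x / IQ;               // scratch value, no k[] array needed
        x = IA * (x - k * IQ) - IR * k;
        if (x < 0) x += IM;
        if (j1 < NTAB) iv[j1 * path_n + n] = x;
    }
    idum[n] = x;                          // write the final state back once
}

// Original loop order from the question, kept for comparison.
void warm_up_reference(int path_n, int32_t* idum, int32_t* iv) {
    for (int j1 = NTAB + 7; j1 >= 0; j1--) {
        for (int n = 0; n < path_n; n++) {
            int32_t k = idum[n] / IQ;
            idum[n] = IA * (idum[n] - k * IQ) - IR * k;
            if (idum[n] < 0) idum[n] += IM;
        }
        if (j1 < NTAB) {
            for (int n = 0; n < path_n; n++) iv[j1 * path_n + n] = idum[n];
        }
    }
}
```

Calling warm_up_path for every n, in any order, should give the same idum and iv as warm_up_reference, which is the property that lets each n go to its own thread. With the flat layout assumed here, neighbouring values of n write to neighbouring addresses for a given j1.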

If you could get rid of the "if (idum[n] < 0) idum[n] += IM;" conditional, it might even be possible to compute idum directly from j1, since the recurrence looks like it might reduce to a geometric series or something similar.
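
As a rough sketch of that idea: the update has the shape of Schrage's method for computing (IA * idum) mod IM without overflow, and the conditional is part of that reduction. That only holds if IM == IA * IQ + IR with IR < IQ and 0 < idum < IM, none of which is confirmed in this thread. Under those assumptions, j steps multiply the seed by IA^j modulo IM, which can be computed directly. The constant values and the names step, pow_mod and jump below are hypothetical.

```cpp
#include <cstdint>

// ASSUMED constants (not stated in the thread): common Park-Miller values.
constexpr int64_t IA = 16807, IM = 2147483647, IQ = 127773, IR = 2836;
static_assert(IM == IA * IQ + IR && IR < IQ,
              "Schrage's method needs IM = IA*IQ + IR with IR < IQ");

// One iteration exactly as written in the loop.
int64_t step(int64_t x) {
    int64_t k = x / IQ;
    x = IA * (x - k * IQ) - IR * k;
    if (x < 0) x += IM;
    return x;
}

// Modular exponentiation by repeated squaring.
// All products stay below 2^62, so int64_t does not overflow.
int64_t pow_mod(int64_t base, int64_t exp, int64_t mod) {
    int64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// State after `steps` iterations, computed without looping over them.
int64_t jump(int64_t x0, int64_t steps) {
    return pow_mod(IA, steps, IM) * x0 % IM;
}
```

If the assumptions hold, iv[j1][n] would be jump(seed, NTAB + 8 - j1) and the final idum[n] would be jump(seed, NTAB + 8), since the outer loop runs NTAB + 8 times. For a loop this short the direct form is unlikely to be faster; it mainly shows that the iterations are not inherently sequential.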

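What’s with the IQ, though? In real arithmetic, (idum[n]-k[n]*IQ) = (idum[n]-(idum[n]/IQ)*IQ) = (idum[n]-idum[n]) = 0 for IQ != 0 and NaN otherwise. It is only nonzero if idum and IQ are integers, in which case the division truncates and the expression is just idum[n] % IQ.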
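
A quick way to check that expression, assuming the arrays hold integers (the thread does not show their declarations): the two hypothetical helpers below evaluate x - (x / q) * q once with integer operands and once with doubles.

```cpp
#include <cstdint>

// Integer operands: the division truncates, so this is the remainder x % q.
constexpr int32_t remainder_via_division(int32_t x, int32_t q) {
    return x - (x / q) * q;
}

// Floating-point operands: the division is (nearly) exact, so the result
// is zero up to rounding error.
double real_version(double x, double q) {
    return x - (x / q) * q;
}
```

With the assumed value IQ = 127773, remainder_via_division(300000, 127773) gives 44454, the same as 300000 % 127773, while real_version with the same inputs gives a value at or very near zero.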
