Help! How to vectorise this loop?

I have this double for-loop:

for (int j1 = NTAB + 7; j1 >= 0; j1--) {
    for (int n = 0; n < PATH_N; n++)
    {
        k[n] = idum[n] / IQ;
        idum[n] = IA * (idum[n] - k[n] * IQ) - IR * k[n];
        if (idum[n] < 0) idum[n] += IM;
    }
    if (j1 < NTAB)
    {
        for (int n = 0; n < PATH_N; n++)
        {
            iv[j1][n] = idum[n];
        }
    }
}

How can I vectorise this double for-loop for CUDA? Each iteration of the outer loop depends on the results of the previous iterations (e.g. idum)…

It looks like the dependencies run strictly along one dimension? That is, iv[j1][n] depends only on idum[n], and idum[n] only on its own previous value? If so, you should be able to vectorize along n without any side effects by spreading the inner loop across threads (one thread per n), while leaving the outer loop as is. Otherwise, you'd have to synchronize at each step of the outer loop, which would only be practical for large values of PATH_N and small values of NTAB. In other words, go from

for (int j1 = NTAB + 7; j1 >= 0; j1--) {
    for (int n = 0; n < PATH_N; n++)
    {
        k[n] = idum[n] / IQ;
        idum[n] = IA * (idum[n] - k[n] * IQ) - IR * k[n];
        if (idum[n] < 0) idum[n] += IM;
    }
    if (j1 < NTAB)
    {
        for (int n = 0; n < PATH_N; n++)
        {
            iv[j1][n] = idum[n];
        }
    }
}

to something like

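// Kernel name is illustrative; one thread handles one path n.
__global__ void fillTable(int *idum, int iv[][PATH_N])
{
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= PATH_N) return;  // guard the last, partially filled block

    int seed = idum[n];  // this path's state, kept in a register
    for (int j1 = NTAB + 7; j1 >= 0; j1--) {
        int k = seed / IQ;  // k[n] was only scratch, so a local is enough
        seed = IA * (seed - k * IQ) - IR * k;
        if (seed < 0) seed += IM;
        if (j1 < NTAB)
            iv[j1][n] = seed;  // adjacent threads write adjacent addresses
    }
    idum[n] = seed;  // save the final state
}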
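To sanity-check that the paths really are independent, here is a small plain C sketch (not CUDA) that runs the original serial loop, then runs the per-thread body once per path in reverse order (mimicking threads that run in arbitrary order), and compares the results. The constants are an assumption, since the thread never shows them: the NTAB + 7 warm-up loop looks like Numerical Recipes' ran1(), so I used its Park-Miller values (IA = 16807, IM = 2147483647, IQ = 127773, IR = 2836, NTAB = 32). PATH_N = 4, the seeds and the function names are hypothetical.

```c
/* Assumed constants (Numerical Recipes ran1 / Park-Miller "minimal standard"). */
#define IA 16807
#define IM 2147483647
#define IQ 127773
#define IR 2836
#define NTAB 32
#define PATH_N 4   /* hypothetical, kept small for illustration */

/* The original serial double loop over all paths. */
static void serial(int idum[PATH_N], int k[PATH_N], int iv[NTAB][PATH_N])
{
    for (int j1 = NTAB + 7; j1 >= 0; j1--) {
        for (int n = 0; n < PATH_N; n++) {
            k[n] = idum[n] / IQ;
            idum[n] = IA * (idum[n] - k[n] * IQ) - IR * k[n];
            if (idum[n] < 0) idum[n] += IM;
        }
        if (j1 < NTAB)
            for (int n = 0; n < PATH_N; n++)
                iv[j1][n] = idum[n];
    }
}

/* What a single CUDA thread would do for its own path n. */
static void one_path(int n, int idum[PATH_N], int iv[NTAB][PATH_N])
{
    int seed = idum[n];
    for (int j1 = NTAB + 7; j1 >= 0; j1--) {
        int k = seed / IQ;
        seed = IA * (seed - k * IQ) - IR * k;
        if (seed < 0) seed += IM;
        if (j1 < NTAB)
            iv[j1][n] = seed;
    }
    idum[n] = seed;
}

/* Returns 1 if running each path on its own, in any order, reproduces
   the serial loop exactly (both the final idum and the whole table). */
static int paths_are_independent(void)
{
    /* Seeds must lie in [1, IM - 1]; IM - 1 exercises the edge case. */
    int a[PATH_N] = {1, 42, 12345, 2147483646};
    int b[PATH_N] = {1, 42, 12345, 2147483646};
    int k[PATH_N];
    int iv_a[NTAB][PATH_N], iv_b[NTAB][PATH_N];

    serial(a, k, iv_a);
    for (int n = PATH_N - 1; n >= 0; n--)   /* reverse order on purpose */
        one_path(n, b, iv_b);

    for (int n = 0; n < PATH_N; n++) {
        if (a[n] != b[n]) return 0;
        for (int j = 0; j < NTAB; j++)
            if (iv_a[j][n] != iv_b[j][n]) return 0;
    }
    return 1;
}
```

paths_are_independent() should return 1. If some path read another path's idum, processing the paths in a different order could change the table, and this check would catch it.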

If you could get rid of the if (idum[n] < 0) idum[n] += IM; conditional, it might even be possible to compute idum directly from j1, since the recurrence looks like it might reduce to a geometric series or something.
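Assuming the Park-Miller constants again (IA = 16807, IM = 2147483647, IQ = 127773, IR = 2836; not shown in the thread), the loop body computes idum = IA * idum mod IM, and the conditional is simply the last part of that modular reduction. So after t steps idum = IA^t * idum0 mod IM, a geometric sequence taken modulo IM, and it can be computed directly with square-and-multiply modular exponentiation. The function names below are hypothetical.

```c
#include <stdint.h>

/* Assumed Park-Miller "minimal standard" constants. */
#define IA 16807
#define IM 2147483647
#define IQ 127773
#define IR 2836

/* One step of the original loop body. */
static int32_t step(int32_t idum)
{
    int32_t k = idum / IQ;
    idum = IA * (idum - k * IQ) - IR * k;
    if (idum < 0) idum += IM;
    return idum;
}

/* Jump ahead t steps: returns IA^t * idum0 mod IM by square-and-multiply.
   Both factors are below 2^31, so every product fits in 64 bits. */
static int32_t jump_ahead(int32_t idum0, uint32_t t)
{
    uint64_t result = (uint64_t)idum0;
    uint64_t base = IA;
    while (t > 0) {
        if (t & 1u)
            result = (result * base) % IM;
        base = (base * base) % IM;
        t >>= 1;
    }
    return (int32_t)result;
}
```

With this, the value stored in iv[j1][n] is jump_ahead(idum0, NTAB + 8 - j1) for that path's starting seed idum0, so every (j1, n) pair could in principle get its own thread. With NTAB this small, the plain per-path loop is probably just as fast.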

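What's with the IQ, though? It looks as if (idum[n]-k[n]*IQ) = (idum[n]-(idum[n]/IQ)*IQ) should cancel to 0, but if idum, k and IQ are integers, idum[n]/IQ truncates, so (idum[n]-k[n]*IQ) is actually the remainder idum[n] % IQ rather than 0. (And with IQ == 0 the integer division itself would be undefined behaviour, not NaN.)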