Hi,
I am fairly new to OpenCL and I have a really strange problem with a “for” loop. The final code is intended to simulate Bingo games and determine the frequencies of a set of bingo patterns. The first step is to generate a random bingo card. I know random numbers are a bit of an issue, but I have the CPU pre-generate different seeds for each thread, so that should not be a problem. My problem is that when I try to pick random balls to fill up the bingo card, my “for” loop only works if it runs no more than 5 iterations. To make it clearer, this works:
for (i = 0; i < 5; i++)
{
    iBingoPos[i] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
}
iBingoPos[5] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
but this doesn’t:
for (i = 0; i < 6; i++)
{
    iBingoPos[i] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
}
As a test, I copy the data from iBingoPos to a buffer so I can inspect it in the VS2008 debugger. The second version, which should do exactly the same thing, ends up with every element of the array being 0 (which is the initial value of every element of the buffer).
Here is the complete code except for the RNG itself (company proprietary information, but it simply takes 2 seeds and computes a random float value between 0 and 1, modifying the seed values in the process):
void GenerateBingoCard(int* iBingoPos, int* iSeed1, int* iSeed2);
float DrawReal(int *iSeed1, int *iSeed2)
{
    […] returns a random float between 0 and 1
}
__kernel void BingoSimulations(__global const int* iSimulationParameters, __global const int* iSeeds, __global const int* iFixedPatterns,
                               __global const int* iPatternCandidates, __global const int* WorkPatterns, __global int* iWorkArea)
{
    int iThreadID = get_global_id(0);
    int iTotalNbThreads = get_global_size(0);
    int iNbsimul = iSimulationParameters[0];
    int iNbFixedPatterns = iSimulationParameters[1];
    int iNbCandidates = iSimulationParameters[2];
    int iNbWorkPatterns = iSimulationParameters[3];
    int iEndingPattern = iSimulationParameters[4];
    int iSeed1 = iSeeds[iThreadID];
    int iSeed2 = iSeeds[iThreadID + iTotalNbThreads];
    int iBingoPos[75];
    GenerateBingoCard(iBingoPos, &iSeed1, &iSeed2);
    // for testing purposes … transfers the bingo card of thread 0 in the output buffer
    if (iThreadID == 0)
    {
        for (int i = 0; i < 75; i++)
            iWorkArea[i] = iBingoPos[i]; //(int)( DrawReal(&iSeed1, &iSeed2) * 15 );
    }
}
void GenerateBingoCard(int* iBingoPos, int* iSeed1, int* iSeed2)
{
    int i, x, y, iPickedIndex, iTmp;
    for (i = 0; i < 75; i++)
        iBingoPos[i] = -1;
    int iValueList[15];
    for (i = 0; i < 15; i++)
        iValueList[i] = i;
    for (i = 0; i < 25; i++)
    {
        iBingoPos[i] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
    }
}
In this last “for” loop, if I have it stop at 5, the content copied to the output buffer shows, as expected, indices 0 to 4 holding random values between 0 and 15 and the rest, up to 74, being -1. If I make it do more than those 5 iterations, it ends up being all 0’s. I can do
iBingoPos[0] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[1] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[2] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[3] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[4] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[5] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[6] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[7] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[8] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
[…]
and it works fine, which proves to me that the problem is not in the random function.
If I try to simply assign a value in the loop as in
for (i = 0; i < 25; i++)
{
    iBingoPos[i] = 12;
}
it works fine … which seems to indicate that the “for” works fine too…
The random function doesn’t do anything fancy: it creates 2 local variables (an int and a float) and only does a bunch of + - * /.
Any ideas?
System spec:
Dell M4500 laptop (Core i7, 4GB DDR3)
Nvidia Quadro 1800M (OpenCL 1.0)
Windows XP Pro EN
Visual C# 2008 (using Cloo for OpenCL)
Latest Nvidia CUDA SDK / laptop developer driver
EDIT: I have tried it on my home computer and it works fine in all cases, which makes me believe that some limitation of OpenCL 1.0 is causing my problem and that it was fixed in 1.1. My home system:
Custom-built Core i7 w/ 4GB DDR3
Windows 7 x86 EN
AMD Radeon HD 5770 (OpenCL 1.1)
Latest ATI Stream SDK / driver