Looping problem


I am a bit new to OpenCL and I have a really strange problem with a “for” loop. The final code is intended to simulate Bingo games and determine the frequencies of a set of bingo patterns. The first step is to generate a random bingo card. I know randomness is a bit of an issue, but I have the CPU pre-generate different seeds for each thread, so that should not be a problem. My problem is that when I try to pick random balls to fill up the bingo card, my “for” loop only works if it does no more than 5 iterations. To make it clearer, this works:

for (i = 0; i < 5; i++)
{
    iBingoPos[i] = (int)(  DrawReal(iSeed1, iSeed2) * 15.0  );
}

iBingoPos[5] = (int)(  DrawReal(iSeed1, iSeed2) * 15.0  );

but this doesn’t:

for (i = 0; i < 6; i++)
{
    iBingoPos[i] = (int)(  DrawReal(iSeed1, iSeed2) * 15.0  );
}

As a test, I copy the data from iBingoPos to a buffer so I can inspect it in the VS2008 debugger. With the second version, which should do exactly the same thing, every element of the array ends up with a value of 0 (which is the initial value of every element of the buffer).

Here is the complete code except for the RNG itself (company proprietary information, but it simply takes 2 seeds and computes a random float value between 0 and 1, modifying the seed values in the process):

void GenerateBingoCard(int* iBingoPos, int* iSeed1, int* iSeed2);

float DrawReal(int *iSeed1, int *iSeed2)
{
    […] returns a random float between 0 and 1
}

__kernel void BingoSimulations(__global const int* iSimulationParameters, __global const int* iSeeds, __global const int* iFixedPatterns,
                           __global const int* iPatternCandidates, __global const int* WorkPatterns, __global int* iWorkArea)
{
    int iThreadID = get_global_id(0);
    int iTotalNbThreads = get_global_size(0);

    int iNbsimul = iSimulationParameters[0];
    int iNbFixedPatterns = iSimulationParameters[1];
    int iNbCandidates = iSimulationParameters[2];
    int iNbWorkPatterns = iSimulationParameters[3];
    int iEndingPattern = iSimulationParameters[4];

    int iSeed1 = iSeeds[iThreadID];
    int iSeed2 = iSeeds[iThreadID + iTotalNbThreads];

    int iBingoPos[75];

    GenerateBingoCard(iBingoPos, &iSeed1, &iSeed2);

    // for testing purposes … transfers the bingo card of thread 0 in the output buffer
    if (iThreadID == 0)
    {
        for (int i = 0; i < 75; i++)
            iWorkArea[i] = iBingoPos[i];               //(int)(  DrawReal(&iSeed1, &iSeed2) * 15  );
    }
}

void GenerateBingoCard(int* iBingoPos, int* iSeed1, int* iSeed2)
{
    int i, x, y, iPickedIndex, iTmp;

    for (i = 0; i < 75; i++)
        iBingoPos[i] = -1;

    int iValueList[15];

    for (i = 0; i < 15; i++)
        iValueList[i] = i;

    for (i = 0; i < 25; i++)
    {
        iBingoPos[i] = (int)(  DrawReal(iSeed1, iSeed2) * 15.0  );
    }
}

On this last for loop, if I have it stop at 5, the content copied to the output buffer shows, as expected, indices 0 to 4 holding random values between 0 and 15 and the rest, up to 74, being -1. If I make it do more than those 5 iterations, the buffer ends up being all 0’s. I can do

iBingoPos[0] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[1] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[2] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[3] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[4] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[5] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[6] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[7] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );
iBingoPos[8] = (int)( DrawReal(iSeed1, iSeed2) * 15.0 );


and it works fine, which proves to me that the problem is not in the random function.

If I try to simply assign a value in the loop as in

for (i = 0; i < 25; i++)
{
    iBingoPos[i] = 12;
}

it works fine … which seems to indicate that the “for” works fine too…

The random function doesn’t do anything fancy: it creates 2 local variables (an int and a float) and only does a bunch of + - * /.
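For anyone who wants to try reproducing this without the proprietary RNG, here is a minimal sketch of a stand-in with the same shape (two int seeds in, a float out, seeds modified along the way). DrawRealStandIn and FillPositions are hypothetical names, and the constants are just commonly used LCG multipliers that have nothing to do with the real generator. It is written in plain C that should also be acceptable as OpenCL C helper functions, so the same loop can be checked on the CPU first.

```c
/* HYPOTHETICAL stand-in for the proprietary DrawReal (an assumption of this
   sketch, not the real generator). Takes two seeds, updates both, and
   returns a float in [0, 1). Only uses + * ^ >> and one division. */
float DrawRealStandIn(int *iSeed1, int *iSeed2)
{
    /* Advance each seed with a linear congruential step. The arithmetic is
       done on unsigned values so that overflow simply wraps around. */
    unsigned int a = (unsigned int)(*iSeed1) * 1664525u + 1013904223u;
    unsigned int b = (unsigned int)(*iSeed2) * 22695477u + 1u;

    *iSeed1 = (int)a;
    *iSeed2 = (int)b;

    /* Mix the two states and keep 24 bits, which a float represents exactly,
       so the quotient below is always strictly less than 1.0f. */
    unsigned int mixed = (a ^ (b >> 7)) >> 8;
    return (float)mixed / 16777216.0f;
}

/* Same loop as in GenerateBingoCard, with the iteration count as a parameter
   so that 5, 6 or 25 iterations can be compared. 15.0f keeps the arithmetic
   in single precision. */
void FillPositions(int *iBingoPos, int iCount, int *iSeed1, int *iSeed2)
{
    int i;
    for (i = 0; i < iCount; i++)
        iBingoPos[i] = (int)(DrawRealStandIn(iSeed1, iSeed2) * 15.0f);
}
```

Calling FillPositions(iBingoPos, 25, &iSeed1, &iSeed2) on the CPU gives a reference result to compare against what comes back from the device in iWorkArea.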

Any ideas?

System spec:

Dell M4500 laptop (core i7, 4GB DDR3)

Nvidia Quadro 1800M (OpenCL 1.0)

Windows XP pro EN

Visual C# 2008 (using Cloo for OpenCL)

Latest Nvidia CUDA SDK / laptop developer driver

EDIT: I have tried it on my home computer and it works fine in all cases, which makes me believe that some limitation of OpenCL 1.0 is causing my problem and it was fixed in 1.1. My home system:

custom built core i7 w/ 4GB DDR3

Windows 7 x86 EN

AMD radeon HD 5770 (OpenCL 1.1)

Latest ATI Stream SDK / driver

Hey Thunder,

I experience the same problem:

You said, your solution was that

I cannot find any AddBarrier-method. What do you mean by this method?

Would be nice to hear from you!

I’ve done much more OpenCL since; this specific problem was solved using barriers. The thing is that whenever your OpenCL code writes to memory other than a private variable and then reads that memory to use it again, there is no guarantee that the write has happened yet, so you can end up using the old value. Example:

__global int x = 13;
x = 12;
int y = x + 0;

At the end of this, y might be either 12 or 13, depending on the order the memory controller chose. Whenever you have a scenario where a memory write happens and later reads need the updated value, use a barrier with either the CLK_LOCAL_MEM_FENCE or the CLK_GLOBAL_MEM_FENCE flag, depending on whether the memory that needs to be up to date is local or global memory. Last example:

__global int x = 13;
x = 12;
barrier(CLK_GLOBAL_MEM_FENCE);
int y = x + 0;

That would lead to y = 12 every time, but with a slower execution time, because the code has to stop executing until the memory controller has updated everything.
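To make the barrier usage a bit more concrete, here is a minimal sketch of the most common case: each work-item writes one value to local memory, and after the barrier it reads the value written by its neighbour in the same work-group. The kernel name ShiftInGroup and its arguments are made up for illustration. As far as I understand the spec, barrier() only synchronises the work-items of one work-group and every one of them has to reach it. The #ifndef section is just a set of stand-ins so the file can also be compiled as plain C for a dry run with a single work-item; a real OpenCL compiler defines __OPENCL_VERSION__ and skips it.

```c
#ifndef __OPENCL_VERSION__
/* Plain-C stand-ins (assumption: ONE work-item, dry run only). */
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 1
static void barrier(int flags) { (void)flags; }   /* nothing to wait for */
static size_t get_global_id(unsigned int dim) { (void)dim; return 0; }
static size_t get_local_id(unsigned int dim) { (void)dim; return 0; }
static size_t get_local_size(unsigned int dim) { (void)dim; return 1; }
#endif

/* Hypothetical kernel: every work-item copies its input value into local
   memory, then reads the slot written by the next work-item in its group.
   Assumes the global size is a multiple of the work-group size and that
   iScratch holds one int per work-item of the group. */
__kernel void ShiftInGroup(__global const int *iIn,
                           __global int *iOut,
                           __local int *iScratch)
{
    size_t gid = get_global_id(0);
    size_t lid = get_local_id(0);
    size_t lsize = get_local_size(0);

    /* Write phase: each work-item fills its own slot. */
    iScratch[lid] = iIn[gid];

    /* Wait until all work-items of the group have reached this point; their
       local-memory writes are visible to the group afterwards. */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Read phase: safe to read a slot written by another work-item. */
    iOut[gid] = iScratch[(lid + 1) % lsize];
}
```

Without the barrier, a work-item could read its neighbour's slot before the neighbour has written it. On the host side, the __local argument would normally be sized with clSetKernelArg, passing a NULL pointer and a size of (work-group size * sizeof(cl_int)).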

Hey Thunder,

thanks for the reply.

However, in the first post, I thought you talked about a non-working finish() function for command queues… Your problem wasn’t related to that?