What is happening in the following code?

Hi everybody,

I am using the following scheme to perform bit reversal (this is how I make it work on the CPU):

for (.... x ....)
{
  reverse[x] = bit_reverse(x, bits_required_for_storage);
}

for (.... x ....)
{
  if ( x < reverse[x] )
    swap( data[x], data[reverse[x]] );
}

The bit_reverse() function is not relevant here. I have it running on the GPU, and after execution it produces the correct index sequence to swap with.
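For reference, the CPU scheme above can be written as a self-contained C sketch. The post does not show bit_reverse(), so the version here is a hypothetical implementation that reverses the lowest `bits` bits of x; bit_reverse_permute() is likewise a hypothetical wrapper around the two loops. The assert documents why the `x < reverse[x]` test is enough: bit reversal is its own inverse, so indices come in pairs and each pair is swapped exactly once.

```c
#include <assert.h>

/* Hypothetical bit_reverse(): reverses the lowest `bits` bits of x.
   The original post does not show its implementation. */
static unsigned bit_reverse(unsigned x, unsigned bits)
{
    unsigned r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (x & 1u);   /* move the lowest bit of x into r */
        x >>= 1;
    }
    return r;
}

/* Reorders data[0 .. 2^bits - 1] into bit-reversed order in place,
   filling reverse[] with the index table first. */
static void bit_reverse_permute(double *data, unsigned *reverse, unsigned bits)
{
    unsigned n = 1u << bits;

    for (unsigned x = 0; x < n; x++)
        reverse[x] = bit_reverse(x, bits);

    for (unsigned x = 0; x < n; x++) {
        /* Bit reversal is its own inverse, so indices come in pairs. */
        assert(reverse[reverse[x]] == x);
        /* Swap only from the lower index of each pair; without this
           test every pair would be swapped twice, undoing the swap. */
        if (x < reverse[x]) {
            double tmp = data[x];
            data[x] = data[reverse[x]];
            data[reverse[x]] = tmp;
        }
    }
}
```

For example, with bits = 3 and data = {0, 1, 2, 3, 4, 5, 6, 7}, the call leaves data as {0, 4, 2, 6, 1, 5, 3, 7}.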

I am having problems with the swapping step.

Here is my code for the swap:

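__kernel void swap(__global double *data, const int sizeX, const int sizeY, __global int *reverse)
{
  int idX = get_global_id(0);
  int idY = get_global_id(1);
  int BASE = idY * sizeX;

  if (idX < reverse[idX])
  {
    // OpenCL C has no built-in swap() (and this kernel is itself named
    // swap), so exchange the two values through a temporary.
    double tmp = data[BASE + idX];
    data[BASE + idX] = data[BASE + reverse[idX]];
    data[BASE + reverse[idX]] = tmp;
  }
}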
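One way to check that this per-element kernel is race-free is to replay it on the host. Because the reverse table pairs indices up (reverse[reverse[x]] == x), each swap touches a disjoint pair and only the lower index of the pair performs it, so the result should not depend on the order in which work-items run. The plain C sketch below runs the kernel body for every (idX, idY) in reverse order; swap_work_item() and run_ndrange_reversed() are hypothetical names, not OpenCL API.

```c
#include <assert.h>

/* Host-side copy of the kernel body for one work-item (idX, idY). */
static void swap_work_item(double *data, int sizeX, const int *reverse,
                           int idX, int idY)
{
    int base = idY * sizeX;          /* first element of row idY */
    int r = reverse[idX];
    /* A garbage reverse[] entry would index outside the row. */
    assert(r >= 0 && r < sizeX);
    if (idX < r) {
        double tmp = data[base + idX];
        data[base + idX] = data[base + r];
        data[base + r] = tmp;
    }
}

/* Stands in for the 2D NDRange: visits every work-item, deliberately
   in reverse order, since a GPU guarantees no particular order. */
static void run_ndrange_reversed(double *data, int sizeX, int sizeY,
                                 const int *reverse)
{
    for (int idY = sizeY - 1; idY >= 0; idY--)
        for (int idX = sizeX - 1; idX >= 0; idX--)
            swap_work_item(data, sizeX, reverse, idX, idY);
}
```

With the two rows 5 6 7 8 / 1 2 3 4 and reverse 0 2 1 3 from this post, it produces 5 7 6 8 / 1 3 2 4, matching the expected result.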

I ran this with the following sizes (fine for the small sizes I am dealing with at the moment):

globalSize[0] = size of X, globalSize[1] = size of Y, globalSize[2] = 1;

localSize[0] = size of X, localSize[1] = size of Y, localSize[2] = 1;

Since that did not work, I also tried one work-item per row, looping over x inside the kernel:

globalSize[0] = 1, globalSize[1] = size of Y;

localSize[0] = 1, localSize[1] = size of Y;

__kernel void swap(__global double *data, const int sizeX, const int sizeY, __global int *reverse)
{
  int idX = get_global_id(0);
  int idY = get_global_id(1);
  int BASE = idY * sizeX;
  __private int x;

  for (.... x ....)
  {
    if (x < reverse[x])
    {
      // exchange through a temporary; OpenCL C has no built-in swap()
      double tmp = data[BASE + x];
      data[BASE + x] = data[BASE + reverse[x]];
      data[BASE + reverse[x]] = tmp;
    }
  }
}
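The row-per-work-item variant can be replayed on the host the same way. This sketch assumes the elided loop runs x from 0 to sizeX - 1 (the post does not show the bounds), and swap_row() is a hypothetical name.

```c
#include <assert.h>

/* Host-side copy of the second kernel: work-item idY walks its whole
   row serially. Loop bounds 0 .. sizeX - 1 are an assumption. */
static void swap_row(double *data, int sizeX, const int *reverse, int idY)
{
    int base = idY * sizeX;          /* first element of row idY */
    for (int x = 0; x < sizeX; x++) {
        int r = reverse[x];
        assert(r >= 0 && r < sizeX); /* reject out-of-row indices */
        if (x < r) {
            double tmp = data[base + x];
            data[base + x] = data[base + r];
            data[base + r] = tmp;
        }
    }
}
```

Each work-item now does sizeX iterations one after another, so this gives up parallelism within a row in exchange for a simpler launch; for the small sizes in this post both launch shapes should yield the same data.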

For example, with this input data:

5 6 7 8

1 2 3 4

with reverse:

0 2 1 3

the correct result should be:

5 7 6 8

1 3 2 4

But instead, I am getting values like:

11046 9090 9090 9130

28208 27020 27020 27044

Can anybody explain why this is happening?

Okay, I have sorted it out. It was a host-side problem. It seems OpenCL does not “clear” buffer memory on the GPU: the values I was getting were left over from a previous run. I now call enqueueWriteBuffer to copy the input data into the buffer before the call to enqueueNDRangeKernel, and the problem is solved.

If it’s an issue of retained values, that’s known GPU behaviour: newly allocated device memory is not initialized. In debug code I always make sure to memset the memory to 0 first (OpenCL 1.1 has no memset-style function, so you’ll need to write a simple kernel; OpenCL 1.2 added clEnqueueFillBuffer for this).

You should also never trust memory to be set to 0 on any platform; it’s highly platform- and compiler-dependent. On the CPU, the operating system often hands out fresh pages already zeroed through the virtual memory system, but this is not guaranteed (for example, in debug mode in Visual Studio or under several memory profilers, memory is filled with a magic number to help detect access violations and reads of uninitialized memory).
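Along the same lines as the magic-number fill, a minimal debug sketch: fill the host array with a recognizable byte pattern, upload it with enqueueWriteBuffer before running the kernel, read it back afterwards, and count the elements that still hold the pattern. The 0xCD byte and the names POISON_BYTE, poison() and count_untouched() are hypothetical choices for illustration. The comparison is done bytewise with memcmp so it does not depend on how the pattern decodes as a double.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0xCD   /* arbitrary, easy-to-spot fill byte */

/* Fill every byte of buf[0 .. n-1] with the poison pattern. */
static void poison(double *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (n > 0)
        memset(buf, POISON_BYTE, n * sizeof *buf);
}

/* Count elements whose bytes still equal the poison pattern,
   i.e. elements nothing has written to since poison(). */
static size_t count_untouched(const double *buf, size_t n)
{
    double pattern;
    size_t count = 0;

    memset(&pattern, POISON_BYTE, sizeof pattern);
    for (size_t i = 0; i < n; i++)
        if (memcmp(&buf[i], &pattern, sizeof pattern) == 0)
            count++;
    return count;
}
```

A non-zero count after the read-back points at elements the kernel never wrote, or at a buffer that was never uploaded, which is the kind of stale-data problem described in this thread.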