problem indexing in simpleGL

Hi,

I’m having some trouble indexing the data vbo in the simpleGL project, but maybe this is a more general problem.

In the simpleGL kernel, x and y are initialized to be this:

unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

When I index into the data vbo to access the x,y,z of a given vertex, it looks like this:

unsigned int i = (y*width+x)*3;
data[i] = u; // x coord
data[i+1] = w; // y coord
data[i+2] = v; // z coord
where u and v are the UV coordinates and w is set to 0.0 in my example (I turned off the frequency stuff).

My problem is this. If I want to access the column to the left of the current column, I say

unsigned int i_left = ((y-1)*width+x)*3;

That works. For instance, if I check for (y == width - 2) and then set data[i_left+1] = 0.5, the column that is two columns from the end will be raised above the rest. If I then say:

data[i+1] = 0.5
instead of messing with i_left+1, it will only elevate the next to last column, leaving all of the others in their original position.

However, when I do
unsigned int i_right = ((y+1)*width+x)*3;
and then set data[i_right+1] = 0.5, nothing happens.

I am getting similar results when I try to change x:

i_down = (y*width+x-1)*3 // works properly
i = (y*width+x)*3 // works properly
i_up = (y*width+x+1)*3 // does not work

Any idea why y-1 or x-1 would work but y+1 and x+1 do not? I’m not getting array exceptions or out of bound indexing. I’m just not getting any dots to move.

I’m running WinXP on a GeForce7300GT card, and I’m running this project in emulation mode.

thanks,
skiz

Hmm … it looks like __syncthreads() fixed the one instance where I was having this problem, but in another case where I’m trying to do this indexing, I already had __syncthreads() and it still doesn’t seem to work. Very strange.

I think what you’re missing here is that the kernel function is effectively run in parallel for every thread in the block. If different threads are writing to the same location in global memory, the final results will be undefined.
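One way to avoid the conflicting writes is to let each thread write only the vertex it owns, and have it decide its own height from its own x and y. The sketch below is only an illustration of that idea, not the simpleGL source: the kernel name raise_row, the raised_row parameter, the host helper run_raise_row, and the use of a plain cudaMalloc buffer in place of the mapped VBO are all assumptions made for the example.

```cuda
#include <cuda_runtime.h>

// Each thread writes only its own vertex. Instead of the thread at row y
// pushing a value into row y+1, the thread that owns the raised row sets
// its own height. raised_row is a hypothetical parameter for this sketch.
__global__ void raise_row(float *data, unsigned int width, unsigned int height,
                          unsigned int raised_row)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // guard threads outside the mesh

    float u = x / (float)width;
    float v = y / (float)height;
    float w = (y == raised_row) ? 0.5f : 0.0f;

    unsigned int i = (y * width + x) * 3;
    data[i]     = u;  // x coord
    data[i + 1] = w;  // y coord
    data[i + 2] = v;  // z coord
}

// Hypothetical host helper: launches the kernel on a plain device buffer
// (standing in for the mapped VBO) and copies the result into out, which
// must hold width * height * 3 floats. Returns 0 on success.
int run_raise_row(float *out, unsigned int width, unsigned int height,
                  unsigned int raised_row)
{
    size_t bytes = (size_t)width * height * 3 * sizeof(float);
    float *d_data = NULL;
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) return 1;

    dim3 block(8, 8);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    raise_row<<<grid, block>>>(d_data, width, height, raised_row);

    cudaError_t err = cudaMemcpy(out, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Because no two threads write the same element, the result does not depend on the order in which threads run. Note also that __syncthreads() only synchronizes the threads within one block, so it cannot order writes made by threads in different blocks.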

In emulation mode each thread runs serially as a CPU thread, so you might be able to get away with it in some cases. This is why it’s important to run your code on actual hardware when developing.
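A serial execution order may also explain why the y-1 and x-1 writes appeared to work while y+1 and x+1 did not. The host-only model below is a sketch under a strong assumption: that the threads run one after another in ascending (y, x) order, which emulation mode does not guarantee. The function run_serial and its parameters are made up for the illustration. Every "thread" resets its own height to 0.0, and the threads in trigger_row also write 0.5 into a neighbouring row.

```cuda
#include <vector>

// Host-only model of the kernel, run one "thread" at a time in ascending
// (y, x) order. This ordering is an assumption, not documented behaviour.
// offset is -1 or +1 and selects which neighbouring row the trigger row
// writes into; trigger_row must satisfy 1 <= trigger_row <= height - 2.
std::vector<float> run_serial(unsigned int width, unsigned int height,
                              unsigned int trigger_row, int offset)
{
    std::vector<float> data((size_t)width * height * 3, 0.0f);
    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            unsigned int i = (y * width + x) * 3;
            data[i + 1] = 0.0f;  // every thread writes its own height

            if (y == trigger_row) {
                // Scatter write into another thread's element.
                unsigned int ny = (unsigned int)((int)y + offset);
                unsigned int n = (ny * width + x) * 3;
                data[n + 1] = 0.5f;
            }
        }
    }
    return data;
}
```

With offset = -1 the neighbouring row has already been processed, so the 0.5 survives. With offset = +1 the neighbouring row runs later and overwrites the 0.5 with its own 0.0, so nothing appears to move. On real hardware neither outcome is guaranteed.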