Hi,

I have an array VEC in global memory and need to perform the following operation on it:

VEC[i] = VEC[i-1] + VEC[i+1] for i = 1 to N-2
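
To pin down the semantics I mean, here is a minimal serial reference. It assumes every right-hand side reads the values from before the update, which is what the two-kernel version below computes. The function name `stencil_reference` is made up for this post, and it is host-only code:

```cuda
#include <cstddef>
#include <vector>

// Hypothetical serial reference: out[i] = in[i-1] + in[i+1] for i = 1..N-2,
// with all reads taken from the old values. The endpoints are left unchanged.
std::vector<float> stencil_reference(const std::vector<float> &vec)
{
    std::vector<float> out = vec;                 // copy keeps VEC[0] and VEC[N-1]
    for (std::size_t i = 1; i + 1 < vec.size(); ++i)
        out[i] = vec[i - 1] + vec[i + 1];         // reads only the old array
    return out;
}
```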

We can write two kernels to perform this operation:

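    // TID is assumed to be the global thread index,
    // e.g. blockIdx.x * blockDim.x + threadIdx.x

    // Reads only from VEC and writes only to temp.
    __global__ void kernel1(float *temp, float *VEC, int N)
    {
        if (TID >= 1 && TID < N - 1)
            temp[TID] = VEC[TID - 1] + VEC[TID + 1];
    }

    // Copies the interior results back into VEC.
    __global__ void kernel2(float *temp, float *VEC, int N)
    {
        if (TID >= 1 && TID < N - 1)
            VEC[TID] = temp[TID];
    }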

Calling kernel1 and then kernel2 consecutively yields the desired result. But it's more expensive, since it needs two kernel launches and the extra temp array.

I would like to use just one kernel, like this:

    __global__ void kernel3(float *VEC, int N)
    {
        if (TID < 1 || TID >= N - 1)
            return;

        // Option 1
        // This won't work and might give different results each time the kernel is called.
        // Some threads might modify the values read by neighboring threads concurrently.
        VEC[TID] = VEC[TID - 1] + VEC[TID + 1];

        // Option 2
        // Try copying to a register first. Works only sometimes, since a neighboring
        // thread may already have written its new value before this thread reads it.
        float temp = VEC[TID - 1] + VEC[TID + 1];
        VEC[TID] = temp;
    }

Is there another way to do this in a single kernel that avoids the thread conflicts seen in kernel3? Any ideas would be appreciated.
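
One direction I have been considering, as a rough sketch rather than a tested solution: keep two buffers and swap the pointers on the host after each launch, so the copy kernel disappears. This is not a true in-place update, since it still needs the second array. The names `stencil_step` and `run_steps`, the block size of 256, and the explicit index computation in place of TID are my own assumptions:

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical kernel: reads only from `in` and writes only to `out`, so no
// thread can observe a neighbor's already-updated value.
__global__ void stencil_step(const float *in, float *out, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N)
        return;
    if (i >= 1 && i < N - 1)
        out[i] = in[i - 1] + in[i + 1];
    else
        out[i] = in[i];   // carry the endpoints over so the buffers can be swapped
}

// Hypothetical host helper: applies `steps` updates and returns the result.
// Error checking is omitted to keep the sketch short.
std::vector<float> run_steps(std::vector<float> host, int steps)
{
    int N = static_cast<int>(host.size());
    if (N == 0)
        return host;
    std::size_t bytes = N * sizeof(float);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                           // assumed block size
    int blocks = (N + threads - 1) / threads;    // enough blocks to cover N
    for (int s = 0; s < steps; ++s) {
        stencil_step<<<blocks, threads>>>(d_in, d_out, N);
        std::swap(d_in, d_out);                  // newest values are now in d_in
    }

    cudaMemcpy(host.data(), d_in, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host;
}
```

With this, one update costs a single launch plus a pointer swap on the host, at the price of keeping the second buffer around. Calling `run_steps(data, 1)` should match what kernel1 followed by kernel2 produces.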

-DC