# Linear Interpolation

I have a large one-dimensional data array on the GPU with 1024*512 elements of type float. This array holds an image (width = 1024 pixels, height = 512 pixels).

The second array is a one-dimensional array with 1024 elements of type float. It contains 1024 coefficients. Now I want to multiply the first row by the coefficient array.

Calculation for the first row:

data_array[0] = (data_array[1] - data_array[0]) * coefficient_array[0] --> thread 0 of block 0

data_array[1] = (data_array[2] - data_array[1]) * coefficient_array[1] --> thread 1 of block 0

data_array[511] = (data_array[512] - data_array[511]) * coefficient_array[511] --> thread 511 of block 0

data_array[512] = (data_array[513] - data_array[512]) * coefficient_array[512] --> thread 0 of block 1

data_array[513] = (data_array[514] - data_array[513]) * coefficient_array[513] --> thread 1 of block 1

data_array[1022] = (data_array[1023] - data_array[1022]) * coefficient_array[1022] --> thread 510 of block 1

data_array[1023] = data_array[1023] --> thread 511 of block 1 (the last element stays unchanged!)

Here is my code:

```cuda
dim3 dimGrid;
dim3 dimBlock;

dimBlock.x = 512;
dimBlock.y = 1;

dimGrid.x = 2;
dimGrid.y = 512;

kernel_linear_interpolation<<<dimGrid, dimBlock>>>(Data_In, Coefficient_Array, Number_of_Coefficients, Number_of_Rows);
```

```cuda
__global__ void kernel_linear_interpolation(float *Data_In, float *Coefficient_Array, int Number_of_Coefficients, int Number_of_Rows)
{
    int tidx = threadIdx.x + blockIdx.x*blockDim.x;
    int tidy = threadIdx.y + blockIdx.y*blockDim.y;

    // data_array[1023] = data_array[1023] --> thread 511 of block 1. Is this check correct?
    if( tidx >= Number_of_Coefficients - 1 )
    {
        return;
    }

    if( tidy >= Number_of_Rows )
    {
        return;
    }

    // I think my error is here
    Data_In[tidy * Number_of_Coefficients + tidx] = (Data_In[tidx + 1] - Data_In[tidx]) * Coefficient_Array[tidx];
}
```

Can you be more specific as to the nature of the problem you’re seeing?

One problem is that your input and output arrays are the same, so some threads may attempt to update the values while other threads are using them. Use separate input and output arrays to avoid this problem.

Another potential problem: if you want interpolation, you should probably be computing dataOut[x] = dataIn[x] + coef[x]*(dataIn[x+1] - dataIn[x]) instead of just taking the scaled delta, which will give you a sort of gradient instead.
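To make the difference concrete, here is the interpolation written out for one pixel, followed by a small worked example. The numeric values are made up for illustration, and $c_x$ stands for the coefficient of column $x$:

$$
\text{out}_x = \text{in}_x + c_x\,(\text{in}_{x+1} - \text{in}_x) = (1 - c_x)\,\text{in}_x + c_x\,\text{in}_{x+1}
$$

With $\text{in}_x = 2$, $\text{in}_{x+1} = 4$ and $c_x = 0.25$:

$$
\text{out}_x = 2 + 0.25\,(4 - 2) = 2.5, \qquad \text{whereas} \qquad c_x\,(\text{in}_{x+1} - \text{in}_x) = 0.5
$$

The first result lies between the two neighbouring pixels, as an interpolated value should. The second is only the scaled difference between them.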

Another potential problem is that you are not using tidy*Number_of_Coefficients on the input (right hand side of =), which means that all rows will get the same value, which is probably not what you want.

The final __syncthreads() at the end is unnecessary, but harmless.

Perhaps try something like this:

```cuda
__global__ void kernel_linear_interpolation(float *Data_Out, float *Data_In, float *Coefficient_Array, int Number_of_Coefficients, int Number_of_Rows) {
    int tidx = threadIdx.x + blockIdx.x*blockDim.x;   // column within the row
    int tidy = threadIdx.y + blockIdx.y*blockDim.y;   // row index

    if( tidy >= Number_of_Rows ) {
        return;
    }

    int w = Number_of_Coefficients;   // row width

    if( tidx >= w ) {
        return;
    }
    else if (tidx == w - 1) {
        // Last column has no right-hand neighbour: copy it through unchanged.
        Data_Out[tidy * w + tidx] = Data_In[tidy * w + tidx];
    }
    else {
        // Interpolate between this pixel and its right-hand neighbour in the same row.
        Data_Out[tidy * w + tidx] = Data_In[tidy * w + tidx] + (Data_In[tidy * w + tidx + 1] - Data_In[tidy * w + tidx]) * Coefficient_Array[tidx];
    }
}
```
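For completeness, here is a rough sketch of how the host side might look with separate input and output buffers, plus a CPU reference to compare against. The names `lerp_rows_kernel`, `lerp_rows_cpu`, `lerp_rows_gpu` and `check` are made up for this example, and the block size of 256 is an arbitrary choice; it also assumes the number of rows fits in `gridDim.y` (at most 65535).

```cuda
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>
#include <cuda_runtime.h>

// Same arithmetic as the kernel above, with shorter (hypothetical) names.
__global__ void lerp_rows_kernel(float *out, const float *in, const float *coef,
                                 int width, int rows)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;  // column
    int y = threadIdx.y + blockIdx.y * blockDim.y;  // row
    if (x >= width || y >= rows) return;

    int i = y * width + x;                          // row-major index
    out[i] = (x == width - 1) ? in[i]               // last column: copy through
                              : in[i] + (in[i + 1] - in[i]) * coef[x];
}

// CPU reference, used only to check the GPU result.
std::vector<float> lerp_rows_cpu(const std::vector<float> &in,
                                 const std::vector<float> &coef,
                                 int width, int rows)
{
    std::vector<float> out(in.size());
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < width; ++x) {
            int i = y * width + x;
            out[i] = (x == width - 1) ? in[i]
                                      : in[i] + (in[i + 1] - in[i]) * coef[x];
        }
    }
    return out;
}

static void check(cudaError_t err)
{
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Host wrapper: input and output live in separate device buffers, so no
// thread can read a value that another thread has already overwritten.
// Sketch only: device buffers are not freed if a CUDA call fails.
std::vector<float> lerp_rows_gpu(const std::vector<float> &in,
                                 const std::vector<float> &coef,
                                 int width, int rows)
{
    assert(width > 0 && rows > 0);
    assert(in.size() == static_cast<std::size_t>(width) * rows);
    assert(coef.size() == static_cast<std::size_t>(width));

    std::size_t img_bytes = in.size() * sizeof(float);
    std::size_t coef_bytes = coef.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr, *d_coef = nullptr;
    check(cudaMalloc((void **)&d_in, img_bytes));
    check(cudaMalloc((void **)&d_out, img_bytes));
    check(cudaMalloc((void **)&d_coef, coef_bytes));
    check(cudaMemcpy(d_in, in.data(), img_bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_coef, coef.data(), coef_bytes, cudaMemcpyHostToDevice));

    dim3 block(256, 1);
    dim3 grid((width + block.x - 1) / block.x, rows);  // one grid row per image row
    lerp_rows_kernel<<<grid, block>>>(d_out, d_in, d_coef, width, rows);
    check(cudaGetLastError());                         // catches launch errors

    std::vector<float> out(in.size());
    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(out.data(), d_out, img_bytes, cudaMemcpyDeviceToHost));

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_coef);
    return out;
}
```

For the image in the question you would call `lerp_rows_gpu(image, coefficients, 1024, 512)` and compare the result against `lerp_rows_cpu` with a small tolerance, since the GPU may fuse the multiply and add into a single instruction and round slightly differently.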