I have a large one-dimensional float array on the GPU with 1024*512 elements. This array holds an image (width = 1024 pixels, height = 512 pixels).
The second array is a one-dimensional float array with 1024 elements; it contains 1024 coefficients. Now I want to multiply the first row by the coefficient array.
Calculation for the first row:
data_array[0] = (data_array[1] - data_array[0]) * coefficient_array[0] → thread 0 of block 0
data_array[1] = (data_array[2] - data_array[1]) * coefficient_array[1] → thread 1 of block 0
…
data_array[511] = (data_array[512] - data_array[511]) * coefficient_array[511] → thread 511 of block 0
data_array[512] = (data_array[513] - data_array[512]) * coefficient_array[512] → thread 0 of block 1
data_array[513] = (data_array[514] - data_array[513]) * coefficient_array[513] → thread 1 of block 1
…
data_array[1022] = (data_array[1023] - data_array[1022]) * coefficient_array[1022] → thread 510 of block 1
data_array[1023] = data_array[1023] → thread 511 of block 1 !!!
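For checking GPU results, the row operation listed above can be written as a plain host-side loop. This is only a sketch: the function name `scale_forward_difference_row` and the separate output vector are assumptions and do not appear in the original code. Writing to a separate buffer avoids the question of whether data_array[i + 1] has already been overwritten when it is read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for one image row (the name is an assumption).
// out[i] = (row[i + 1] - row[i]) * coeff[i] for i < n - 1;
// the last element is copied unchanged.
std::vector<float> scale_forward_difference_row(const std::vector<float>& row,
                                                const std::vector<float>& coeff)
{
    assert(row.size() == coeff.size());
    const std::size_t n = row.size();
    std::vector<float> out(row);  // start from a copy so the last element stays as it is
    for (std::size_t i = 0; i + 1 < n; ++i) {
        out[i] = (row[i + 1] - row[i]) * coeff[i];
    }
    return out;
}
```

Calling it with one 1024-element row and the 1024 coefficients gives a reference row that the kernel output can be compared against.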
Here is my code:
dim3 dimGrid;
dim3 dimBlock;
dimBlock.x = 512;
dimBlock.y = 1;
dimGrid.x = 2;
dimGrid.y = 512;
kernel_linear_interpolation<<<dimGrid, dimBlock>>>(Data_In, Coefficient_Array, Number_of_Coefficients, Number_of_Rows);
__global__ void kernel_linear_interpolation(float *Data_In, float *Coefficient_Array, int Number_of_Coefficients, int Number_of_Rows)
{
    int tidx = threadIdx.x + blockIdx.x*blockDim.x;
    int tidy = threadIdx.y + blockIdx.y*blockDim.y;
    // Thread 511 of block 1: data_array[1023] stays unchanged. Is this check correct?
    if( tidx >= Number_of_Coefficients - 1 )
    {
        return;
    }
    if( tidy >= Number_of_Rows )
    {
        return;
    }
    // I think my error is here:
    Data_In[tidy * Number_of_Coefficients + tidx] = (Data_In[tidx + 1] - Data_In[tidx]) * Coefficient_Array[tidx];
    __syncthreads();
}
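One possible reading of the problem in the marked line: the write uses the row offset `tidy * Number_of_Coefficients`, but the two reads do not, so every row is computed from the first row's data. In addition, updating Data_In in place lets one thread read an element that a neighbouring thread may already have overwritten. The sketch below emulates the launch on the host in plain C++ so the index arithmetic can be checked without a GPU. The names `thread_body` and `emulate_launch`, the separate `Data_Out` buffer, and the assumption that every row is processed the same way are all assumptions and do not appear in the original code.

```cpp
#include <cassert>
#include <vector>

// Work of a single emulated thread. bx, by and tx stand in for blockIdx.x,
// blockIdx.y and threadIdx.x; block_dim_x stands in for blockDim.x.
void thread_body(int bx, int by, int tx, int block_dim_x,
                 const std::vector<float>& Data_In, std::vector<float>& Data_Out,
                 const std::vector<float>& Coefficient_Array,
                 int Number_of_Coefficients, int Number_of_Rows)
{
    const int tidx = tx + bx * block_dim_x;  // column index
    const int tidy = by;                     // row index (blockDim.y is 1 in the launch)
    if (tidx >= Number_of_Coefficients || tidy >= Number_of_Rows) return;

    // The same row offset is used for the reads and for the write.
    const int idx = tidy * Number_of_Coefficients + tidx;
    if (tidx == Number_of_Coefficients - 1) {
        Data_Out[idx] = Data_In[idx];  // last column is kept unchanged
    } else {
        Data_Out[idx] = (Data_In[idx + 1] - Data_In[idx]) * Coefficient_Array[tidx];
    }
}

// Hypothetical host loop over the grid, mirroring dimGrid and dimBlock.x.
std::vector<float> emulate_launch(const std::vector<float>& Data_In,
                                  const std::vector<float>& Coefficient_Array,
                                  int grid_x, int grid_y, int block_dim_x)
{
    const int cols = static_cast<int>(Coefficient_Array.size());
    assert(cols > 0 && Data_In.size() % Coefficient_Array.size() == 0);
    const int rows = static_cast<int>(Data_In.size()) / cols;
    std::vector<float> Data_Out(Data_In.size(), 0.0f);
    for (int by = 0; by < grid_y; ++by)
        for (int bx = 0; bx < grid_x; ++bx)
            for (int tx = 0; tx < block_dim_x; ++tx)
                thread_body(bx, by, tx, block_dim_x, Data_In, Data_Out,
                            Coefficient_Array, cols, rows);
    return Data_Out;
}
```

In a real kernel the same body would use threadIdx, blockIdx and blockDim directly. The trailing __syncthreads() is not needed for this computation, since the threads do not exchange data through shared memory.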