simple kernel, stupid mistake?

Hope you guys can help a newcomer. I’m using Vista, a GeForce 8600M GT, and Visual Studio 2008 to try a simple CUDA program. However, I’m getting different results from the following two code fragments, even though both use the same input array arranged as a matrix:

void computeSimple(float* reference, float* idata, const unsigned int rows, const unsigned int columns)
{
    for (unsigned int r = 0; r < rows; r++)
        for (unsigned int c = 0; c < columns; c++)
            reference[r * columns + c] = idata[r * columns + c] * c;
}

__global__ void simpleKernel(float* g_oelevdata, float* g_ielevdata, const unsigned int num_rows, const unsigned int num_cols)
{
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    g_oelevdata[r * num_cols + c] = g_ielevdata[r * num_cols + c] * c;
}

idata and g_ielevdata hold the same input data, and reference and g_oelevdata receive the results. What am I doing wrong?


Got a buddy to help. The problem is in how the thread indices map to array indices. The data is a single linear array, but it represents a matrix of rows x columns. When you use threadIdx, blockIdx, and blockDim to compute the index of each element, you have to set up the grid and thread block dimensions so that the computed (r, c) actually covers the matrix correctly. I thought you could throw in any combination of grid and block sizes, but that’s not the case: with my launch configuration, the ‘c’ computed in the kernel didn’t correspond to the same column index as the ‘c’ in the reference function, so elements were multiplied by the wrong value.

Back to the books!